Open denisrosset opened 2 years ago
W00t, Waterloo! </ahem>
> My understanding is that dataclass support works by decorating the `__init__` method.
I see your understanding is as large as mine.
> However, dataclasses are not checked when they are invalid and passed as arguments to other functions. Nor when `beartype.abby.die_if_unbearable` is called.
Indeed, the ugly truth is now exposed for all to see. Like @beartype, the `@dataclasses.dataclass` decorator is really just an obscene pile of runtime trickery. It's unclear whether @beartype can (or even should) automatically insinuate itself into that trickery by pretending third-party dataclasses not originally decorated by @beartype were decorated by @beartype after all.
Thankfully, this is Python. We can do whatever we like regardless of what anyone else thinks, because that's what the world's second-slowest language buys us. Options present themselves like ripe sushi for the plucking:
def test_unbearable_dataclass() -> None:  # fails
    with pytest.raises(beartype.roar.BeartypeException):
        data = WithoutDecorator(-1)
        WithDecoratorAfterTheFact = beartype.beartype(WithoutDecorator)
        beartype.abby.die_if_unbearable(data, WithDecoratorAfterTheFact)
I couldn't help myself and tested an equivalent snippet from within an IPython REPL. I proudly confirm worky:
>>> from dataclasses import dataclass
>>> import beartype
>>> import beartype.abby
>>> import beartype.vale
>>> from typing_extensions import Annotated
>>> @dataclass(frozen=True)
... class WithoutDecorator:
...     a: Annotated[int, beartype.vale.Is[lambda x: x >= 0]]
>>> WithDecoratorAfterTheFact = beartype.beartype(WithoutDecorator)
>>> data = WithoutDecorator(-1)
>>> beartype.abby.die_if_unbearable(data, WithDecoratorAfterTheFact)
beartype.roar.BeartypeCallHintParamViolation: @beartyped WithoutDecorator.__init__() parameter a=-1 violates type hint typing.Annotated[int, Is[lambda x: x >= 0]], as -1 violates validator Is[lambda x: x >= 0]:
False == Is[lambda x: x >= 0].
Silently decorating dataclasses not decorated by @beartype with @beartype reduces annoyance, but comes at the cost of probably dramatically harming time performance if implemented naïvely. Why? Because the naïve implementation would dynamically declare one new @beartype-decorated dataclass for each @beartype call passed an undecorated dataclass. Obviously, that's awful.
If we go down this road, we should go down this road crazy fast without regard for local traffic laws – otherwise, what's the point, right? This. Is. @beartype.
So, the non-naïve implementation requires @beartype to internally cache and reuse @beartype-decorated dataclasses when passed undecorated dataclasses. Obviously, that trades off space for speed – because @beartype would now be effectively doubling the number of dataclasses declared per Python process.
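A minimal sketch of that cache-and-reuse idea, using only the standard library. Everything named here is an assumption for illustration: `checked_copy()` is a hypothetical stand-in for whatever @beartype would do to produce a checked variant of a class, and `checked_variant()` is a hypothetical helper that memoizes it per class. This is not how @beartype is implemented internally.

```python
from dataclasses import dataclass

# Counts how often the (presumably expensive) decoration step really runs.
decoration_calls = 0

# Maps each original class to its decorated variant. This is the extra
# memory paid for not redecorating the same class on every call.
_checked_variants: dict = {}


def checked_copy(cls: type) -> type:
    """Hypothetical stand-in for a type-checking class decorator.

    It declares a new subclass, mimicking the "one new dataclass per
    call" behaviour of the naive approach.
    """
    global decoration_calls
    decoration_calls += 1
    return type(f"Checked{cls.__name__}", (cls,), {})


def checked_variant(cls: type) -> type:
    """Return the decorated variant of `cls`, creating it at most once."""
    try:
        return _checked_variants[cls]
    except KeyError:
        variant = _checked_variants[cls] = checked_copy(cls)
        return variant


@dataclass(frozen=True)
class WithoutDecorator:
    a: int


first = checked_variant(WithoutDecorator)
second = checked_variant(WithoutDecorator)
print(first is second, decoration_calls)
```

The second lookup returns the cached class, so the decoration step runs once per dataclass rather than once per call. The dictionary keeps every decorated class alive for the life of the process, which is exactly the memory cost under discussion.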
Maybe nobody cares? I have no data points here. If somebody cares, LRU caching shyly raises its hand at the back of the class. And then there's always...
a `beartype.abby.die_if_unbearable(data.a, hints["a"])` test, in which @beartype would iteratively check all dataclass fields of undecorated dataclasses. That probably sounds great on paper, except that:
Option #1 for short-term gain. Option #2 to minimize long-term pain – hopefully with caching. High fives all around! :hand:
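For reference, the field-by-field check could look roughly like the sketch below. `check_fields()` and `isinstance_check()` are hypothetical names invented for illustration. The `check` callable takes `(value, hint)` in the same order as the `die_if_unbearable(data.a, hints["a"])` call quoted above, so in practice that function would presumably be passed instead of the stdlib-only stand-in used here.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Callable, get_type_hints


def check_fields(obj: Any, check: Callable[[Any, Any], None]) -> None:
    """Call `check(value, hint)` for every field of a dataclass instance."""
    if not is_dataclass(obj) or isinstance(obj, type):
        raise TypeError(f"{obj!r} is not a dataclass instance")
    # include_extras=True preserves Annotated[...] metadata such as validators.
    hints = get_type_hints(type(obj), include_extras=True)
    for field in fields(obj):
        check(getattr(obj, field.name), hints[field.name])


def isinstance_check(value: Any, hint: Any) -> None:
    """Stand-in checker (an assumption): only understands plain classes."""
    if isinstance(hint, type) and not isinstance(value, hint):
        raise TypeError(f"{value!r} is not an instance of {hint.__name__}")


@dataclass(frozen=True)
class Point:
    x: int
    y: int


check_fields(Point(1, 2), isinstance_check)  # passes silently

try:
    check_fields(Point(1, "two"), isinstance_check)
except TypeError as error:
    print(error)
```

Note that this re-resolves the type hints on every call, which hints at why the approach is not free.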
`__init__`-methods
`@beartype` on top of `@dataclass`
* the `wrap`-function gets "expanded" to beartype the dataclass's init-method if it `is_one_of_my_own`
* to calculate the `is_one_of_my_own`-bool, it might be better to use some trie/caching?
--- original.py 2023-03-17 07:17:55.333675520 +0100
+++ patched.py 2023-03-17 07:17:56.989660276 +0100
@@ -1,4 +1,4 @@
-def dataclass(
+def beartyped_dataclass(
cls=None,
/,
*,
@@ -22,7 +22,17 @@
"""
def wrap(cls):
return data_cls
if cls is None:
dataclasses.dataclass = beartyped_dataclass
### 2. define which dataclasses to beartype
* `BEARTYPED_DATACLASS_PREFIXES`: a collection of path/directory prefixes that defines which dataclasses are to be beartyped
```python
import pathlib

BEARTYPED_DATACLASS_PREFIXES: set[str] = set()


def beartype_all_dataclasses_of_this_files_parent(file: str):
    package_dir = str(pathlib.Path(file).parent.resolve())
    already_contained = any(
        (package_dir.startswith(s) for s in BEARTYPED_DATACLASS_PREFIXES)
    )
    if not already_contained:
        BEARTYPED_DATACLASS_PREFIXES.add(package_dir)

    # maybe remove children/prefixes (subdirectories)
    children = [
        p
        for p in BEARTYPED_DATACLASS_PREFIXES
        if p.startswith(package_dir) and p != package_dir
    ]
    is_parent = len(children) > 0
    if is_parent:
        for ch in children:
            BEARTYPED_DATACLASS_PREFIXES.remove(ch)
        BEARTYPED_DATACLASS_PREFIXES.add(package_dir)
```
__init__.py
from misc_utils.beartyped_dataclass_patch import beartype_all_dataclasses_of_this_files_parent
beartype_all_dataclasses_of_this_files_parent(__file__)
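One caveat with plain string prefixes: `str.startswith()` also matches sibling directories whose names merely share a prefix. A hedged sketch of a stricter containment test follows, using `PurePath.is_relative_to()` (Python 3.9+). `is_under_any_prefix()` is a hypothetical helper name, and `PurePosixPath` is used only so the example behaves identically on every platform; real code would more likely use `pathlib.Path`.

```python
from pathlib import PurePosixPath


def is_under_any_prefix(file: str, prefixes: set[str]) -> bool:
    """Return True if `file` lies inside one of the directories in `prefixes`."""
    path = PurePosixPath(file)
    # is_relative_to() compares whole path components, not raw characters.
    return any(path.is_relative_to(prefix) for prefix in prefixes)


prefixes = {"/src/app"}

inside = is_under_any_prefix("/src/app/models.py", prefixes)
sibling = is_under_any_prefix("/src/app_tests/models.py", prefixes)
naive_sibling = "/src/app_tests/models.py".startswith("/src/app")

print(inside, sibling, naive_sibling)
```

Here the component-wise test rejects `/src/app_tests`, whereas the naive string comparison accepts it.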
I am super grateful for any feedback!
@dertilo!! Everybody, it's our newest favourite GitHub Sponsor!! @beartype :heart_eyes_cat: you so much.
</ahem>
...where was I? Where am I? Who am I? Oh. Right. GitHub. Let's do this on a Saturday night.
> my IDE (pycharm) that was/is unwilling to "handle" the `@beartype` on top of `@dataclass`
Curse you a hundred-fold, JetBrains. Curse you.
My wife and I otherwise love PyCharm, but JetBrains made the curious decision to implement their own ad-hoc Python static type-checker inside PyCharm. Every other IDE I've used just reuses standard off-the-shelf static type-checkers, because that is the easy (and therefore profitable) thing to do.
For example, let us consider the most popular Python IDE. I speak of Microsoft's VSCode, which supports Python via Pylance, which is built on the `pyright` static type-checker.
@beartype officially supports both `pyright` and mypy and thus just works "out-of-the-box" with VSCode. Personally, I prefer command-line Vim, because I live in a cabin in the woods. People like me like insanity like Vim.
Sadly, it's probably impossible for @beartype to support PyCharm's non-standard (and probably closed-source) static type-checker. It hurts! We really want to support that static type-checker – whatever it is. It doesn't even seem to have a name, which is equally bizarre. Sad cat is cat. :crying_cat_face:
> how bad/dangerous/suboptimal is this hack?
You are monkey-patching the Python standard library in compelling and explosive ways. The answer, of course, is... this is so awesome! Here at @beartype Studios, we strongly support ~~risky, reckless, and lawless~~ bold, brave, and innovative behaviour like this.
You're in luck, too. Line-by-line inspection of your helpful diff suggests that you've done everything absolutely right! Well, almost everything. You violate privacy encapsulation by directly calling `_process_class()`, which isn't great. Although Python 3.11 still has that private function, it now accepts way more parameters. Here's the same `dataclasses` logic in Python 3.11:
# Here's the Python 3.11 variant of `wrap()`:
def wrap(cls):
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
                          frozen, match_args, kw_only, slots,
                          weakref_slot)

# Here's the Python 3.9 variant of `wrap()`:
def wrap(cls):
    return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
Super-different, right? This means that your current approach, while extraordinarily clever, is also non-portable. Which isn't great.
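The same drift is visible from the public side, without touching anything private. As a small sketch, the snippet below lists the keyword-only parameters that the running interpreter's `dataclasses.dataclass` accepts; the list grows between Python versions, which is why any wrapper that names those parameters explicitly is tied to one version.

```python
import dataclasses
import inspect
import sys

# Inspect the public decorator's signature instead of the private helper.
parameters = inspect.signature(dataclasses.dataclass).parameters

# Everything after the bare `*` in the signature is keyword-only.
keyword_only = [
    name
    for name, parameter in parameters.items()
    if parameter.kind is inspect.Parameter.KEYWORD_ONLY
]

print(sys.version_info[:2], keyword_only)
```

Running this under two different Python versions shows the difference directly, with no hard-coded expectations about either one.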
An alternative approach would be to black-box this without violating privacy encapsulation. I'm thinking of code that is untested but is guaranteed to do something that might even work:
import dataclasses
import sys

from beartype import beartype
from dataclasses import dataclass as dataclass_vanilla

# BEARTYPED_DATACLASS_PREFIXES is the set of path prefixes defined above.
def dataclass_beartype(cls=None, /, **kwargs):
    def wrap_beartype(cls):
        data_cls = dataclass_vanilla(cls, **kwargs)

        # Modules without a file (e.g., the REPL) are never beartyped.
        this_dataclasses_file = getattr(
            sys.modules.get(cls.__module__), '__file__', None) or ''

        return (
            beartype(data_cls)
            if any(
                this_dataclasses_file.startswith(s)
                for s in BEARTYPED_DATACLASS_PREFIXES
            ) else
            data_cls
        )

    # See if we're being called as @dataclass or @dataclass().
    if cls is None:
        # We're called with parens.
        return wrap_beartype

    # We're called as @dataclass without parens.
    return wrap_beartype(cls)

dataclasses.dataclass = dataclass_beartype
Completely untested, but possibly works. Privacy is now preserved. Let's choose to believe my empty, hollow promises.
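For anyone who wants to poke at the black-box pattern without installing anything or monkey-patching the standard library, here is a small stdlib-only sketch. `dataclass_then()` and `record()` are hypothetical names: the first builds a drop-in `dataclass` replacement that applies an arbitrary post-processor, and the second is a do-nothing stand-in where `beartype` would go. It exercises both call forms (with and without parentheses).

```python
from dataclasses import FrozenInstanceError, dataclass, is_dataclass

# Names of the classes seen by the stand-in post-processor.
processed = []


def record(cls):
    """Stand-in for `beartype`: remembers the class, returns it unchanged."""
    processed.append(cls.__name__)
    return cls


def dataclass_then(post_process):
    """Build a `dataclass` replacement that applies `post_process` afterwards."""
    def decorator(cls=None, /, **kwargs):
        def wrap(cls):
            # Forward all options untouched, so no version-specific names leak in.
            return post_process(dataclass(cls, **kwargs))

        # cls is None when called as @decorator(...), else it is the class.
        return wrap if cls is None else wrap(cls)

    return decorator


recording_dataclass = dataclass_then(record)


@recording_dataclass
class Bare:
    a: int


@recording_dataclass(frozen=True)
class Frozen:
    a: int


print(processed, is_dataclass(Bare), is_dataclass(Frozen))
```

Because the options travel through `**kwargs`, the wrapper never needs to know which keywords the running Python version supports.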
Everything else looks sane. Well, `beartype_all_dataclasses_of_this_files_parent()` is pretty intense; you've sorta implemented a crude alternative to import hooks. That's not necessarily bad, of course. Your approach clearly works. Clearly working is really all that matters.
Tremendous work, bro! And you'll be delighted to learn that @beartype will implement import hooks in some future version to be released by 2057 (i.e., the year the Big Asteroid finally hits Earth). Once released, you won't need to do any of the above monkey-patching anymore. In fact, you won't need to reference @beartype anywhere at all in your codebase anymore except for a single statement at the top of your top-level `{your_package}.__init__` module resembling:
# Import the requisite machinery
from beartype.claw import beartype_all
# Automatically @beartype everything, everywhere, for all time.
beartype_all()
I've even already implemented like 95% of the `beartype.claw` subpackage. Then I got tired, went to bed, and stumbled off to do other stuff. I never did finish `beartype.claw`... but I will someday! I swear this, @dertilo. And Berlin will never be the same again.
My understanding is that dataclass support works by decorating the `__init__` method. However, dataclasses are not checked when they are invalid and passed as arguments to other functions. Nor when `beartype.abby.die_if_unbearable` is called.
I wrote a small test suite, shown below.
What would be needed to make this support happen?