beartype / beartype

Unbearably fast near-real-time hybrid runtime-static type-checking in pure Python.
https://beartype.readthedocs.io
MIT License

[Feature Request] Detect unbeartyped @dataclasses and do something about it #119

Open denisrosset opened 2 years ago

denisrosset commented 2 years ago

My understanding is that dataclass support works by decorating the __init__ method. However, invalid dataclass instances are not checked when passed as arguments to other functions, nor when beartype.abby.die_if_unbearable is called.

I wrote a small test suite, shown below.

What would be needed to make this support happen?

from dataclasses import dataclass

import beartype
import beartype.abby
import beartype.roar
import beartype.vale
import pytest
from typing_extensions import Annotated, get_type_hints

@beartype.beartype
@dataclass(frozen=True)
class WithDecorator:
    a: Annotated[int, beartype.vale.Is[lambda x: x >= 0]]

@dataclass(frozen=True)
class WithoutDecorator:
    a: Annotated[int, beartype.vale.Is[lambda x: x >= 0]]

def test_beartype_decorator() -> None:
    with pytest.raises(beartype.roar.BeartypeException):
        WithDecorator(-1)

def test_unbearable_dataclass() -> None:  # fails
    with pytest.raises(beartype.roar.BeartypeException):
        data = WithoutDecorator(-1)
        beartype.abby.die_if_unbearable(data, WithoutDecorator)

def test_dataclass_in_argument() -> None:  # fails
    with pytest.raises(beartype.roar.BeartypeException):

        @beartype.beartype
        def fun(d: WithoutDecorator) -> None:
            pass

        data = WithoutDecorator(-1)
        fun(data)

def test_unbearable_field() -> None:
    hints = get_type_hints(WithoutDecorator, include_extras=True)
    with pytest.raises(beartype.roar.BeartypeException):
        data = WithoutDecorator(-1)
        beartype.abby.die_if_unbearable(data.a, hints["a"])
leycec commented 2 years ago

W00t, Waterloo! </ahem>

> My understanding is that dataclass support works by decorating the __init__ method.

I see your understanding is as large as mine.

> However, invalid dataclass instances are not checked when passed as arguments to other functions, nor when beartype.abby.die_if_unbearable is called.

Indeed, the ugly truth is now exposed for all to see. Like @beartype, the @dataclasses.dataclass decorator is really just an obscene pile of runtime trickery. It's unclear whether @beartype can (or even should) automatically insinuate itself into that trickery by pretending third-party dataclasses not originally decorated by @beartype were decorated by @beartype after all.

Thankfully, this is Python. We can do whatever we like regardless of what anyone else thinks, because that's what the world's second-slowest language buys us. Options present themselves like ripe sushi for the plucking:

  1. Just do it manually. Python decorators are actually just standard callables, which means you can call them just as you would any other function. Pretty sure this works, so let's pretend I actually tested this:
def test_unbearable_dataclass() -> None:  # fails
    with pytest.raises(beartype.roar.BeartypeException):
        data = WithoutDecorator(-1)
        WithDecoratorAfterTheFact = beartype.beartype(WithoutDecorator)
        beartype.abby.die_if_unbearable(data, WithDecoratorAfterTheFact)

I couldn't help myself and tested an equivalent snippet from within an IPython REPL. I proudly confirm worky:

>>> from dataclasses import dataclass
>>> import beartype
>>> import beartype.abby
>>> import beartype.vale
>>> from typing_extensions import Annotated
>>> @dataclass(frozen=True)
... class WithoutDecorator:
...     a: Annotated[int, beartype.vale.Is[lambda x: x >= 0]]
>>> WithDecoratorAfterTheFact = beartype.beartype(WithoutDecorator)
>>> data = WithoutDecorator(-1)
>>> beartype.abby.die_if_unbearable(data, WithDecoratorAfterTheFact)
beartype.roar.BeartypeCallHintParamViolation: @beartyped WithoutDecorator.__init__() parameter a=-1 violates type hint typing.Annotated[int, Is[lambda x: x >= 0]], as -1 violates validator Is[lambda x: x >= 0]:
    False == Is[lambda x: x >= 0].

roar, bro

  2. Shift the burden of proof onto @beartype. Arguably, @beartype itself should either:
    • Emit a non-fatal warning when passed a dataclass not decorated by @beartype. That could get kinda annoying, though. So...
    • Silently decorate dataclasses not decorated by @beartype with @beartype. That gets non-trivial fast, though. So, let's dig deeper into this special madness.

Automate It, Bro

Silently decorating dataclasses not decorated by @beartype with @beartype reduces annoyance, but probably comes at the cost of dramatically degraded time performance if implemented naïvely. Why? Because the naïve implementation would dynamically declare one new @beartype-decorated dataclass for each @beartype call passed an undecorated dataclass. Obviously, that's awful.

If we go down this road, we should go down this road crazy fast without regard for local traffic laws – otherwise, what's the point, right? This. Is. @beartype.

So, the non-naïve implementation requires @beartype to internally cache and reuse @beartype-decorated dataclasses when passed undecorated dataclasses. Obviously, that trades space for speed – because @beartype would now effectively be doubling the number of dataclasses declared per Python process.

Maybe nobody cares? I have no data points here. If somebody cares, LRU caching shyly raises its hand at the back of the class.
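
A hypothetical sketch of that cache – emphatically not how @beartype is implemented, just the shape of the idea, assuming beartype() applied to a class is the decoration entry point:

```python
from functools import lru_cache

from beartype import beartype

@lru_cache(maxsize=None)
def _beartyped_dataclass_cached(cls: type) -> type:
    # Decorate each undecorated dataclass at most once per Python process;
    # subsequent calls passed the same class reuse the cached result.
    return beartype(cls)
```

Classes are hashable, so functools.lru_cache happily keys on them. And then there's always...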

  3. Shift the burden of proof onto @beartype but in a crazy way. Hacky examples include your concluding beartype.abby.die_if_unbearable(data.a, hints["a"]) test, in which @beartype would iteratively check all dataclass fields of undecorated dataclasses (see the sketch after this list). That probably sounds great on paper, except that:
    • It requires substantially more heavy lifting from @beartype – heavy lifting we don't necessarily have time to implement. Time, what is time!?!?
    • It possibly imposes runtime inefficiencies I shudder to even fathom in my darkest dreams.
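
In user space, that field-by-field check might look roughly like this – a hypothetical helper, not anything @beartype currently exposes:

```python
from dataclasses import fields, is_dataclass

from beartype.abby import die_if_unbearable
from typing_extensions import get_type_hints

def die_if_unbearable_dataclass(obj) -> None:
    # Hypothetical deep check: validate every field of an already-constructed
    # dataclass instance against its annotated type hint.
    assert is_dataclass(obj) and not isinstance(obj, type)
    hints = get_type_hints(type(obj), include_extras=True)
    for field in fields(obj):
        die_if_unbearable(getattr(obj, field.name), hints[field.name])
```

Applied to the opening example, die_if_unbearable_dataclass(WithoutDecorator(-1)) would roar as desired – at the cost of a get_type_hints() call and a per-field loop on every check.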

tl;dr

Option #1 for short-term gain. Option #2 to minimize long-term pain – hopefully with caching. High fives all around! :hand:

dertilo commented 1 year ago

Hi! I am using the following hack to beartype all "my" dataclasses' __init__ methods.

### 1. hack-patch python3.9/dataclasses.py

dataclasses.dataclass = beartyped_dataclass


### 2. define which dataclasses to beartype
* `BEARTYPED_DATACLASS_PREFIXES`: a collection of path/directory prefixes that defines which dataclasses are to be beartyped

```python
import pathlib

BEARTYPED_DATACLASS_PREFIXES: set[str] = set()

def beartype_all_dataclasses_of_this_files_parent(file: str) -> None:
    """Register the parent directory of the passed module file, so that all
    dataclasses defined under that directory get beartyped."""
    package_dir = str(pathlib.Path(file).parent.resolve())

    already_contained = any(
        package_dir.startswith(s) for s in BEARTYPED_DATACLASS_PREFIXES
    )
    if not already_contained:
        BEARTYPED_DATACLASS_PREFIXES.add(package_dir)

    # If this directory is a parent of already-registered subdirectories,
    # drop those children and keep only this broader prefix.
    children = [
        p
        for p in BEARTYPED_DATACLASS_PREFIXES
        if p.startswith(package_dir) and p != package_dir
    ]
    if children:
        for ch in children:
            BEARTYPED_DATACLASS_PREFIXES.remove(ch)
        BEARTYPED_DATACLASS_PREFIXES.add(package_dir)

# Each module whose dataclasses should be checked then calls:
beartype_all_dataclasses_of_this_files_parent(__file__)
```
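
For reference, the patched decorator from step 1 boils down to something like this – a simplified sketch of the idea (not the exact diff), mirroring Python 3.9's dataclasses.dataclass():

```python
import sys
from dataclasses import _process_class  # private stdlib helper

from beartype import beartype

def beartyped_dataclass(cls=None, /, *, init=True, repr=True, eq=True,
                        order=False, unsafe_hash=False, frozen=False):
    def wrap(cls):
        # Build the dataclass exactly as the stdlib would...
        data_cls = _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
        # ...then @beartype it if its module lives under a registered prefix.
        module_file = sys.modules[cls.__module__].__file__
        if any(module_file.startswith(s) for s in BEARTYPED_DATACLASS_PREFIXES):
            data_cls = beartype(data_cls)
        return data_cls

    # Mirror the @dataclass / @dataclass(...) calling conventions.
    return wrap if cls is None else wrap(cls)
```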


I am super grateful for any feedback!
leycec commented 1 year ago

@dertilo!! Everybody, it's our newest favourite GitHub Sponsor!! @beartype :heart_eyes_cat: you so much.

</ahem>

...where was I? Where am I? Who am I? Oh. Right. GitHub. Let's do this on a Saturday night.

> my IDE (pycharm) that was/is unwilling to "handle" the @beartype on top of @dataclass

Curse you a hundred-fold, JetBrains. Curse you.

My wife and I otherwise love PyCharm, but JetBrains made the curious decision to implement their own ad-hoc Python static type-checker inside PyCharm. Every other IDE I've used just reuses standard off-the-shelf static type-checkers, because that is the easy (and therefore profitable) thing to do.

For example, let us consider the most popular Python IDE. I speak of Microsoft's VSCode, which supports Python via Pylance – which, in turn, wraps the standard pyright static type-checker.

@beartype officially supports both pyright and mypy and thus just works "out-of-the-box" with VSCode. Personally, I prefer command-line Vim, because I live in a cabin in the woods. People like me like insanity like Vim.

Sadly, it's probably impossible for @beartype to support PyCharm's non-standard (and probably closed-source) static type-checker. It hurts! We really want to support that static type-checker – whatever it is. It doesn't even seem to have a name, which is equally bizarre. Sad cat is cat. :crying_cat_face:

> how bad/dangerous/suboptimal is this hack?

You are monkey-patching the Python standard library in compelling and explosive ways. The answer, of course, is... this is so awesome! Here at @beartype Studios, we strongly support ~~risky, reckless, and lawless~~ bold, brave, and innovative behaviour like this.

You're in luck, too. Line-by-line inspection of your helpful diff suggests that you've done everything absolutely right! Well, almost everything. You violate privacy encapsulation by directly calling _process_class(), which isn't great. Although Python 3.11 still has that private function, it now accepts way more parameters. Here's the same dataclasses logic in Python 3.11:

# Here's the Python 3.11 variant of `wrap()`:
    def wrap(cls):
        return _process_class(cls, init, repr, eq, order, unsafe_hash,
                              frozen, match_args, kw_only, slots,
                              weakref_slot)

# Here's the Python 3.9 variant of `wrap()`:
    def wrap(cls):
        return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)

Super-different, right? This means that your current approach, while extraordinarily clever, is also non-portable. Which isn't great.

An alternative approach would be to black-box this without violating privacy encapsulation. I'm thinking of code that is untested but is guaranteed to do something that might even work:

import dataclasses
import sys

from beartype import beartype
from dataclasses import dataclass as dataclass_vanilla

def dataclass_beartype(cls=None, /, **kwargs):

    def wrap_beartype(cls):
        # Apply the stock @dataclass decorator first...
        data_cls = dataclass_vanilla(cls, **kwargs)
        this_dataclasses_file = sys.modules[cls.__module__].__file__

        # ...then @beartype the result only if its module lives under a
        # registered prefix.
        return (
            beartype(data_cls)
            if any(
                this_dataclasses_file.startswith(s)
                for s in BEARTYPED_DATACLASS_PREFIXES
            ) else
            data_cls
        )

    # See if we're being called as @dataclass or @dataclass().
    if cls is None:
        # We're called as @dataclass(...) with parens.
        return wrap_beartype

    # We're called as @dataclass without parens.
    return wrap_beartype(cls)

dataclasses.dataclass = dataclass_beartype

Completely untested, but possibly works. Privacy is now preserved. Let's choose to believe my empty, hollow promises.
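
For the brave, usage from inside a package might then look something like this – completely hypothetical, assuming the dataclass_beartype() and beartype_all_dataclasses_of_this_files_parent() snippets above live in a made-up your_package._beartype_patch module whose import also applies the dataclasses.dataclass = dataclass_beartype patch:

```python
# your_package/models.py (hypothetical module)

# Importing the patch module rebinds dataclasses.dataclass, so import it
# *before* pulling dataclass out of the stdlib.
from your_package._beartype_patch import beartype_all_dataclasses_of_this_files_parent

from dataclasses import dataclass  # now resolves to dataclass_beartype

# Opt this module's directory in *before* defining its dataclasses;
# the prefix check happens at decoration time.
beartype_all_dataclasses_of_this_files_parent(__file__)

@dataclass(frozen=True)
class Probability:
    p: float  # __init__ is now runtime type-checked

Probability(p="0.5")  # roars with a BeartypeCallHintParamViolation
```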

Everything else looks sane. Well, beartype_all_dataclasses_of_this_files_parent() is pretty intense; you've sorta implemented a crude alternative to import hooks. That's not necessarily bad, of course. Your approach clearly works. Clearly working is really all that matters.

Tremendous work, bro! And you'll be delighted to learn that @beartype will implement import hooks in some future version to be released by 2057 (i.e., the year the Big Asteroid finally hits Earth). Once released, you won't need to do any of the above monkey-patching anymore. In fact, you won't need to reference @beartype anywhere at all in your codebase anymore except for a single statement at the top of your top-level {your_package}.__init__ module resembling:

# Import the requisite machinery
from beartype.claw import beartype_all

# Automatically @beartype everything, everywhere, for all time.
beartype_all()

I've even already implemented like 95% of the beartype.claw subpackage. Then I got tired, went to bed, and stumbled off to do other stuff. I never did finish beartype.claw... but I will someday! I swear this, @dertilo. And Berlin will never be the same again.