biqqles / dataclassy

A fast and flexible reimplementation of data classes
https://pypi.org/project/dataclassy
Mozilla Public License 2.0
82 stars 9 forks source link

Error with removed init parameters #6

Closed TylerYep closed 4 years ago

TylerYep commented 4 years ago

Hi again! I'm getting the following error (simplified for your convenience):

@dataclass
class State:
    path: Tuple[int, ...] = 1

    def __init__(self):
        pass

raises_error = State(7)

The error message:

cls = <class 'tests.State'>, args = (7,), kwargs = {}, instance = State(path=1)

    def __call__(cls, *args, **kwargs):
        instance = cls.__new__(cls, *args, **kwargs)

        args = args[cls.__new__.__code__.co_argcount - 1:]  # -1 for 'cls'
        for parameter in kwargs.keys() & cls.__annotations__.keys():
            del kwargs[parameter]

>       instance.__init__(*args, **kwargs)
E       TypeError: __init__() takes 1 positional argument but 2 were given

dataclassy/dataclass.py:111: TypeError

The code works fine when I add the keyword State(path=7) or remove the default or remove the __init__(self) method.

biqqles commented 4 years ago

Thanks, I've fixed this but I'm not yet awake enough to be completely sure of exactly why it was broken before or commit a proper test!

biqqles commented 4 years ago

Ah, it's because the earlier slice didn't consider the extra *args parameter added in __new__ to allow argument pass-through to __init__.

Not directly related to the bug but now we use a custom __call__ it might be neater to use the keyword-only indicator * instead of *args.

biqqles commented 4 years ago

OK, I think that's everything :slightly_smiling_face: If you can confirm everything now works as expected I'll release an update to PyPI.

TylerYep commented 4 years ago

Awesome, I tested it and it works!

As a side note, it seems like dataclassy's performance is roughly twice that of the builtin dataclasses. On the project that encountered this bug (it creates many instances of a single dataclass) I was profiling, the runtime roughly doubles from 4 sec with dataclasses to 8 sec with dataclassy.

Most of the time spent is on dataclassy/dataclassy/dataclass.py:104(__call__) and <string>:1(__new__), tested on Python 3.9.

No rush of course to improve this performance, I just wanted to provide a reference point in case you find it useful.

biqqles commented 4 years ago

Interesting. I have never got round to benchmarking dataclasses versus this library. I was acutely aware that __call__ is very hot and probably far slower than object.__call__ (which I believe is implemented in C). __new__ is the real equivalent of dataclasses' __init__ so it would be interesting to compare this directly.

I haven't looked at how dataclasses passes arguments to __post_init__, but it's likely it does it statically with metaprogramming rather than dynamically like dataclassy's current __call__ implementation. To be honest, the main reason I haven't obsessed too much about that method is that I use post-init processing very rarely, and when there isn't a custom __init__ dataclassy just uses object.__call__. Now that the library is more or less feature-complete (as I see it so far) performance would be something good to focus on. If it would also be useful to you, any benchmarking you do would likely be very helpful in a new issue.

Anyway, good to know that it's fixed and also that it works with 3.9!