biqqles / dataclassy

A fast and flexible reimplementation of data classes
https://pypi.org/project/dataclassy
Mozilla Public License 2.0
81 stars 9 forks source link

Derived classes with @dataclass(slots=True) generate empty __slots__ #50

Closed thisisrandy closed 3 years ago

thisisrandy commented 3 years ago

Code speaks more precisely than words:

In [1]: from dataclassy import dataclass
   ...: 
   ...: @dataclass(slots=True)
   ...: class Base:
   ...:     foo: int
   ...: 
   ...: @dataclass(slots=True)
   ...: class Derived(Base):
   ...:     bar: int

In [2]: Base.__slots__
Out[2]: ('foo',)

In [3]: Derived.__slots__
Out[3]: ()

In [4]: Derived(1, 2)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-1d764d7cf19f> in <module>
----> 1 Derived(1, 2)

<string> in __init__(self, foo, bar)

AttributeError: 'Derived' object has no attribute 'bar'

My expectation, if it isn't obvious, is that Derived.__slots__ == ("bar",), just as would be true of any other python derived class adding additional slots to its parent.

The desired effect can be acheived by omitting the decorator on the derived class:

In [5]: class DerivedNoDecorator(Base):
   ...:     bar: int

In [6]: DerivedNoDecorator.__slots__
Out[6]: ('bar',)

In [7]: DerivedNoDecorator(1,2)
Out[7]: DerivedNoDecorator(foo=1, bar=2)

but this seems non-ideal. If I use the built-in dataclasses with __slots__, I can decorate the derived class without any issues, so it would be much clearer if dataclassy behaved the same way:

In [1]: from dataclasses import dataclass

In [2]: @dataclass
   ...: class Base:
   ...:     __slots__ = ("foo")
   ...:     foo: int

In [3]: @dataclass
   ...: class Derived(Base):
   ...:     __slots__ = ("bar")
   ...:     bar: int

In [4]: Derived.__slots__
Out[4]: 'bar'

In [5]: Derived(1, 2)
Out[5]: Derived(foo=1, bar=2)

Of course, I also want to use default values, so I would much prefer to use dataclassy. You'll probably not be surprised to learn that I came here via your SO answer.

I've tried to do some debugging myself, and it looks like for the Base/Derived example that I began with, we end up in DataClassMeta.__new__ three times, during the third of which we begin with dict_["__slots__"] == ("bar",) and bases == (Base,), Base thus providing foo to all_slots, so at line 110, we set dict_["__slots__"] = (). I literally learned what a metaclass is today in order to try to figure this out, so I might be off the mark, but I think the problem is that DataClassMeta.__new__ is invoked once by the dataclass decorator, and then once again going up the chain from Derived -> Base -> DataClassMeta (-> type), which sees the result of the first invocation and decides it doesn't need to add any new slots, only to have the result of the first invocation lost in the final result. Does that sound like a reasonable version of events?

This is a lot more nuts and bolts of python than I normally dig into, so I can't say I know what to do about it, but perhaps there's an obvious or at least feasible solution here that you can illuminate. Thanks in advance.

EDIT: I just finished the docs, and I realized that you tout the need to only apply the decorator to base classes as a feature. I think I'm inclined to agree, but it probably also explains why this issue wasn't considered during design. I'll probably go ahead and remove the decorator from subclasses in my project for now, but I still believe that this should be fixed to more closely mimic the dataclasses API.

biqqles commented 3 years ago

DataClassMeta.new is invoked once by the dataclass decorator, and then once again going up the chain from Derived -> Base -> DataClassMeta (-> type), which sees the result of the first invocation and decides it doesn't need to add any new slots, only to have the result of the first invocation lost in the final result. Does that sound like a reasonable version of events?

Yes, if I understand your explanation correctly, this is indeed what happens. Using a metaclass means that DataClassMeta.__new__ is run for each new subclass, then applying the decorator means the subclass is replaced with a new version (this happens here if you are interested). It can cause issues.

If you're wondering, the way my fix works is by exploiting the fact that the dict_ passed in looks different between these two invocations. In the first instance (subclassing), the metaclass is called before Python has done any of its inheritance magic, meaning __slots__ is empty. In the second instance (decorator usage), the class is already fully formed, meaning __slots__ has been set, so it's not replaced.

This change also means that a user-defined __slots__ will not be wiped out which I think is good anyway.

thisisrandy commented 3 years ago

Thanks for the detailed explanation and quick fix. I'm not sure I had it 100% in my original explanation, but it's clear what was going on now.