Closed chadrik closed 6 years ago
First off, I found and read #19, which is a good read for anyone wondering whether attrs
should be added to the stdlib (spoiler: it should not).
Here is my first attempt at an overview of the differences, starting with function arguments:
| attr.attr | dataclasses.field |
|---|---|
| default | default or default_factory |
| validator | not present |
| repr | repr |
| cmp | cmp |
| hash | hash |
| init | init |
| convert | not present |
| metadata | not present |
| type | not applicable (uses annotations) |
| attr.attributes | dataclasses.dataclass |
|---|---|
| these | not present |
| repr_ns | not applicable in python 3.x |
| repr | repr |
| cmp | compare, and/or eq |
| hash | hash |
| init | init |
| slots | not present |
| frozen | frozen |
| str | not present |
Notes / Observations:
- The absence of `metadata` and `validator` from `dataclasses.field` is concerning for me. These are pretty crucial to my use of `attrs`. I could see an argument for `convert` and `validator` being merged into a single entity, but I definitely would not want to see them both missing.
- … `__slots__ = ('x', 'y', 'z')` to their class"
- `cmp` vs `compare`/`eq` was covered in #48: `compare=False, eq=True` generates just `__eq__` and `__ne__` and is used for "unorderable types". I'm still a little hazy on why this is necessary.
- `default_factory` vs `default` was covered in #24. `dataclasses` splits `default_factory` from `default` so that an arbitrary callable can be provided as a data factory, whereas `attrs` requires factories to be an `attr.Factory` instance.
- `attrs` supports annotation-based attribute definitions with https://github.com/python-attrs/attrs/issues/262 via `auto_attribs=True`, which removes one of the remaining differences.
- `attrs` has almost a superset of the functionality, which gives me hope that a compatibility layer could be provided. The `dataclasses` feature missing from `attrs` is `eq` (covered above).

If anyone is aware of deeper functional differences, I'd love to hear them. Thanks!
edit1: added notes on eq
edit2: clarified `default_factory` difference
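To make the `eq` note above concrete, here is a minimal sketch. It assumes the `dataclasses` module as it later shipped in the standard library, where the decorator flags are spelled `eq` and `order` rather than the `compare` used in this thread; the class `Color` is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(eq=True, order=False)
class Color:
    red: int
    green: int


a, b = Color(1, 2), Color(1, 2)

# __eq__ is generated, so field-by-field equality works.
equal = (a == b)

# No ordering methods are generated, so "<" is not supported.
try:
    a < b
    orderable = True
except TypeError:
    orderable = False
```

This is the "unorderable types" case: instances can be compared for equality, but any attempt to sort them fails loudly instead of silently picking an order.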
I think this is a useful exercise, thanks. I agree that it would be a shame to inadvertently miss something that's in `attrs`, especially if that locks us in to an API that we regret. I'll spend some time reviewing your table one-by-one, and comment as I go.
As far as conversion functions and validators, I'd like to not support these. I'm hoping that static type checking gets us most of the way there.
`default` / `default_factory` is mostly covered in issue #24. `default` is used to specify a default value, and `default_factory` is used to specify a callable that generates a default value. They need to be separate, because otherwise you'd have to do something like `initial_value = default() if callable(default) else default`, which precludes you from having a default value which is itself a callable. It's an error to specify both `default` and `default_factory`.
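A short sketch of the distinction described above, using the stdlib `dataclasses` module; the names `greet` and `Config` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def greet() -> str:
    return "hello"


@dataclass
class Config:
    # default: the callable itself is stored, it is never called for us.
    callback: Callable[[], str] = greet
    # default_factory: called once per instance to build a fresh value.
    tags: List[str] = field(default_factory=list)


first, second = Config(), Config()
first.tags.append("x")  # does not leak into second.tags

# Specifying both is rejected when the field is declared.
try:
    field(default=0, default_factory=int)
    both_allowed = True
except ValueError:
    both_allowed = False
```

Because the two parameters are separate, `callback` can hold a function as a plain default value without any `callable(default)` guessing.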
> default / default_factory is mostly covered in issue #24.
Thanks, that conversation cleared it up for me. I updated my post above with the new info.
> As far as conversion functions and validators, I'd like to not support these. I'm hoping that static type checking gets us most of the way there.
I don't think that static type checking has much impact on the need for converters. Take something like this for instance:
import attr
@attr.s
class C:
    x: int = attr.ib(default=0, converter=int)
    y: int = attr.ib(default=0, converter=int)
c = C('1', 1.1)  # the converters turn both arguments into the int 1
This pattern is very common. A hypothetical mypy plugin for `attrs` or `dataclasses` could make `C('1', 1.1)` valid by using the converter's argument type for `__init__` if present.
Without converters, the best we can do is this:
from dataclasses import dataclass
@dataclass
class C:
    x: int = 0
    y: int = 0
c = C(int('1'), int(1.1))  # every caller has to convert by hand
Static type checking doesn't really have much to offer here in terms of ease of use: the best it can do is nag us to cast everything to `int`. That does not alleviate the inconvenience of having to do that throughout your codebase, whereas a converter defined on the field does. Moreover, conversions cannot be accomplished post-init, because the converter's type needs to be understood by the static type-check plugin. Bottom line: converters are a convenience without a valid workaround, and their absence will be frustrating to users.
As for validators, static type checking gets us part of the way there, but certainly not most of the way there. Here are some example validations:
x in y
x in range(y, z)
re.match(y, x)
len(x) < y
isinstance(x, Y)
All of these require runtime validation except the last. That said, validation can be performed in `__post_init__`, so unlike converters, at least there is a workaround.
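As a rough illustration of that workaround, the sketch below runs two of the example validations inside `__post_init__`, the hook name used by the released stdlib module. The class `Account`, its fields, and the specific limits are assumptions invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    age: int

    def __post_init__(self) -> None:
        # re.match(y, x) style check
        if not re.match(r"[a-z]+$", self.name):
            raise ValueError(f"invalid name: {self.name!r}")
        # x in range(y, z) style check
        if self.age not in range(0, 150):
            raise ValueError(f"invalid age: {self.age!r}")


ok = Account("alice", 30)

try:
    Account("alice", 200)
    rejected = False
except ValueError:
    rejected = True
```

Unlike a per-field validator, the hook sees every field at once, so the same method can also check relationships between fields.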
Is there an argument against adding `metadata`? It's hard to overstate how important this one is. It's a catchall for anything and everything that `dataclasses` cannot or should not have first class support for. In other words, it is the foundation for third-party utilities built up around `dataclasses`, for things such as UI presentation, database ORMs, serialization, and yes, even validation.
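To show the kind of third-party utility this enables, here is a small sketch against the stdlib API as it was eventually released (`field(metadata=...)` and `fields()`). The metadata keys `label` and `serialize` and the helper `to_dict` are hypothetical conventions chosen for the example; `dataclasses` itself attaches no meaning to them.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    name: str = field(metadata={"label": "Full name"})
    password: str = field(default="", metadata={"serialize": False})


def to_dict(obj):
    """Hypothetical serializer that honors a 'serialize' metadata key."""
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if f.metadata.get("serialize", True)
    }


label = fields(User)[0].metadata["label"]
data = to_dict(User("ann", "secret"))
```

The library only stores and exposes the mapping; everything else is left to the tool that reads it.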
I think the fact that static type checkers prohibit something like:
class C:
    x: int = ...
    y: int = ...
c = C('1', 1.1)
is rather good, not bad. What are the use cases for converters (apart from being temporary workarounds themselves)? As for validators, they can be added to `__dataclass_post_init__` (I hope we will find a better name). Moreover, the latter can perform cross-field validation, so I agree with @ericvsmith here: we probably don't need validators and converters.
As for `metadata`, I don't have a strong opinion, but could imagine that it is indeed useful.
`metadata` has been added.
@chadrik: where do you propose this documentation should go? Or is this just an exercise for the design phase, which I think has ended. It's not appropriate for this to go in the stdlib documentation.
I think that attrs users are most definitely going to want this information once this project makes it into the stdlib. How about adding it to the wiki for now? I’ll gladly keep it up to date. I also want to use it to lobby for certain changes to attrs to increase compatibility (e.g. order vs cmp behavior).
Either the Wiki (which I have no access to) or maybe under `attrs`' documentation (ditto).
And note that you can use `dataclasses` today, from PyPI, on 3.6. So let the lobbying begin, once the PEP is accepted.
Also, note that `attrs`' `these` parameter is roughly equivalent to the `dataclasses.make_dataclass()` function. So I think the only real differences in your table are `__slots__`, `validator`, and `convert`. I deliberately don't want to support validation and conversion, instead leaving that to static type checkers (see https://github.com/ericvsmith/dataclasses/issues/60#issuecomment-342009693 above).
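For readers who have not used it, a minimal sketch of `make_dataclass()` as it exists in the released stdlib module; the class name `Point` and its fields are arbitrary examples.

```python
from dataclasses import field, make_dataclass

# Fields are given programmatically instead of as annotations in a class body:
# (name, type) or (name, type, field(...)).
Point = make_dataclass(
    "Point",
    [("x", int), ("y", int, field(default=0))],
)

p = Point(1)
```

As with `these` in `attrs`, the attribute definitions live outside the class statement, which helps when the field list is only known at runtime.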
As for `__slots__`, that's a deliberate decision, although I have another decorator, which I'm not including in the PEP, that adds `__slots__` and returns a new class. See `add_slots()` in `dataclass_tools.py` in this repo. Because it's the only parameter that causes `dataclass()` to return a new class, I thought it was best to leave it out, at least for now. I'd like to make sure `dataclass()` is seen as something that just adds methods to a class, not something that returns a new class. Maybe that will change over time.
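The sketch below illustrates why such a decorator has to return a new class: `__slots__` only takes effect when a class is created, so the class must be rebuilt from a copy of its namespace. This is not the `add_slots()` from `dataclass_tools.py`; `with_slots` is a hypothetical helper written for this example, and it ignores edge cases such as fields with class-level defaults used elsewhere, inheritance from slotted bases, and methods that rely on `super()` without arguments.

```python
from dataclasses import dataclass, fields


def with_slots(cls):
    """Hypothetical helper: rebuild a dataclass as a new class with __slots__."""
    namespace = dict(cls.__dict__)
    field_names = tuple(f.name for f in fields(cls))
    namespace["__slots__"] = field_names
    for name in field_names:
        # A class attribute with the same name as a slot is an error.
        namespace.pop(name, None)
    # These descriptors belong to the old, dict-based instance layout.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    return type(cls)(cls.__name__, cls.__bases__, namespace)


@dataclass
class Point:
    x: int
    y: int


SlottedPoint = with_slots(Point)
p = SlottedPoint(1, 2)
```

`SlottedPoint` is a different object from `Point`, which is exactly the behavior that sets a slots option apart from the other decorator parameters.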
I think that the "return a new class" approach is fundamentally incompatible with metaclasses and especially PEP 487. Since there is no way to add slots to an existing class, I'm considering a different API for slot classes in attrs too. Or, you know, Python could grow a better `__slots__` interface itself, but I'm not holding my breath.
Actually we should design a new slots interface. The original was designed before we had class decorators.
> Actually we should design a new slots interface. The original was designed before we had class decorators.
Yes please!
That won't be easy though -- it means that the instance layout has to be made changeable after the class object has been created (which happens when the metaclass creates it -- before the class decorator runs). Maybe there are some folks on python-ideas interested in brainstorming on how to do this.
One last effort on this topic:
> I think the fact that static type checkers prohibit something like:
> class C:
>     x: int = ...
>     y: int = ...
> c = C('1', 1.1)
> is rather good, not bad.
What if, say, over half of the uses of `C` required converting a variable to `int`, and what if that conversion was not as simple as calling a builtin but also required an import from some other module? This doesn't seem like a question of correctness to me, but rather one of convenience. Very many classes in the real world perform some conversion of arguments within their `__init__` methods, and unlike validators I don't see a good alternative for those who don't want to perform conversions all over their code instead of in one place. There's the possibility of casting and re-binding the attributes in `__post_init__`, but that would break static type-checking: for that to work the mypy plugin needs to integrate converter annotations into the `__init__` annotations, which means `dataclasses` needs first class support for converters.
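A minimal sketch of the cast-and-re-bind approach mentioned above, reusing the class `C` from earlier in the thread. It runs as written, but the generated `__init__` is still built from the `int` annotations, which is why a static checker would be expected to flag the call at the bottom.

```python
from dataclasses import dataclass


@dataclass
class C:
    x: int = 0
    y: int = 0

    def __post_init__(self) -> None:
        # Re-bind each attribute to its converted value.
        self.x = int(self.x)
        self.y = int(self.y)


# Works at runtime; the declared parameter types are still int, though,
# so the str and float arguments do not match the signature.
c = C('1', 1.1)
```

The conversion happens in one place, but the information about which input types are acceptable never reaches the signature.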
@chadrik
> What if say, over half of the uses of `C` required converting a variable to `int`
I think such situations are relatively rare (like legacy API or similar). And IIUC this use case is covered by a combination of `InitVar` and `__post_init__`:
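from dataclasses import InitVar, dataclass, field
@dataclass
class C:
    a: str
    b: str = field(init=False)
    _b: InitVar[bytes]
    def __post_init__(self, _b) -> None:
        # convert_from_legacy_api stands in for whatever conversion is needed
        self.b = convert_from_legacy_api(_b)
aa: str = 'a test'
bb: bytes = b'b test'
c = C(aa, bb) # OK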
And this will work well with static type checkers.
(I think you started with a/b/_b and then continued with x/y/_y?)
Indeed :-) Fixed!
I think there's nothing else to add here. Closing this issue.
I honestly don't see how the dummy InitVar + extra var + `__post_init__` is a Pythonic replacement for the simple and clean converter. And it's also, as far as I can tell, not a solution for frozen dataclasses.
Take this very simple and common dataclass:
import dataclasses
from typing import Sequence
@dataclasses.dataclass(frozen=True)
class Group:
    names: Sequence[str]
How do you ensure `names` is not mutable itself? Normally, a simple `converter=tuple` would do the job, but now you have to resort to all sorts of hacks, `object.__setattr__` and so on. None of it is Pythonic, clean or user-friendly.
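For reference, the `object.__setattr__` workaround alluded to above looks roughly like this. It is a sketch of the pattern under discussion, not a recommendation.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Sequence


@dataclass(frozen=True)
class Group:
    names: Sequence[str]

    def __post_init__(self) -> None:
        # Normal assignment is blocked on a frozen instance, so the
        # conversion has to bypass the generated __setattr__.
        object.__setattr__(self, "names", tuple(self.names))


g = Group(["a", "b"])

try:
    g.names = ()
    rebound = True
except FrozenInstanceError:
    rebound = False
```

The conversion works, but it has to step around the very mechanism that makes the class frozen.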
It’s unpythonic to expect “deep” frozen-ness. A frozen object disallows attribute assignment but doesn’t care about modifying attribute values.
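A small sketch of that distinction, with a hypothetical `Group` class that stores a list: re-binding the attribute is rejected, while mutating the value it refers to is not.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List


@dataclass(frozen=True)
class Group:
    names: List[str] = field(default_factory=list)


g = Group(["a"])

# Mutating the attribute's value is allowed.
g.names.append("b")

# Assigning to the attribute is not.
try:
    g.names = []
    rebound = True
except FrozenInstanceError:
    rebound = False
```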
It would be helpful to have a list of functional differences between `dataclasses` and `attrs`, broken down by `@dataclass` vs `@attr.s` and `field` vs `attr.ib`.
This would be useful and illuminating for a few reasons:
- It would make it easier to vet the logic behind, and need for, each of the proposed differences.
- @hynek and @Tinche have invested years of thought into the current design: deviating from it without fully understanding the history and reasoning behind each decision might lead to this project needlessly repeating mistakes. I'm glad to see that the `attrs` devs have already been brought into several issues. My hope is we can get a bird's eye view so that nothing slips through the cracks.
- If the differences aren't too great (and ideally they will not be, see above) I'd like to see a `dataclass` compatibility mode for `attrs` (e.g. `from attrs import dataclass, field`).
- I'm glad that this badly-needed feature is being worked on, but sadly I'm stuck in python 2 for at least another 2 years, so it's important to me, and surely many `attrs` users, to have an easy path to adoption once this becomes part of stdlib.