differences / compatibility with attrs project

chadrik commented 7 years ago

It would be helpful to have a list of functional differences between dataclasses and attrs, broken down by @dataclass vs @attr.s and field vs attr.ib.

This would be useful and illuminating for a few reasons:

It would make it easier to vet the logic behind, and need for, each of the proposed differences.

@hynek and @Tinche have invested years of thought into the current design: deviating from it without fully understanding the history and reasoning behind each decision might lead to this project needlessly repeating mistakes. I'm glad to see that the attrs devs have already been brought into several issues. My hope is we can get a bird's eye view so that nothing slips through the cracks.

If the differences aren't too great (and ideally they will not be, see above) I'd like to see a dataclass compatibility mode for attrs (e.g. from attrs import dataclass, field).

I'm glad that this badly-needed feature is being worked on, but sadly I'm stuck in python 2 for at least another 2 years, so it's important to me, and surely many attrs-users, to have an easy path to adoption once this becomes part of stdlib.

chadrik commented 7 years ago

First off, I found and read #19, which is a good read for anyone wondering whether attrs should be added to the stdlib (spoiler: it should not).

Here is my first attempt at an overview of the differences, starting with function arguments:

| `attr.attr` | `dataclasses.field` |
| --- | --- |
| `default` | `default` or `default_factory` |
| `validator` | not present |
| `repr` | `repr` |
| `cmp` | `cmp` |
| `hash` | `hash` |
| `init` | `init` |
| `convert` | not present |
| `metadata` | not present |
| `type` | not applicable (uses annotations) |

| `attr.attributes` | `dataclasses.dataclass` |
| --- | --- |
| `these` | not present |
| `repr_ns` | not applicable in python 3.x |
| `repr` | `repr` |
| `cmp` | `compare`, and/or `eq` |
| `hash` | `hash` |
| `init` | `init` |
| `slots` | not present |
| `frozen` | `frozen` |
| `str` | not present |

Notes / Observations:

- the absence of metadata and validator from dataclasses.field is concerning for me. these are pretty crucial to my use of attrs. I could see an argument for convert and validator being merged into a single entity, but I definitely would not want to see them both missing
- slots were covered in #28, and the consensus was "punt this down the road. If people want slots they can manually add __slots__ = ('x', 'y', 'z') to their class"
- cmp vs compare/eq was covered in #48: compare=False, eq=True generates just __eq__ and __ne__ and is used for "unorderable types". I'm still a little hazy on why this is necessary.
- default_factory vs default was covered in #24. dataclasses splits default_factory from default so that an arbitrary callable can be provided as a data factory, whereas attrs requires factories to be an attr.Factory instance.
- gathering fields from annotations will soon be supported in attrs with https://github.com/python-attrs/attrs/issues/262 via auto_attribs=True, which removes one of the remaining differences
- at a surface level, attrs has almost a superset of the functionality, which gives me hope that a compatibility layer could be provided.
  - the only dataclasses feature missing from attrs is eq (covered above).
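The manual `__slots__` suggestion quoted from #28 can be sketched as follows. This is a minimal illustration written against the `dataclasses` module as it later shipped in the stdlib; the class name `Point` and its fields are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Declared by hand; dataclass() does not add __slots__ in this sketch.
    __slots__ = ("x", "y", "z")
    x: int
    y: int
    z: int


p = Point(1, 2, 3)

try:
    p.w = 4  # "w" is not a slot and the instance has no __dict__
except AttributeError as exc:
    print("rejected:", exc)
```

One limitation worth noting: a field with a plain default (for example `x: int = 0`) becomes a class attribute, which Python rejects when the same name is listed in `__slots__`, so this approach only works directly for fields without class-level defaults.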

If anyone is aware of deeper functional differences, I'd love to hear them. Thanks!

edit1: added notes on eq
edit2: clarified default_factory difference

ericvsmith commented 7 years ago

I think this is a useful exercise, thanks. I agree that it would be a shame to inadvertently miss something that's in attrs, especially if that locks us in to an API that we regret. I'll spend some time reviewing your table one-by-one, and comment as I go.

ericvsmith commented 7 years ago

As far as conversion functions and validators, I'd like to not support these. I'm hoping that static type checking gets us most of the way there.

ericvsmith commented 7 years ago

default / default_factory is mostly covered in issue #24. default is used to specify a default value, and default_factory is used to specify a callable that generates a default value. They need to be separate, because otherwise you'd have to do something like initial_value = default() if callable(default) else default, which precludes you from having a default value which is itself a callable. It's an error to specify both default and default_factory.
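A short sketch of the split described above, using the `dataclasses` module as it shipped in the stdlib. The names `Config`, `greet`, `tags`, and `callback` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def greet() -> str:
    return "hello"


@dataclass
class Config:
    # default_factory is called once per instance to build the default.
    tags: List[str] = field(default_factory=list)
    # default stores the value as-is, even though it happens to be callable.
    callback: Callable[[], str] = greet


a, b = Config(), Config()
a.tags.append("x")  # does not leak into b.tags

# Supplying both is rejected when the field is defined.
try:
    field(default=0, default_factory=int)
except ValueError as exc:
    both_error = str(exc)
```

Because the two parameters are separate, `callback` keeps the function object itself as its default rather than the result of calling it.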

chadrik commented 7 years ago

default / default_factory is mostly covered in issue #24.

Thanks, that conversation cleared it up for me. I updated my post above with the new info.

As far as conversion functions and validators, I'd like to not support these. I'm hoping that static type checking gets us most of the way there.

I don't think that static type checking has much impact on the need for converters. Take something like this for instance:

import attr

@attr.s
class C:
    x: int = attr.ib(default=0, converter=int)
    y: int = attr.ib(default=0, converter=int)

c = C('1', 1.1)  # the converters coerce both arguments: c.x == 1, c.y == 1

This pattern is very common. A hypothetical mypy plugin for attrs or dataclasses could make C('1', 1.1) valid by using the converter's argument type for __init__ if present.

Without converters the best we can do is this:

from dataclasses import dataclass

@dataclass
class C:
    x: int = 0
    y: int = 0

c = C(int('1'), int(1.1))

Static type checking doesn't really have much to offer here in terms of ease of use: the best it can do is nag us to cast everything to int. That does not alleviate the inconvenience of having to do that throughout your codebase, whereas a converter defined on the field does. Moreover, conversions cannot be accomplished post-init, because the converter's type needs to be understood by the static type-check plugin. Bottom line: converters are a convenience without a valid workaround, and their absence will be frustrating to users.

As for validators, static type checking gets us part of the way there, but certainly not most of the way there. Here are some example validations:

x in y
x in range(y, z)
re.match(y, x)
len(x) < y
isinstance(x, Y)

All of these require runtime validation except the last. That said, validation can be performed in post_init, so unlike converters, at least there is a workaround.
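The post-init workaround for validation might look like the sketch below. It assumes the hook name `__post_init__` that the stdlib eventually used (later in this thread it is still called `__dataclass_post_init__`); the `Account` class, its fields, and the specific limits are invented for illustration.

```python
import re
from dataclasses import dataclass

ALLOWED_ROLES = {"admin", "user"}  # hypothetical set of valid values


@dataclass
class Account:
    name: str
    role: str
    age: int

    def __post_init__(self) -> None:
        # Each check mirrors one of the runtime validations listed above.
        if self.role not in ALLOWED_ROLES:            # x in y
            raise ValueError(f"unknown role: {self.role!r}")
        if self.age not in range(0, 150):             # x in range(y, z)
            raise ValueError(f"age out of range: {self.age}")
        if not re.match(r"[a-z]+$", self.name):       # re.match(y, x)
            raise ValueError(f"invalid name: {self.name!r}")
        if not len(self.name) < 20:                   # len(x) < y
            raise ValueError("name too long")


ok = Account("alice", "admin", 30)
```

Unlike a per-field validator, every check lives in one method, so the rules for a field are not visible at the point where the field is declared.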

Is there an argument against adding metadata? It's hard to overstate how important this one is. It's a catchall for anything and everything that dataclasses cannot or should not have first class support for. In other words, it is the foundation for third-party utilities built up around dataclasses, for things such as UI presentation, database ORMs, serialization, and yes, even validation.

ilevkivskyi commented 7 years ago

I think the fact that static type checkers prohibit something like:

class C:
    x: int = ...
    y: int = ...

c = C('1', 1.1)

is rather good, not bad. What are the use cases for converters (apart from being temporary workarounds themselves)? As for validators, they can be added to __dataclass_post_init__ (I hope we will find a better name). Moreover, the latter can perform cross-field validation, so I agree with @ericvsmith here, we probably don't need validators and converters.
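Cross-field validation in the post-init hook could be sketched like this. The hook is spelled `__post_init__` here, which is the name the stdlib ended up with; `DateRange` and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DateRange:
    start: date
    end: date

    def __post_init__(self) -> None:
        # This rule involves two fields at once, so it cannot be
        # expressed as a validator attached to either field alone.
        if self.end < self.start:
            raise ValueError("end must not be earlier than start")


valid = DateRange(date(2017, 1, 1), date(2017, 12, 31))
```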

As for metadata, I don't have a strong opinion, but could imagine that it is indeed useful.

ericvsmith commented 6 years ago

metadata has been added.
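For reference, a minimal sketch of how per-field metadata can be attached and read back with the stdlib `dataclasses` module. The metadata keys (`label`, `max_length`) and the `User` class are assumptions chosen for the example; dataclasses itself attaches no meaning to them.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    # The metadata mapping is opaque to dataclasses; third-party tools interpret it.
    name: str = field(metadata={"label": "Full name", "max_length": 40})
    age: int = field(default=0, metadata={"label": "Age"})


# A hypothetical UI layer could collect labels like this.
labels = {f.name: f.metadata["label"] for f in fields(User)}
print(labels)
```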

@chadrik: where do you propose this documentation should go? Or is this just an exercise for the design phase, which I think has ended. It's not appropriate for this to go in the stdlib documentation.

chadrik commented 6 years ago

I think that attrs users are most definitely going to want this information once this project makes it into the stdlib. How about adding it to the wiki for now? I’ll gladly keep it up to date. I also want to use it to lobby for certain changes to attrs to increase compatibility (e.g. order vs cmp behavior).

ericvsmith commented 6 years ago

Either the Wiki (which I have no access to) or maybe under attrs' documentation (ditto).

And note that you can use dataclasses today, from PyPI, on 3.6. So let the lobbying begin, once the PEP is accepted.

ericvsmith commented 6 years ago

Also, note that attrs' these parameter is roughly equivalent to the dataclasses.make_dataclass() function. So I think the only real differences in your table are __slots__, validator, and convert. I deliberately don't want to support validation and conversion, instead leaving that to static type checkers (see https://github.com/ericvsmith/dataclasses/issues/60#issuecomment-342009693 above).
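A brief sketch of `dataclasses.make_dataclass()`, which builds a dataclass from a list of field specifications instead of a class body. The class name `Point` and its fields are made up for illustration, and the comparison with attrs' `these` is only the rough equivalence stated above.

```python
from dataclasses import field, fields, make_dataclass

# Each entry is (name, type) or (name, type, field(...)).
Point = make_dataclass(
    "Point",
    [("x", int), ("y", int), ("label", str, field(default=""))],
)

p = Point(1, 2)
field_names = [f.name for f in fields(Point)]
```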

As for __slots__, that's a deliberate decision. Although I have another decorator which I'm not including in the PEP that adds __slots__ and returns a new class. See add_slots() in dataclass_tools.py in this repo. Because it's the only parameter that causes dataclass() to return a new class, I thought it was best to leave it out, at least for now. I'd like to make sure dataclass() is seen as something that just adds methods to a class, not returns a new class. Maybe that will change over time.
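To illustrate why adding `__slots__` means returning a new class, here is a rough sketch of the general technique. It is not the `add_slots()` from `dataclass_tools.py`; the helper name `with_slots` and the `Point` class are hypothetical.

```python
from dataclasses import dataclass, fields


def with_slots(cls):
    """Hypothetical helper: rebuild a dataclass with __slots__ for its fields."""
    namespace = dict(cls.__dict__)
    names = tuple(f.name for f in fields(cls))
    namespace["__slots__"] = names
    for name in names:
        # Class-level defaults would clash with the slot descriptors.
        namespace.pop(name, None)
    # These descriptors belong to the old layout and must not be copied.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    # The layout is fixed at creation time, so a second class is created.
    return type(cls)(cls.__name__, cls.__bases__, namespace)


@dataclass
class Point:
    x: int
    y: int


SlottedPoint = with_slots(Point)
sp = SlottedPoint(1, 2)
```

`SlottedPoint` is a different object from `Point`, so anything that captured a reference to the original class before the rebuild still points at the old one. That is the behavior the comment above wants to keep out of `dataclass()` itself.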

Tinche commented 6 years ago

I think that the "return a new class" approach is fundamentally incompatible with metaclasses and especially PEP 487. Since there is no way to add slots to an existing class, I'm considering a different API for slot classes in attrs too. Or, you know, Python could grow a better __slots__ interface itself, but I'm not holding my breath.

gvanrossum commented 6 years ago

Actually we should design a new slots interface. The original was designed before we had class decorators.

Tinche commented 6 years ago

Actually we should design a new slots interface. The original was designed before we had class decorators.

Yes please!

gvanrossum commented 6 years ago

That won't be easy though -- it means that the instance layout has to be made changeable after the class object has been created (which happens when the metaclass creates it -- before the class decorator runs). Maybe there are some folks on python-ideas interested in brainstorming on how to do this.
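The timing problem can be seen in a small experiment: assigning `__slots__` after the class exists has no effect on the instance layout. The class name `Plain` is arbitrary.

```python
class Plain:
    pass


# Too late: the layout (including the per-instance __dict__) was fixed
# when the class object was created.
Plain.__slots__ = ("x",)

obj = Plain()
obj.y = 1  # still accepted, because instances keep their __dict__
```

A class decorator runs at the same point as the assignment above, after creation, which is why it cannot add slots in place.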

chadrik commented 6 years ago

One last effort on this topic:

I think the fact that static type checkers prohibit something like:
class C:
    x: int = ...
    y: int = ...

c = C('1', 1.1)
is rather good, not bad.

What if say, over half of the uses of C required converting a variable to int, and what if that conversion was not as simple as calling a builtin but also required an import from some other module? This doesn't seem like a question of correctness to me, but rather one of convenience. Very many classes in the real world perform some conversion of arguments within their __init__ methods, and unlike validators I don't see a good alternative for those who don't want to perform conversions all over their code instead of in one place. There's the possibility of casting and re-binding the attributes in __post_init__, but that would break static type-checking: for that to work the mypy plugin needs to integrate converter annotations into the __init__ annotations, which means dataclasses needs first class support for converters.

ilevkivskyi commented 6 years ago

@chadrik

What if say, over half of the uses of C required converting a variable to int

I think such situations are relatively rare (like legacy API or similar). And IIUC this use case is covered by a combination of InitVar and __post_init__:

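from dataclasses import InitVar, dataclass, field

def convert_from_legacy_api(raw: bytes) -> str:
    # Hypothetical stand-in; the real conversion is not shown in this thread.
    return raw.decode()

@dataclass
class C:
    a: str
    b: str = field(init=False)  # set in __post_init__, not by __init__
    _b: InitVar[bytes]          # init-only argument, not stored as a field
    def __post_init__(self, _b) -> None:
        self.b = convert_from_legacy_api(_b)

aa: str = 'a test'
bb: bytes = b'b test'

c = C(aa, bb)  # OK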

And this will work well with static type checkers.

ilevkivskyi commented 6 years ago

(I think you started with a/b/_b and then continued with x/y/_y?)

Indeed :-) Fixed!

ericvsmith commented 6 years ago

I think there's nothing else to add here. Closing this issue.

EhsanKia commented 3 years ago

I honestly don't see how the dummy InitVar + extra var + __post_init__ is a Pythonic replacement for the simple and clean converter. And it's also, as far as I can tell, not a solution for frozen dataclasses.

Take this very simple and common dataclass

import dataclasses
from typing import Sequence

@dataclasses.dataclass(frozen=True)
class Group:
    names: Sequence[str]

How do you ensure names is not mutable itself? Normally, a simple converter=tuple would do the job, but now, you have to do all sorts of hacks and object.__setattr__ and so on. None of it is pythonic, clean or user-friendly.

gvanrossum commented 3 years ago

It’s unpythonic to expect “deep” frozen-ness. A frozen object disallows attribute assignment but doesn’t care about modifying attribute values.
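The two positions can be made concrete with a short sketch. `Group` follows the example above; `TupleGroup` is a hypothetical variant showing the `object.__setattr__` workaround mentioned earlier, not a recommended pattern.

```python
import dataclasses
from typing import Sequence


@dataclasses.dataclass(frozen=True)
class Group:
    names: Sequence[str]


g = Group(["ann", "bob"])
try:
    g.names = []  # attribute assignment is blocked on a frozen instance
except dataclasses.FrozenInstanceError:
    pass
g.names.append("cy")  # but the list held by the attribute is still mutable


@dataclasses.dataclass(frozen=True)
class TupleGroup:
    names: Sequence[str]

    def __post_init__(self) -> None:
        # Bypass the frozen __setattr__ to store an immutable copy.
        object.__setattr__(self, "names", tuple(self.names))


t = TupleGroup(["ann", "bob"])
```

With `TupleGroup`, the stored value is a tuple regardless of what sequence the caller passed in, which is the effect `converter=tuple` would have provided.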

ericvsmith / dataclasses

differences / compatibility with attrs project #60