ericvsmith / dataclasses

Apache License 2.0
587 stars 53 forks source link

replace() works poorly with init=False fields and __post_init__ functions #74

Closed ericvsmith closed 6 years ago

ericvsmith commented 6 years ago

The module-level replace() function is defined as taking a dataclass instance and a number of replacement values (called changes). A new object is created, passing to its __init__() the value of fields from the original object or from changes, if they're given.

The question then is: what to do with fields that are init=False? Since they're not passed in to __init__, there's no way to specify them on the new instance. Currently I have post-create logic which says that for each init=False field I set the new object's value to the original object's value.

This works great, except where the init=False field is initialized in __post_init__. If there is a __post_init__, then any value set there is overwritten by my post-create logic. For example:

@dataclass
class C:
    i: int
    j: int = field(default=None, init=False)

    def __post_init__(self):
        self.j = 10 * self.i

c = C(1)
print(c)
print(replace(c, i=2))

Produces:

C(i=1, j=10)
C(i=2, j=10)

That is, even though the new object's j field is calculated in __post_init__ to be 20, it's overwritten by the post-create logic which copies the original object's j value of 10 over the new object's value.

The only thing I can think of doing here is to change the post-create loop to say that it should only copy the original object's value if the new object doesn't have an attribute set for theinit=False field. But I'm not sure how I can detect that, since all init=False fields must have a default value, so it actually is set on the new object, even if it's never touched by __post_init__.

Maybe we could just raise an error if replace() is called on any dataclass that has an init=False field and an __post_init__ method, since there's no way to know what the correct logic should be.

So:

  1. Do nothing, leave the post-create logic as it is now. I don't like this.
  2. Try to figure out if the init=False fields have been set in __post_init__, and if not copy them from the original. Probably not possible.
  3. Leave the current post-create logic, but make it an error to call replace() on a dataclass that has any init=False field and also has a __post_init__ function. This is overly restrictive, but at least prevents the problem outlined here.
  4. Drop the post-create step. It looks like this is what attrs does. Also, it doesn't require that an init=False parameter has a default value, which adds other wrinkles.

Any other approach that I'm missing?

[Edited to add option 4]

ilevkivskyi commented 6 years ago

A possible option is to add a hidden kw-only argument to generated __init__, let's call it _root, which is False by default. If it is true, then __post_init__ will not be called in generated __init__. Instead, it can be called manually in replace() after copying fields:

@dataclass
class C:
    x: int
    def __post_init__(self):
        ...

C(x=42, _root=True)  # __post_init__ is not called!

We have been using similar pattern for NamedTuple for some time in typing, it seems to be OK.

ericvsmith commented 6 years ago

For what it's worth, this is how attrs behaves, showing they don't do anything with init=False fields:

import attr

@attr.s
class C:
    i = attr.ib()
    j = attr.ib(init=False)

    def __attrs_post_init__(self):
        if self.i == 1:
            self.j = 10 * self.i

c = C(1)
print(c)
c1 = attr.evolve(c, i=2)
print(hasattr(c1, 'j'))
print(c1)

Produces:

C(i=1, j=10)
False
Traceback (most recent call last):
  File "attr-test.py", line 16, in <module>
    print(c1)
  File "/cygdrive/c/home/eric/local/venvs/attrs/lib/python3.6/site-packages/attr/_make.py", line 814, in repr_
    for name in attr_names
  File "/cygdrive/c/home/eric/local/venvs/attrs/lib/python3.6/site-packages/attr/_make.py", line 814, in <genexpr>
    for name in attr_names
AttributeError: 'C' object has no attribute 'j'
ericvsmith commented 6 years ago

After giving this a good bit of thought, I'm leaning toward 4 above: drop the post-create step in replace(). The new design woudl be that replace() will just create the new object by calling the class's __init__() with the right params (which will call __post_init__() as usual), and then return that object. Also, I think it's correct to drop the current requirement that init=False fields have a default or default_factory. This is the approach that attrs takes.

My reasoning is that it's not dataclass's job to make sure that the created object has all of the fields set, it's the class author's job. If it's possible that a code path through __init__ and __post_init__ leaves a field unset, that's a bug in the class definition. Or maybe it's not a bug, depending on the requirements for that class. Trying to guess the author's intent is just not feasible. Consenting adults and all.

After all, if the class has some wacky creation rules that might leave some fields unset, then that's what __post_init__ is for: you can have your own post-creation verification and/or additional field setting logic.

ericvsmith commented 6 years ago

Closed via #76.

gvanrossum commented 6 years ago

Interesting development. It seems to assume that init=False fields are only useful as fields computed by __post_init__. But the example from the PEP (a computed area field) only works for immutable (really "frozen") objects, or if users have the discipline to maintain the invariant manually or call the appropriate setter method. Then again I have no concrete example of why one would use init=False fields anyways.

ericvsmith commented 6 years ago

Good point. I'll change the PEP example to be a frozen class.

It seems to me there's no right behavior for how replace() should work. Maybe I'll add some wording that if you're writing a class for which the default replace() behavior doesn't work, you should write your own replace() member function and publicize that.

gvanrossum commented 6 years ago

Sounds good. I guess init=False fields should be rare.

ericvsmith commented 6 years ago

On second thought, I'm not going to change the example in the PEP, although I've added some wording about init=False fields.

The reason I'm not going to make it a frozen class is because setting attributes in __post_init__() for frozen classes is complicated, and I think it takes away from the point of the example. It would become:

@dataclass(frozen=True)
class Square:
  length: float
  area: float = field(init=False, default=0.0)

  def __post_init__(self):
      object.__setattr__(self, 'area', self.length * self.length)

Which seems to bury the point of the example.

gvanrossum commented 6 years ago

Sounds good.