Closed thisisrandy closed 3 years ago
I think the problem is simply that I forgot to set internals
to True
when generating __tuple__
. Is there a situation you would ever not want internal fields to be included in comparisons?
For your question about the performance penalty of as_tuple
, I've never measured it, but I'm sure it's significant. If you look at _recurse_structure
, it does a lot of work in unpacking and copying arbitrarily nested structures, most of which is irrelevant or actually harmful for the purposes of comparison. The analogy of as_tuple
to the comparison methods I guess is technically wrong, but I thought it was accurate enough, if I had remembered to set fields=True
.
I think the problem is simply that I forgot to set internals to True when generating tuple. Is there a situation you would ever not want internal fields to be included in comparisons?
I've definitely written classes where I keep e.g. derived state or things like timestamps that I might want to ignore vis-a-vis equality in internal fields, but I'm not sure that any of them were also dataclasses, i.e. fit the model (from PEP 557) of "classes which exist primarily to store values which are accessible by attribute lookup."
In lieu of a specific use case, I can see how making this configurable might be viewed as just cluttering the API. It sounds like this qualifies as a bug from your perspective, so I'd be just as happy to see it fixed by always passing internals=True
to fields
. However, I'll caution, as I did in #54, that it would be a breaking change, and that, intentionally or not, one or your tests actually relied on the bug. If you're confident that none of your users are relying on it (probably unknowable, I realize, but confident enough, in any event), then I'm happy to make the change and fix the test.
intentionally or not, one or your tests actually relied on the bug
Where does this happen? I see you applied the new option to Epsilon
so I assume it's to do with that.
@dataclass # a complex (nested) class
class Epsilon:
g: Tuple[NT]
h: List['Epsilon']
_i: int = 0
self.e = self.Epsilon((self.NT(1, 2, 3)), [self.Epsilon(4, 5, 6)])
def test_values(self):
"""Test values()."""
self.assertEqual(values(self.e), dict(g=self.NT(1, 2, 3), h=[self.Epsilon(4, 5)]))
self.assertEqual(values(self.e, True), dict(g=self.NT(1, 2, 3), h=[self.Epsilon(4, 5)], _i=0))
Once Epsilon._i
is considered, both assertions break.
Ah OK, I see what you mean. This may be a stupid idea, but I kind of wonder if the comparison behaviour should be tied to hide_internals
. When running that test I was confused for a minute because the repr
s of those two objects are identical, even though they compare differently. "Forcing" the user to render visible the reason for two objects being different might be a good idea. As an added bonus, hide_internals
is True
by default, which would match the current behaviour, making it less of a breaking change.
I originally said I thought they should not be tied together, but after getting briefly confused about why the test was failing earlier (because I initially missed the _i=6
in the nested Epsilon
and it wasn't showing up in the repr
), I'm warming up to the idea.
However, I don't think "hide" is the right verb for the semantics of not using internals in comparison. How about deprecating hide_internals
and aliasing it to use_internals
?
As an added bonus,
hide_internals
isTrue
by default, which would match the current behavior, making it less of a breaking change.
One partial definition of "breaking change," in my view, is any API change that isn't strictly additive. Even if the default value of hide_internals
doesn't change, the proposition here is changing its semantics, which, unfortunately, still qualifies as breaking :disappointed:. If we believe that internals should be considered in comparison by default, regardless of how that's configured, there's no way to avoid it breaking the one person out there who wrote assert Episilon(4, 5, 6) == Epsilon(4, 5)
at the top of their munge_greeks
method
However, I don't think "hide" is the right verb for the semantics of not using internals in comparison.
Surprisingly, "hide" makes perfect sense to me here, in the sense of "to obscure from the outside world".
__iter__
, __repr__
, __eq__
, __lt__
etc.Maybe it is still not absolutely clear, but I can't see any verb being so brilliant that it needs no explanation. I dislike use
as it's too vague.
One partial definition of "breaking change," in my view, is any API change that isn't strictly additive.
Yes, I was very loose with my usage of my term, what I really meant was "reducing the amount of code out there actually broken by this change." If we linked comparison etc. to hide_internals
being false, it would only affect people who set it to true and had internal fields and didn't want them involved in the comparison. Which is still possibly some people but not as many as if we were to change the default behaviour (and this would really be a pain to diagnose if you don't follow dataclassy's release notes).
I think this should be
True
by default so as to not be obscurely inconsistent with dataclasses.
I actually thought of at some point changing hide_internals
to be False
by default for this reason, but this would be a separate breaking change.
Surprisingly, "hide" makes perfect sense to me here, in the sense of "to obscure from the outside world".
I don't want to get into a capricious semantics argument here, but (ha) hidden or obscured things can still very much and often do exert effects. That said, I agree that "use" is also insufficient, and this is ultimately your package, so as long as the docs are explicit, I'm happy with whatever you choose.
As an added bonus,
hide_internals
isTrue
by default, which would match the current behaviour, making it less of a breaking change.
I think I may have misinterpreted this. Were you saying that, as hide_internals
is true by default, tying it to __tuple__
generation doesn't change the default behavior? You would of course be right, though that conflicts with the idea of considering internals in comparison by default, which is probably whence my confusion.
To sum up, I'm fine with tying this behavior to an existing config variable, provided its effects are well documented. I would like for it to compare internals by default, because that follows the behavior of the model (dataclasses), and because not comparing internals seems unlikely to have been a relied upon feature, but I understand if you want to introduce it as a non-breaking change.
Were you saying that, as
hide_internals
is true by default, tying it to__tuple__
generation doesn't change the default behavior? You would of course be right, though that conflicts with the idea of considering internals in comparison by default, which is probably whence my confusion.
Yes. My rough plan based on this discussion is
1) Tie hide_internals
to __tuple__
in v0.11
2) Likely in v0.12, change hide_internals
to be False
by default
I think breaking it down like this is useful because it's easier for someone to predict how the behaviour of their code will change.
Sounds great, thanks. I made a PR for your convenience
Per the docs (emphasis mine),
However, it appears that internals are ignored, just as they are in
__repr__
whenhide_internals=True
:This may have been intentional, but it is also surprising, given that
and especially given that I can do this:
which just feels inconsistent to me (
__repr__
may choose to hide some internal state, so(a == b) == False
but(repr(a) == repr(b)) == True
seems fine, but not the inverse, as above).As for the why, it appears that at
dataclass.py:147
, insideDataClassMeta.__init__
, you decline to passinternals
tofields
, so the default ofFalse
is used when creating the expression for__tuple__
, which is of course what is used inside of__eq__
.However,
as_tuple
has no concept of internals and in general functions quite differently.I see two ways to fix this:
internals=False
tofields
atdataclass.py:147
, or make it configurable with a decorator param, something likeconsider_internals_eq=True
. I think this should beTrue
by default so as to not be obscurely inconsistent withdataclasses
. I don't think this should be tied tohide_internals
, since one may not want internals returned from__repr__
but still want them considered in__eq__
.as_tuple
in__eq__
. This seems right because it unifies two things that seem intuitively the same. However, your comments mention "efficient representation for internal methods" re:__tuple__
. I'm not sure offhand how much more efficient evaluating a static expression is v. recursing through a structure is, but if the difference is significant, maybe unification isn't the way to go.Please let me know which you prefer and why.