dart-lang / language

Design of the Dart language
Other
2.66k stars 205 forks source link

Optional auto-generation of operator== and hashCode on classes with const constructors #1121

Open Hixie opened 8 years ago

Hixie commented 8 years ago

It is a common Dart idiom to have a class with all-final fields, a const constructor, a toString, an operator==, and a hashCode function.

The latter two are boilerplate:

  @override
  bool operator ==(dynamic other) {
    if (other.runtimeType != runtimeType)
      return false;
    final T typedOther = other;
    return typedOther.field1 == field1
        && typedOther.field2 == field2
        && typedOther.fieldN == fieldN;
  }

  @override
  int get hashCode => hashValues(field1, field2, fieldN);

It would be nice if there was some syntax or shorthand we could provide that would automatically provide these.

cc @abarth

a14n commented 8 years ago

Something like https://github.com/a14n/zengen#value :)

anders-sandholm commented 8 years ago

Thanks, @a14n, for the link to https://pub.dartlang.org/packages/zengen @Hixie, @abarth, I'd be curious to know what you think about this.

abarth commented 8 years ago

Quite close, but the generated code isn't exactly what we'd want. We'd want

@DoTheThing()
class A {
  final a;
  final int b;
}

to generate

class A {
  final a;
  final int b;
  const A({this.a, this.b});
  @override bool operator ==(o) => identical(this, o) ||
      o.runtimeType == runtimeType && o.a == a && o.b == b;
  @override int get hashCode => hashObjects(a, b);
}

Also, we don't want to add a source-to-source translation step to our toolchain. We'd like something that the compiler and analyzer understand.

yjbanov commented 8 years ago

@abarth I think the proposed desugaring can only work cleanly under the following conditions:

Otherwise, if A is allowed to participate in any non-trivial class hierarchy, then the proposed implementation of operator== will violate Liskov substitution principle. For example, CartesianPoint { x; y } will not be operator== equal to CylindricalPoint { unitVector; radius; } even if they both implement the same Point interface.

abarth commented 8 years ago

Is there a better operator== function we should use? Here's a real one from package:flutter:

  @override
  bool operator ==(dynamic other) {
    assert(debugAssertIsValid());
    if (identical(this, other))
      return true;
    if (other is! BoxConstraints)
      return false;
    final BoxConstraints typedOther = other;
    assert(typedOther.debugAssertIsValid());
    return minWidth == typedOther.minWidth &&
           maxWidth == typedOther.maxWidth &&
           minHeight == typedOther.minHeight &&
           maxHeight == typedOther.maxHeight;
  }

https://github.com/flutter/flutter/blob/master/packages/flutter/lib/src/rendering/box.dart#L404

We'd be willing to lose the debugAssertIsValid check if we didn't have to actually write out the overload.

Hixie commented 8 years ago

@yjbanov A CartesianPoint isn't equal to a CylindricalPoint, they have different types.

Hixie commented 8 years ago

Incidentally, in Flutter's codebase we sometimes check for type equality using is and sometimes using runtimeType. There are some specific cases where is is the right answer because we want the Liskov substitution principle to apply (e.g. _DebugSize vs Size) but usually we actually want runtimeType equality. I've been slowly moving us towards the latter, but there's lots of cases where we still use the former because we didn't know better when we last did our big pass at making our operator ==s consistent.

The reason you want to use runtimeType is that otherwise foo == bar might give a different result than bar == foo (in particular in the case where foo is an instance of a subclass of the class bar is an instance of, and foo's class introduces new fields, foo == bar might be false even when bar == foo is true).

lrhn commented 8 years ago

@Hixie

I see that as a clear sign that operator== is badly designed in general. (It's not the only sign). It's just one of those functions that shouldn't be virtual to begin with, because overriding it in a subclass is pretty much impossible without breaking semantics.

If we had non-virtual methods, I would make operator== one and then a Set<Point> should only distinguish points by coordinates, even if some are actually ColorPoint instances. From the set's perspective, they are the same Point. A Set<ColorPoint> would distinguish by colors, but it also only sees ColorPoint instances. Then you could override operator== to only accept objects of the correct type, instead of having to accept Object and always do the is-check first and reject everything anyway.

(Another approach is to not have your instances extend each other, but have all of them extend a common abstract superclass, with equality only being defined at the leaves of the class tree, then you can use is checks, and still allow mocks to successfully emulate the instance classes. Basically: Never put an operator== implementation on a class that can be extended).

Hixie commented 8 years ago

We use operator == on complicated type hierarchies a lot. It works fine so long as you're testing the runtimeType for equality.

yjbanov commented 8 years ago

@abarth

Is there a better operator== function we should use?

I suspect there's no one implementation of operator== that's better than others in all situations, and that's the basis of my concern: if one particular implementation is baked into the language it may come back and bite us later. Such syntactic sugar provides an incentive to use it even in cases where it's not the right thing to do.

we don't want to add a source-to-source translation step to our toolchain. We'd like something that the compiler and analyzer understand.

What is the difference between a source-to-source translator and a compiler? It seems they both need to do the same amount of work and generate the same thing.

@Hixie

A CartesianPoint isn't equal to a CylindricalPoint, they have different types.

Types don't matter in my case. Both classes express the same thing. The choice between one type or the other is purely an implementation detail (perhaps they have different performance characteristics). If runtimeTypes are compared, this implementation detail will leak to the user.

abarth commented 8 years ago

What is the difference between a source-to-source translator and a compiler? It seems they both need to do the same amount of work and generate the same thing.

Maybe a more correct thing to say is that we want something that's fully integrated into the Dart toolchain and has zero impact on the development cycle. The existing source-to-source transformations don't have these properties but macro expansion in C, for example, does have these properties.

Hixie commented 8 years ago

If you're ok with adding a CartesianPoint to a Set and getting back a CylindricalPoint, then implementing your own operator ==/hashCode pair that does that should definitely be possible.

This bug is only about how to provide an operator ==/hashCode pair that does a straight-forward comparison of all the final fields of an immutable class (one with all-final fields, a const constructor, a toString, an operator==, and a hashCode function). We do this all the time in Flutter's codebase, and it would be nice to not have to worry about writing this boilerplate code all the time. We've also received requests for this from our customers.

yjbanov commented 8 years ago

@abarth

fully integrated into the Dart toolchain and has zero impact on the development cycle

👍. C's macro expansion happens at source text level and it can blow away the cache, hurting the dev cycle, but I get the point.

@Hixie

worry about writing this boilerplate code all the time

Understood. I have nothing against Flutter providing a boilerplate-free way of expressing this, but I would be concerned if the suggested implementation of the operator== was baked into the language as it would falsely imply that this particular implementation is the correct way in all circumstances.

Hixie commented 8 years ago

Oh certainly. As I noted above, even in Flutter's codebase we don't always want the same version. For example equivalent Size and _DebugSize objects want to be considered ==, whereas equivalent Offset and Size objects do not (these all directly or indirectly inherit from OffsetBase). All we're looking for here is a shorthand for a common case we see a lot. It may be that the common case here isn't common enough to justify dedicated syntax and instead there should be macro-like syntax. Or it could be that the common case here really is very common and it can be blessed, while still allowing less common cases to be implemented on a case-by-case basis.

yjbanov commented 8 years ago

I vaguely remember seeing a macro-like proposal that would rely on annotations to generate code. IIRC, the use-case was to generate JSON serializers/deserializers from fields and some sort of AOP use-case. Can't find it anymore. I think it was @sigmundch's doc.

a14n commented 8 years ago

https://github.com/dart-lang/dart_enhancement_proposals/blob/f433a56ba95f4750f0c795a1b89323be3ff3f1cc/Meetings/2015-05-06%20DEP%20Committee%20Meeting.md#interceptors-and-macros ?

yjbanov commented 8 years ago

Bingo! Go, @a14n!

jmesserly commented 8 years ago

for what it's worth -- C# 7 is going to have this. https://github.com/dotnet/roslyn/blob/features/records/docs/features/records.md

yjbanov commented 8 years ago

@jmesserly Makes sense as (if I'm reading correctly) they do this only for sealed classes and structs, both of which satisfy the conditions above

jmesserly commented 8 years ago

Yup, exactly. From the looks of it they designed record types to make immutable plain-old-data classes very easy & they also thought about how it integrates with pattern matching (https://github.com/dotnet/roslyn/blob/features/records/docs/features/patterns.md)

davidmorgan commented 8 years ago

FYI

Here is a source_gen based approach to this problem:

https://github.com/google/built_value.dart

source_gen satisfies the requirement to be visible to the analyzer.

built_value is modeled after Guava's @AutoValue. Re: the equality operator, we follow this from @AutoValue: https://github.com/google/auto/blob/master/value/userguide/howto.md#inherit -- and point to Effective Java, which explains that there is no good way to mix equality and inheritance.

abarth commented 8 years ago

source_gen satisfies the requirement to be visible to the analyzer.

Can you explain this a bit more?

davidmorgan commented 8 years ago

source_gen satisfies the requirement to be visible to the analyzer.

Can you explain this a bit more?

source_gen uses the build package:

https://github.com/dart-lang/source_gen https://github.com/dart-lang/build

The idea behind "build" is that when developing you run it as a watcher than continuously updates the generated source.

So from the point of view of your IDE and the analyzer there's no difference between generated and non-generated source.

It's a nice experience, you can navigate easily into the generated code and see what's going on.

abarth commented 8 years ago

Thanks. Unfortunately, that doesn't meet my requirement to have zero impact on the development cycle. It sounds like a good approach to have a small impact, but it's still nonzero.

davidmorgan commented 8 years ago

Yes, it's not quite as neat as a language feature.

I've given considerable thought to the possibilities around a language feature. Right now I think the source_gen approach is better.

If we add something as a language feature, it becomes very fixed/rigid, we can't iterate on it. So my intention is to build with source_gen in the hope of settling on something widely successful/popular. Then there will be a case for upgrading it to a language feature.

What I want is more than is described on this request, though. For an immutable class I like to also have a builder class, so you have a mutable version of the type that you can accumulate data in. This is good for readability and efficiency. This is what built_value, linked above, provides. I don't see a nice way to do this with a language feature.

abarth commented 8 years ago

Right now I think the source_gen approach is better.

The source_gen approach is not better for us because it does not meet our requirements.

davidmorgan commented 8 years ago

Thanks. I understand that source_gen does not meet your requirements. What I meant by "better" is in the short term: the source_gen approach is a realistic way to create interest/support for something before it becomes a language feature.

Put another way, if we first have a popular source_gen solution to the problem, there is a much stronger case for adding it as a language feature. I would also like to see something end up in the language at some point, but I'd be surprised if it happens soon, so that's why I'm working on the source_gen path.

abarth commented 8 years ago

You're welcome to head down the source_gen approach, but it's somewhat off-topic for this bug.

mockturtl commented 8 years ago

@yjbanov @a14n

I vaguely remember seeing a macro-like proposal

dart-lang/dart_enhancement_proposals#44

lrhn commented 8 years ago

Yes, equality (or, to be exact, "equivalence") is not a property of the object. but of the perspective. In mathematics, equality is really identity. If you work on semantic values, a point is a member of ℝ². You can't have "two points" that are both at (2, 2) - then it is the same point.

In Object Oriented programming, objects often correspond to "real" concepts and entities. That's why you can have a Cartesian point (x=2, y=2) and a polar point (r= √8, φ= π/4) which both represent the same underlying entity - the element (2, 2) of ℝ². If you look at those objects merely as points, they are "mathematically equal" - they represent the same entity. If both implement a Point interface, there is no reason to not accept them as equal Points - they are "equal modulo representation".

On the other hand, if the representation is important to you, you can have a function that expects either a CartesianPoint or a PolarPoint, and then the two are not equivalent to you. Your perspective has changed from the represented entity to its representation.

And even further, looking at the actual objects in the system, even two CartesianPoints with the same coordinates are not identical. That's the finest granularity of distinction allowed by the language. Objects have identity, that's a fundamental tenet of object orientation. That's both a blessing and a curse - it allows you to refer to the object (references only make sense if the thing you refer to is a specific thing). It allows mutability - you can't change something if it has no identity. It also prevents some optimizations because you can't be certain that you are the only one who can access an object - anyone with a reference can access the same object.

So, why do we have equals on some objects anyway.

For one thing, even if there are multiple perspectives on an object, in some cases one perspective is so much more common than any other that it makes sense to give it precedence. Numerical equality of numbers. Contents of strings. It's simply too annoying not to be able to compare two numbers by just writing n == 4 - the alternative is numericalEquality(n, 4), and it just doesn't roll of your tongue the same way. You can still declare an equality on integers that is equals modulo 2**32, it's just not the "natural" equality, and you have to represent it separately.

When you have an interface, it makes sense to declare a notion of equality on that interface, based only on the visible properties. That makes it possible for multiple implementations to agree on the equality. If you implement the interface by extending a base-class, you can even share the equality implementation. The num interface declares an equality that is shared by int and double.

You don't have to give a class a custom equality. Many classes work well with just the default identity - objects are equal only if they are the same objects (that's really the minimal equivalence relation: it is the minimal reflexive relation and that happens to also be trivially symmetric and transitive). Dart does not have equality on, say, List. We could have, but we have tried to not put equality on mutable objects. That's dangerous, and not always useful. It does mean that we lose out on the cases where easily comparing lists of integers would be practical, but that's a trade-off, not just an oversight.

So, back to equality on classes that can be extended. It's hard. If your subclass is significantly different from the superclass (e.g., Point vs ColorPoint) then you might not want to make them equal. However, a ColorPoint is-a Point, and from a Point perspective it might be equal to another point which isn't colored. Then you get either non-symmetry when a Point considers itself equal to a ColorPoint with the same coordinates, but not the other way around, or non-transitivity if the ColorPoint do consider itself equal to a plain Point with the same coordinates, but not equal to another ColorPoint with the same coordinates and a different color. It's just broken. Basically, if you need to overwrite the equality in a subclass, then it's probably not really an equality on the superclass's interface any more. It's two different equalities, and they are not necessarily compatible, and by putting them on the objects, you loose the ability to pick the perspective you want.

Using runtimeType to disallow equality with any subclass is, well, possible and consistent and prevents accidentally being equal to a ColorPoint, but it also breaks substitutability at its core and prevents you from being equal to a PolarPoint. I can no longer create a specialization of the class or a different implementation of the interface that can be used in all places where the superclass can. (Well, you can, because we allow you to override runtimeType too and lie about your type - yey?)

In summary: Don't put equals on mutable objects or objects likely to be extended. Only put it at the leaves of your class hierarchy, or pick one equality and use it for the entire subtree. And even then it will sometimes suck.

davidmorgan commented 8 years ago

Re: Comparator, I think the answer is the same: you can do this right now with codegen, so it doesn't make sense to pursue a language feature.

davidmorgan commented 8 years ago

I expect a source_gen based solution would look reasonable.

You can generate Comparator<Point>; using sourcegen you declare the interface you want then delegate to generated code. So, yes, the generated code will be PointComparator -- actually, `$PointComparator-- but you'll never see it, you'll seeComparator`:

part 'point.g.dart';

class Point {
  @Compare(#x)
  static final Comparator<Point> byX = _$byX;
  @Compare(#x, #y)
  static final Comparator<Point> byXandY = _$byXandY;

  int x;
  int y;
}

source_gen supports multiple generators on one file, no trouble.

davidmorgan commented 8 years ago

I hope so :) ... I think it's good enough for many problems, what's needed is for people to try it out and explore what works in practice. If we can suggest small language changes that make the codegen story better, that seems a good direction.

Hixie commented 8 years ago

Assuming the comparator is reified at compile time, I don't see why it wouldn't take the same amount of memory as today.

Hixie commented 8 years ago

I don't understand. How would it work with Comparator then?

Hixie commented 8 years ago

Oh, I see, you're basically passing in a method to the Map to do the comparison instead of using == and hashCode.

That doesn't help our general case, unfortunately. We have lots of classes in the Flutter framework that compare objects and rely on the operator == result, and thus lots of objects that have operator == implementations. This issue is just about requesting a way to do those without all the boilerplate we have today. If you want Map extended as you describe, I think that should be a separate issue.

Hixie commented 8 years ago

Sure. We can do that today, no need for any new syntax or anything. It's significantly uglier than just having an == that does the right thing though.

Hixie commented 8 years ago

That page just says it's hard (and then gives some Java-specific issues). Our experience is not that it's hard, only that it's tedious.

Ideally the solution wouldn't involve writing code for both hashCode and ==. Exactly how it is implemented isn't something I'm particularly concerned about, so long as it is fast and the syntax is brief, convenient, and readable.

lrhn commented 8 years ago

In package:collection I used Equality. Not perfect, it's not obvious that it includes the hash code, but it really is an implementation of an equality relation.

The Comparator is a noun phrase that describes an action, I'd expect that to be a function. The HashingStrategy sounds like an object, but doesn't describe the equality part. The Equarator is ... not a word. Sorry :) And Equator means something else already.

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

lrhn commented 8 years ago

@tatumizer wrote

In dart:collection I used Equality

In package:collection. You see how confusing it is?

Doh! :) Fixed