dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.09k stars 1.56k forks source link

proposal: `Object.equals()` to `==` what `Object.hash` was to `hashCode ` #50544

Open bernaferrari opened 1 year ago

bernaferrari commented 1 year ago

Many Flutter classes override == and hash. Hash was solved with Object.hash, so why == is still so verbose?

I know the current "understanding" is that data classes or macros are eventually going to solve this, but an Object.equals could make everyone's life easy, now. There are classes, such as the new SegmentedButtonThemeData, where operator == accounts to 10% of the LOC:

image

This code is copied and pasted everywhere. Most people don't even know what identical does, they just copy and paste. The package Equatable has a great equals implementation. I believe Dart could made this function part of Object.equals and allow people to remove a buuuunch of code. I'm sure, if the implementation is good enough and performant enough, its author would give to Dart.

Flutter has 215 places of "duplicated" code that could be somehow improved/replaced/cleaned-up. More than 3k LOC. Possibly 4k LOC. The super parameters feature has cleaned-up about 3k. Could be about the same size.

image
matanlurey commented 1 year ago

I think the biggest problem is, without macros, it's not clear what this function would do:

// Attempt 1
class Object {
  static bool equals(
    Object? a1, 
    Object? a2, [
    Object? b1 = _sentinelValue,
    Object? b2 = _sentinelValue,
    // Imagine this continuing up to 20 pairs of objects, similar to Object.hash
  ]) {
    if (identical(a, b)) {
      return true;
    }
    if (_sentinelValue == b1) {
      assert(_sentinelValue != b2);
      return b1 == b2;
    }
    return a1 == a2;
  }
}

I think the biggest problem is this is awkward to use, there isn't a clear association between a1 and a2. Maybe with tuples (records?) this makes more sense:

// Attempt 2
class Object {
  static bool equals(
    // I did not read the spec, so substitute the real syntax if this is wrong.
    (Object?, Object?) a, [
    (Object?, Object?) b =  _sentinelTuple,
    // Imagine this continuing up to 20 pairs of objects, similar to Object.hash
  ]) {
    if (identical(a, b)) {
      return true;
    }
    if (_sentinelValue == b) {
      return b.$0 == b.$1 && a.$0 == a.$1;
    }
    return a.$1 == a.$2;
  }
}

The final non-macro solution is a secret intrinsic function, but I'll defer that to the Dart team.

mraleph commented 1 year ago

FWIW the function does not have to be implemented in Dart, e.g. you could probably specify it in the following way:

external bool _haveSameRuntimeType(Object? obj, Object? bar);

/// Returns the list of values stored in fields of object obj.
/// The order is unspecified but consistent for instances of the 
/// same class. 
external List<Object?> _valuesOf(Object? obj);

bool shallowEquals(Object? a, Object? b) {
  if (identical(a, b)) {
    return true;
  }

  if (!_haveSameRuntimeType(a, b)) {
    return false;
  }

  final aValues = _valuesOf(a);
  final bValues = _valuesOf(b);
  for (var i = 0; i < aValues.length; i++) {
    if (!identical(aValues[i], bValues[i])) {
      return false;
    }
  }
  return true;
}

bool deepEquals(Object? a, Object? b) {
  if (identical(a, b)) {
    return true;
  }

  if (!_haveSameRuntimeType(a, b)) {
    return false;
  }

  final aValues = _valuesOf(a);
  final bValues = _valuesOf(b);
  for (var i = 0; i < aValues.length; i++) {
    if (aValues[i] != bValues[i]) {
      return false;
    }
  }
  return true;
}

The actual implementation will have to depend on native vs JS vs Wasm though. I think Wasm is the problematic one here - because I don't think you can iterate over struct fields. Also the speed is not going to be great in JS either - at least if you write it reflectively through Object.keys.

You would need to have compiler to produce specializations for pattern like

bool operator == (Object? other) => Object.equals(this, other);

So maybe the right approach is to say that Object actually has external bool fieldsEqual(Object? other) and external bool fieldsDeepEqual(Object? other); methods. Then compiler can synthesise overrides for those where necessary. Unfortunately incorrect use has potential to blow up the code size for something like Wasm where reflective field access is impossible.

/cc @lrhn

bernaferrari commented 1 year ago

Great discussion. From my understanding reading your comments, Equatable package would fail in wasm?

lrhn commented 1 year ago

I'm not entirely sure what the original post is asking for. Less code, definitely, but which and how.

The Dart == operation is completely abstract. Classes can implement it any way they want to. That means that we can never assume that one particular choice is the correct one.

I'd love to remove the if (identical(this, other)) return true;, like we removed the need to check for null. The only problem is that there is an object which is not equal to itself: double.nan. That was the reason == did not check its operands for identity before calling operator==, like it does checking for null values. That reason still holds.

Always doing runtimeType checks in an equality is a bad design. It prevents having different representations of the same concept, which are equal. Say, points represented using either Carthesian or polar coordinates. And it gets in the way of mocks. I never add runtimeType checks to == in my own classes. Having a default implementation which does that check makes it potentially not as generally useful.

(For example Uri has a subclass _SimpleUri which is used for particularly simple inputs. A non-simple Uri instance may end up being equal to a simple URI, because the choice depends on how the object is created, not the final result.)

Then there is the comparison of properties. It's hard to make that any shorter than

  return this.foo == other.foo && this.bar == other.bar && this.baz == other.baz;

If int get hashCode => foo.hashCode ^ bar.hashCode ^ baz.hashCode; had actually been a good hash code, I don't think we'd have had Object.hash. We have Object.hashCode because it encourages using a better algorithm, one which would be much more verbose than the direct code here. The == comparisons are not that verbose, and therefore they are harder to improve on.

The one option I can see with less repetition would be:

  return compareValues(this, other, [(x) => x.foo, (x) => x.bar, (x) => x.baz]);

where compareValues could then even be:

bool compareValues<T extends Object>(T value, Object other, List<Object? Function(T)> properties) {
  if (identical(value, other)) return true;
  // Add `if (value.runtimeType != other.runtimeType) return false;` if you want that.
  if (other is! T) return false;
  for (var property in properties) {
    if (property(value) != property(other)) return false;
  }
  return true;
}

That's a significantly larger overhead (unless we manage to completely inline the call, unroll the loop, and inline the property extractor functions).

personally, I'd probably make static getters for each field, so I can reuse the same list every time, instead of creating a new list with new function objects on every == call, so:

  static Object? _getFoo(MyClass o) => o.foo;
  static Object? _getBar(MyClass o) => o.bar
  static Object? _getBaz(MyClass o) => o.baz;
  bool operator==(Object other) =>
     compareValues<MyClass>(this, other, const [_getFoo, _getBar, _getBaz]);
}

That reduces the overhead and memory churn, but adds more boilerplate, which is the opposite of what we are trying to do. Neither is more readable than the original. Can't say how big the code saving will be for a large project, though.

Then there is talk about automatically extracting the properties from the object, without having to name them, like the _valuesOf function, or having Object.equals doing it.

That's reflection. And it potentially exposes implementation details.

The first issue is that not all fields are necessarily part of deciding whether two objects are equal.

If a class hides a private field with a unique ID, which isn't used for equality, just for some internal book-keeping, or it has a cache variable for some other method, it should not be visible to anyone outside of the class (or at least outside the library), because you should not be able to see implementation details from the outside.

That means that an automatic Object.equals(o1, o2) which reflectively extracts field values (on objects with the same runtime type), can return false for objects where their own type would consider them equal. A hidden cache, which is only filled on one object, is enough to make an automatic "compare all fields" be wrong. Or storing multiple values in single field, combined into a single wrapper object (which might make that combination of values easier to share between objects), but not having operator== defined on that wrapper object. (That sounds obscure, but it's true for every list or map field.)

You can't see from the outside whether all fields values, or the equality of those field values, should make objects be considered equal. The class interface is an abstraction level, and looking below that breaks the abstraction, which can expose implementation details and give unpredictable results.

Another significant issue with those interfaces is that they do not expose fields at all. They expose getters. Extracting the values of fields, but not calling every getter, is another way it breaks the abstraction. From the outside there should be no difference between two, and a class should be able to switch between a getter and a field without affecting anyone.

Basically, code cannot assume anything about the internal state of any class other than its own class. Possibly classes in its own library. Even a superclass can change a field to a getter, or add internal fields or caches or wrappers objects, at any time.

So if you can extract "all fields" in any way, it should only every apply to fields declared in the same class (or at most direct superclasses in the same library, where you can see all the classes and their extends clauses). If a class has a superclass from another library, any access to that superclass goes through the interface, which does not allow you to distinguish fields from getters.

So _valuesOf should only be callable from inside the same library as the runtime type of the object it's called on, and it should only expose the fields of that class, not ones inherited from superclasses (or from superclasses in other libraries). Which means that it cannot, and must not, do anything you cannot do manually be adding a List<Object?> get _values => [foo, bar, baz]; to access the same values. All you save is typing, and at the cost of a weird, reflection-like functionality that can easily cost extra in AoT code, if you can't figure out which types it's called on.

So, no automatic extraction of fields!

If you want to reduce boilerplate, I'd just introduce

bool? compareObjectsType(Object o1, Object o2) {
  if (identical(o1, o2)) return true;
  if (o1.runtimeType != o2.runtimeType) return false;
  return null;
}

and use it as:

bool operator==(Object other) =>
  compareObjectsType(this, other) ?? other is MyType &&
     foo == other.foo &&
     bar== other.bar &&
     baz == other.baz;
}

You can do the "list of property functions" too, but I think it's overhead makes it a potentially problematic design.

rakudrama commented 1 year ago

For two objects to be equal they must behave the same. I don't think you can generally determine this without asking both objects if they agree that the other object is equal. Let us posit a instance method predicate called trusts which determines if the object would accept the other object being used in its place. Then equality becomes symmetrical.

abstract class GoodEquals {
  bool operator ==(Object other) => other is GoodEquals && this.equals(other);
  bool equals(GoodEquals other) => this.trusts(other) && other.trusts(this);
  bool trusts(GoodEquals other);
}

class Point extends GoodEquals {
  int x,y;
  bool trusts(GoodEquals other) => other is Point && x == other.x && y == other.y;
}

Now let's say in a new library or package we implement Point in various ways. A ColoredPoint is not the same as a Pointbecause the ColoredPointsays so. A CachingPoint has different fields, but can be used wherever a Point can be used. From a method solely on Point there is no way to distinguish these two cases. e.g. using .runtimeType.

class ColoredPoint extends Point {
  int color;
  bool trusts(other) => other is ColoredPoint && this.color == other.color && super.trusts(other);
}

class CachingPoint implements Point {
  int? _x;
  int get x => _x ??= computeX();
  ...
  bool trusts(other) => other is Point && this.x == other.x && this.y == other.y;
}

A problem here is that we check both a.x == b.x and b.x == a.x. If the 'properties' are also GoodEquals objects, the cost becomes exponential in the structural depth. This might be solved by independently checking which properties are available (the shape), and checking the property values.

abstract class GoodEquals {
  bool equals(GoodEquals other) =>
      this.trustsShape(other) && other.trustsShape(this) && equalProperties(other);
  bool trustsShape(GoodEquals other);
  bool equalProperties(covariant GoodEquals other);
}

class Point {
  ...
  bool trustsShape(other) => other is Point;
  bool equalProperties(/*covariant*/ Point other) => this.x == other.x && this.y == other.y;
}

class ColoredPoint extends Point {
  ...
  bool trustsShape(other) => other is ColoredPoint;
  bool equalProperties(/*covariant*/ ColoredPoint other) =>
       this.color == other.color && super.equalProperties(other);
}

class CachingPoint implements Point {
  ...
  bool trustsShape(other) => other is Point;

  bool equalProperties(/*covariant*/ Point other) {
    // Try not to compute `x` unless necessary
    if (_x == null ||
        other is CachingPoint && other._x == null) {
      return y == other.y && x == other.x;
    }
    return x == other.x && y == other.y;
  }
}

How much of this can be automated? Perhaps something like:

  1. Users can define trustsShape and / or trustsProperies.
  2. A default implementation of trustsShape for class C or C<T> is => other is C or => other is C<T>.
  3. A default implementation of equalProperties compares the public getters of the class tested in trustsShape.
lrhn commented 1 year ago

Alternatively, equality should just not be a virtual method. Don't ask whether a CachingPoint thinks it's equal to a ColorPoint. Ask whether the two are equal as Points, if that is the equality you care about. If all you know and use about the objects is that they are Points, you don't need to distinguish them by color.

Dart is not well geared to that. C# has static operators in general, including ==, and can access == through generic type parameters too. That's probably what it would take to do the same thing in Dart.

bernaferrari commented 1 year ago

Swift did something like this with their "data classes" using macros.

felangel commented 4 months ago

Hi author of package:equatable here 👋

Just chiming in to say I'd be more than happy to help however I can 😄 I've just started experimenting with how package:equatable could look using macros ↓

https://github.com/dart-lang/sdk/assets/8855632/2114ec6b-febe-42f0-a31a-3f46e0860435