dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.27k stars 1.58k forks source link

`hashCode` for immutable objects should be memoized #48948

Closed gaaclarke closed 2 years ago

gaaclarke commented 2 years ago

I was profiling the Flutter Gallery and noticed that Locale.hashCode (as in dart:ui.Locale) was taking a significant amount of time because the Flutter Gallery keeps a LinkedHashMap<Locale, DisplayOption>. Locale is intended to be immutable so there is no reason to keep calculating its hashCode. It would be nice if we could avoid recalculating hashCode in this case.

Reproduction code

import 'dart:collection';

class Foo {
  const Foo(this.x, this.y);

  final int x;
  final int y;

  @override
  int get hashCode {
    print('${identityHashCode(this)}');
    return x.hashCode ^ y.hashCode;
  }
}

void main() {
  Map<Foo, int> values = {};
  values[Foo(1, 100)] = 2;
}

Expected output

hashCode is called once

Actual output

hashCode is called twice

gaaclarke commented 2 years ago

It's a perfectly rational decision to say this falls on the shoulders of the developer, just throwing it out there as a suggestion as something we might be able to do to give a boost to many apps.

gaaclarke commented 2 years ago

Related, if we memoized hashCode we could calculate them at build time for string literals.

mkustermann commented 2 years ago

I think there's two things here:

Marking this issue as area-vm for the performance optimization.

lrhn commented 2 years ago

Generally we can't do language promises about hashCode since it's a user-implementable getter. It can do anything (and occasionally does, e.g. in mocks).

Usually, letting the code cache the value is safer than trying to optimize in the compiler for code reading the same getter twice, since we don't know why it does so (it might be a mutable object that has been mutated, that's completely invisible to the type system).

dcharkes commented 2 years ago
  • The VM's default map implementation is calling the hashCode getter twice when inserting elements (dart2js's implementation calls it only once). We should optimize this!

Fix: https://dart-review.googlesource.com/c/sdk/+/243623

gaaclarke commented 2 years ago

Yea, that makes sense that it would be tricky. As for calculating hashCode for literals at compile-time, it's a good idea but I have no evidence that it is a good use of our time. I'll just consider this closed after the duplicate call to hashCode is fixed. That will possibly help Flutter Gallery, thanks everyone.

dnfield commented 2 years ago

Having hash code change is almost certainly a bug.

This is why the lints to only override hash code on immutable objets exist.

gaaclarke commented 2 years ago

Having hash code change is almost certainly a bug.

Yea, I think what @lrhn was saying was we can't know for certain what people are doing in hashCode, so memoizing it is technically a breaking change. Even if it improved all of our users practically, a problematic use-case can block the change, no matter how unlikely it is. That's the unfortunate position Dart has put itself in.

(it might be a mutable object that has been mutated, that's completely invisible to the type system).

The premise for the change was that it would happen for immutable objects. In the case I was looking at Locale in Flutter it is a const object that can't be changed and that is knowable at compile-time.

dnfield commented 2 years ago

So let me put this differently:

We already have the problem @lrhn is talking about when using sets or maps, for example

void main() {
  final fooSet = <MutableFoo>{};
  MutableFoo a = MutableFoo(1);
  MutableFoo b = MutableFoo(1);
  MutableFoo c = MutableFoo(2);

  fooSet.add(a);
  print(fooSet);

  a.i = 2;
  fooSet.add(b);

  b.i = 2;
  print(fooSet);

  fooSet.add(c);
  print(fooSet); // {MutableFoo(2), MutableFoo(2), MutableFoo(2)}

}

class MutableFoo {
  MutableFoo(this.i);

  int i;

  @override
  int get hashCode => i;

  @override
  bool operator==(Object other) => other is MutableFoo && other.i == i;

  @override
  String toString() => 'MutableFoo($i)';
}

Gets you a set where its three elements all currently have identical hash codes and evaluate as equal.

gaaclarke commented 2 years ago

I tried to implement the memoization in the user code and in my case isn't actually possible to implement because we don't allow late fields on const classes, filed an issue: dart-lang/language#2225

gaaclarke commented 2 years ago

I tried to implement calculating the hashCode in the constructor and saving it and it isn't possible because Dart doesn't support any procedures in the initializers of const constructors. (Also accessing static const Maps is verboten which would have been an alternative for me).

lrhn commented 2 years ago

The way to store values on const classes is an Expando. That's also the reason we can't assume even immutable objects being unchanging.

To cache the hash code, do:

class MyConst {
  final something;
  final other;
  const MyConst(this.something, this.other);

  int get hashCode => _hash[this] ??= Object.hash(something, other);
  bool operator==(Object other) => 
      other is MyConst && something = other.something && this.other = other.other;
  static final _hash = Expando<int>();
}

There are multiple ways this can break. First of all, you don't know the object is deeply immutable. You can do non-const instantiations of the class, like new MyConst(mutableSomething, mutableOther). That object may need to change its hash code if its members change.

Also, even a deeply immutable object can pretend to be mutable using an Expando.

class MyTrickyConst {
   final something;
   MyTrickyConst(this.something);

   static final Expando<Object> _other;
   Object? get other => _other[this];
   set other(Object? value) { other[this] = value; }

   int get hashCode => Object.hash(something, other);
   bool operator==(Object other) =>
       other is MyTrickyConst && something = other.something && this.other = other.other;
}

This object may be deeply immutable when created using const, but the language or compiler assuming that means the hash-code doesn't change would be unsound.

That means that event if the compiler could determine with 100% certainty that an object is created by a const operation, it can't know that the hashCode won't change. It has to know the entire object graph, and check that every relevant object's hashCode only uses "safe" functions to combine the hash codes of other objects with unchanging hash codes. Very non-trivial analysis.

You can optimize your own hashCode. You cannot make assumptions about other classes, because the language features of Dart ensures that nothing can actually be assumed about an object's run-time behavior from just its static type.

gaaclarke commented 2 years ago

Closing this issue since the fix to double call to hashCode has landed and it has been shown that given how Dart works memoizing without direction from the user would not be possible.

gaaclarke commented 2 years ago

@dcharkes the CL for only calling .hashCode once seems to have dropped our build benchmark 10%: https://flutter-flutter-perf.skia.org/e/?end=1651961508&queries=sub_result%3Dstock_build_iteration%26sub_result%3Dstock_build_iteration_probability_5pct

Nice work. I'm not sure if anything else in that roll could have contributed also to that drop.

dcharkes commented 2 years ago

@gaaclarke we reverted the day after because it made b/230945329 much more likely to trigger, and will reland as soon as that bug is addressed. So if the build-time stayed down it wasn't this bug (unless no rolls happened that include the revert).

mraleph commented 2 years ago

As I have mentioned in the chat the 10% improvement is more likely coming from d856e058ea764fade95fdf76530c741079bd63e1

dcharkes commented 2 years ago

This has landed.

This speeds up map copying ~5%-15% in various configurations.

Thanks for reporting @gaaclarke!