mkustermann opened this issue 1 year ago
Disclaimer: I do understand that semantics are a little bit different in that `DeepCollectionEquality` does allow a different equality on keys than the actual `Map` implementation will use. But this seems to be a rather rare use case (if it's useful at all?), and one shouldn't pay with 70x slower performance for it by default.
Might be something we can optimize, if we can find a safe algorithm.
The issue with `MapEquality` itself is that we don't know the equalities used by the maps, so we have to treat them as unordered sets of key/value pairs, and check that those sets are equal according to `_keyEquality`/`_valueEquality` pair comparison.
Not only do we not know that the key/value equality is default `==` equality (we could detect that and special-case it, and it might even be worth it), but we also don't know that the maps use that as their equality (they could be identity maps, could be splay-tree maps with a different ordering, could be anything).
We can't use keys from one map as lookups in the other. That's a serious restriction on what we can do, and it might be worth adding a more efficient `SameMapEquality` that assumes the two maps use the same key-equality, and uses that equality for the comparison too, so:
```dart
/// Equality on maps which assumes that both maps use `keyEquality` for key equality.
///
/// More efficient than [MapEquality] when doing `equals`, because [MapEquality.equals]
/// has to compare the key/value pairs of each map independently of the maps' own
/// equality behavior. This class assumes that compared maps use an equality which is
/// compatible with the `keyEquality`, and can therefore rely on map lookups
/// for comparing keys from different maps.
class SameMapEquality<K, V> extends MapEquality<K, V> {
  // MapEquality's actual constructor takes named `keys`/`values` parameters.
  const SameMapEquality({super.keys, super.values});

  @override
  bool equals(Map<K, V> map1, Map<K, V> map2) {
    if (map1.length != map2.length) return false;
    for (var key in map1.keys) {
      var v2 = map2[key];
      // Distinguish an absent key from a key with a `null` value.
      if (v2 == null && !map2.containsKey(key)) return false;
      if (!_valueEquality.equals(map1[key] as V, v2 as V)) return false;
    }
    return true;
  }
}
```
We could make the implementation of `MapEquality` be a `MapEquality<Never, Never>` which accepts `Map<Object?, Object?>` arguments, or do so only when provided with `DefaultEquality` or `IdentityEquality` as arguments, because those accept any arguments. (But I'd be worried we might shoot ourselves in the foot if we do this before getting variance annotations, because the equality classes are naturally contravariant, and we could just lean into that.)
For `DeepCollectionEquality`, the direct version is definitely simpler. It avoids doing `is List` and `is Set` checks, so it saves a little there, but that's probably not where the big difference is.

We can cache the nested `MapEquality(this, this)`, but it would require expandos, since the class has a `const` constructor.
More reasonably, we could special-case the default case (default values for both arguments) and optimize that. We will still run into nested map equalities with unknown map types, which have to do the full map key/value-pair multiset comparison, but at least we can use efficient `{}` maps for that when we know that the equality is `==`.

I think that's the most viable place to start: optimize the default case, if that's what everybody uses anyway.
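As a rough illustration, such an optimized default case could look like the sketch below. This is my sketch, not the package's code, and the name `defaultDeepEquals` is hypothetical; it assumes both equalities are plain `==` and that compared maps themselves use `==` as their key equality, so keys from one map can be looked up directly in the other:

```dart
// Hypothetical fast path for the default case of a deep equality:
// when key/value equality is plain `==` (and the maps are assumed to use
// `==` themselves), nested maps can be compared by direct lookup instead
// of building an intermediate HashMap of key/value pairs.
bool defaultDeepEquals(Object? e1, Object? e2) {
  if (e1 == e2) return true;
  if (e1 is Map) {
    if (e2 is! Map || e1.length != e2.length) return false;
    for (var key in e1.keys) {
      var v2 = e2[key];
      // Distinguish an absent key from a key with a `null` value.
      if (v2 == null && !e2.containsKey(key)) return false;
      if (!defaultDeepEquals(e1[key], v2)) return false;
    }
    return true;
  }
  if (e1 is List) {
    if (e2 is! List || e1.length != e2.length) return false;
    for (var i = 0; i < e1.length; i++) {
      if (!defaultDeepEquals(e1[i], e2[i])) return false;
    }
    return true;
  }
  if (e1 is Set) {
    // Membership checks delegate to the other set's own `==`-based lookup.
    return e2 is Set && e1.length == e2.length && e2.containsAll(e1);
  }
  return false;
}
```

Note this is exactly the kind of code that is unsound if one of the maps is, say, an identity map, which is why it can only be the special case, not the general algorithm.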
I created a (naive) benchmark repo for finding a more optimised version for a deep equality of JSON-like maps (where we assume the maps are of type `Map<String, dynamic>`, use the `==` equality for keys and values, and contain only nulls, bools, nums, Strings, or Lists and Maps thereof as values) here: https://github.com/knaeckeKami/json_equals

It uses `benchmark_harness` to compare the performance of different ways to compare two JSON-like Maps.
It shows that `DeepCollectionEquality()` has O(n^2) runtime, where n is the level of nesting in the map. I suspect that's because of the recursive re-hashing.

Here are the results of the benchmark: https://github.com/knaeckeKami/json_equals/blob/main/README.md

You can see that with a map that is nested 9 levels deep, `DeepCollectionEquality` is ~4000 times slower than a custom implementation, while with 1 level of nesting it's only ~10 times slower.
That quadratic complexity should definitely be fixed. I don't know if we can just cache the hash code (probably not, because we still call `equals` on the sub-objects repeatedly, and they don't cache). Maybe we can cache something else along the way, to avoid repeated computations. Worth checking.
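One possibility, purely a sketch of the "cache something along the way" idea with hypothetical names: memoize the deep hash of each container in an `Expando`, so nested structures are hashed once per object instead of once per enclosing lookup. An `Expando` also sidesteps the `const`-constructor problem mentioned earlier, since the cache lives outside the equality object:

```dart
// Hypothetical: cache deep hash codes per container identity in an Expando,
// avoiding repeated recursive re-hashing of nested maps and lists.
final Expando<int> _deepHashCache = Expando<int>('deepHash');

int deepHash(Object? value) {
  if (value is Map) return _deepHashCache[value] ??= _mapHash(value);
  if (value is List) return _deepHashCache[value] ??= _listHash(value);
  return value.hashCode;
}

int _listHash(List list) {
  var hash = list.length;
  for (var element in list) {
    hash = (hash * 31 + deepHash(element)) & 0x3fffffff;
  }
  return hash;
}

int _mapHash(Map map) {
  var hash = map.length;
  for (var entry in map.entries) {
    // Order-independent combination, since map iteration order may differ.
    hash = (hash + (deepHash(entry.key) ^ deepHash(entry.value))) & 0x3fffffff;
  }
  return hash;
}
```

The obvious caveat: the cache is only sound if the containers aren't mutated between calls, which is exactly the kind of assumption that would need discussion.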
We should perhaps also add a `JsonEquality` which is optimized for JSON structures, so it only needs to recognize `List` and `Map<String, ..>`, and it can assume normal string equality on the keys of the maps. That seems like a common enough use case to create an optimized variant for it. E.g.,
```dart
/// Deep equality on JSON-like object structures.
///
/// JSON-like objects can be:
/// * Lists of JSON-like objects.
/// * Maps with `String` keys and JSON-like objects as values.
/// * Other objects compared by normal `==`.
///
/// Lists are considered equal if they have the same length and the elements
/// at the same indices are equal.
/// Maps are considered equal if they have the same string keys, and their
/// values for each key are equal.
/// Such maps are assumed to be normal maps using `==` as key equality.
///
/// This equality treats any value which is not a `List<Object?>` or
/// `Map<String, Object?>` as a plain value to be compared using `==`.
/// Normal JSON structures only contain strings, numbers, booleans and `null`
/// as such values.
class JsonEquality implements Equality<Object?> {
  const JsonEquality();

  @override
  bool equals(Object? v1, Object? v2) {
    if (v1 == v2) return true;
    if (v1 is List) {
      if (v2 is! List) return false;
      return const ListEquality<Object?>(JsonEquality()).equals(v1, v2);
    }
    if (v1 is Map<String, Object?>) {
      if (v2 is! Map<String, Object?>) return false;
      // Assume normal string equality as map key equality.
      if (v1.length != v2.length) return false;
      for (var entry in v1.entries) {
        var value2 = v2[entry.key];
        // Distinguish an absent key from a key with a `null` value.
        if (value2 == null && !v2.containsKey(entry.key)) return false;
        if (!equals(entry.value, value2)) return false;
      }
      return true;
    }
    return false;
  }

  @override
  int hash(Object? value) {
    if (value is List) return const ListEquality(JsonEquality()).hash(value);
    if (value is Map<String, Object?>) {
      return const MapEquality(values: JsonEquality()).hash(value);
    }
    return value.hashCode;
  }

  @override
  bool isValidKey(_) => true;
}
```
Also, thinking about it more, `MapEquality` should perhaps be allowed to assume that the two maps use the same equality, whatever it is. Because, otherwise, are they really equal maps? Being "equal maps" is not just about having the same key/value pairs, it could also mean that using one instead of the other is a valid substitution. If not, what is "being equal" actually worth?

I'd be willing to change the definition of `MapEquality` to something actually useful, instead of theoretically pleasing. It might end up with false positives, but only when using every key of one map to look up values in the other map gives values that are equal.

The biggest issue with that change is that it makes `keyEquality` unnecessary, because it delegates that to the maps. That's an API change. Probably not one which matters much in practice.

WDYT?
> Also, thinking about it more, MapEquality should perhaps be allowed to assume that the two maps use the same equality, whatever it is.
>
> Because, otherwise, are they really equal maps? Being "equal maps" is not just about having the same keys/value pairs, it could also mean that using one instead of the other is a valid substitution. If not, what is "being equal" actually worth?
>
> WDYT?
Makes perfect sense. That's what I was hoping for :)
@lrhn Yeah, I think we should change things to assume that `Set` and `Map` instances that are being compared have the same equality behaviour, and document that as a requirement for the algorithm.
Alternatively, we could think about whether there is a way for us to add a fast path here, e.g. we could maybe add something like an interface:

```dart
abstract class EqualityRule {
  bool operator ==(Object other);
}

abstract class ContainerWithEqualityRule {
  EqualityRule get equalityRule;
}
```
have the underlying implementation classes for `Map` and `Set` implement this interface and then do:
```dart
if (map1 is ContainerWithEqualityRule &&
    map2 is ContainerWithEqualityRule &&
    map1.equalityRule == map2.equalityRule) {
  // go go fast path gadget
}
```
but maybe this is way too baroque.
I'd be a little more worried about changing `SetEquality` to use `contains`, because then it won't be using `elementEquality` at all. Well, it will for hashing, which may then differ (more easily) from equality.
For maps, we can check that each key of one map is also a key of the other map, and that the values are equal.
For that to get out of sync with `hash`, the second map needs to equate two keys of the first map which also have equal values in the first map, which makes equals asymmetric, and the hash codes likely different. Possible, but not as likely. (I don't want to iterate both maps and do lookups in both directions, because in practice the problem won't happen.)
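To make that failure mode concrete, here is a contrived, purely illustrative example (the helpers `lookupEquals` and `caseInsensitiveMap` are my own, not package code): the second map uses a case-insensitive key equality, so two distinct keys of the first map hit the same entry of the second, and one of the second map's keys is never looked at.

```dart
import 'dart:collection';

// Lookup-based map equality: iterate map1's keys and look them up in map2.
bool lookupEquals(Map<Object?, Object?> map1, Map<Object?, Object?> map2) {
  if (map1.length != map2.length) return false;
  for (var key in map1.keys) {
    var v2 = map2[key];
    // Distinguish an absent key from a key with a `null` value.
    if (v2 == null && !map2.containsKey(key)) return false;
    if (map1[key] != v2) return false;
  }
  return true;
}

// A map whose own key equality is case-insensitive, unlike a default map.
Map<String, int> caseInsensitiveMap(Map<String, int> entries) =>
    LinkedHashMap<String, int>(
      equals: (a, b) => a.toLowerCase() == b.toLowerCase(),
      hashCode: (s) => s.toLowerCase().hashCode,
    )..addAll(entries);
```

With `a = {'x': 1, 'X': 1, 'y': 2}` and `b = caseInsensitiveMap({'x': 1, 'y': 2, 'z': 3})`, both maps have length 3; `lookupEquals(a, b)` is true (both `'x'` and `'X'` find the same entry of `b`), but `lookupEquals(b, a)` is false (`a` has no key `'z'`), which is exactly the asymmetry described above.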
For sets, the equal-values clause disappears, and then it's much easier to have supposedly equal sets which are actually different (which again means different hash codes).
Still, it's likely that all sets in practice are `LinkedHashSet`s with default `==` equality and `hashCode`, so anything we do to handle the other cases is wasted effort.

Let's try and see what happens.
> For that to get out of sync with hash, the second map needs to equate two keys of the first map, which also have equal values in the first map, which makes equals asymmetric, and the hashcodes likely different.
Even now, `MapEquality` has `if (length != map2.length) return false;`. So in cases where two keys of map `a` map to one entry in map `b`, their lengths would differ. One could only maintain the same length if two keys of `a` map to one in `b` and two keys in `b` map to one in `a`. I think this is entirely hypothetical and will never happen in practice.
> I'd be a little more worried about changing SetEquality to use contains, because then it won't be using elementEquality at all. Well, it will for hashing, which may then differ (more easily) from equality. For sets, the equal-values clause disappears, and then it's much easier to have supposedly equal sets which are actually different (which again means different hash codes).
Not quite sure I follow here. We already check that the sets have an equal number of elements. One should only consider two sets equal if `a.contains(<some-key>)` and `b.contains(<some-key>)` agree for every key; then the sets are practically interchangeable/substitutable. The `.contains()` call internally also does equality checking of the keys (and may or may not use `.hashCode`).

This idea that one wants to consider two sets/maps equal (based on `elementEquality`) when they are in fact not equal when it comes to lookups doesn't seem very useful, since it allows sets to be equal but not interchangeable (i.e. the sets are considered equal, but `.contains(<some-key>)` on each gives different answers). I just don't see the use case for this.
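The substitutability notion argued for here could be sketched as follows (a hypothetical helper, assuming both sets use the same element equality, so either set's `contains` is trustworthy):

```dart
// Contains-based set equality: sets are equal when they are interchangeable
// for membership checks. Assumes both sets use the same element equality.
bool sameSetEquals(Set<Object?> a, Set<Object?> b) {
  if (identical(a, b)) return true;
  if (a.length != b.length) return false;
  // With equal sizes, a being a subset of b already implies equality.
  return b.containsAll(a);
}
```

This does a single pass of lookups with no allocation, compared to building a multiset of elements under a separate `elementEquality`.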
It would be nice to figure out our path forward with this - I noticed that some internal code also uses `MapEquality` in a few potentially performance-sensitive locations.
@lrhn Would you be interested in looking at this?
I'd love to get back to https://github.com/dart-lang/collection/pull/265/files, but right now I'm too busy adding class modifiers, and other 3.0 features, to platform libraries.
@lrhn friendly ping
Any updates on this?
Brace yourself - winter is coming. Would be nice to have a fix for this performance issue by Christmas Eve.
There are customers that use `DeepCollectionEquality().equals(a, b)` where `a` and `b` are e.g. `Map<String, dynamic>` (let's say JSON). It turns out this is very slow. See https://github.com/zino-hofmann/graphql-flutter/issues/1196 for an example use case suffering from this.

A very simple example that demonstrates the difference between a simple custom implementation and `DeepCollectionEquality().equals()`:

In JIT:

In AOT:

So `package:collection` is a whopping 70x slower than a manual version. A big reason is that when comparing two maps for equality, the responsible `MapEquality.equals` builds a new `HashMap` (of equal size). Building it requires inserting `_MapEntry(key, value)` objects as keys, and inserting each element into the `HashMap` requires (recursively) calculating a hash code (and possibly recursively doing equals calls). That's a lot of object allocations and repeated recursive hashing & equality checking (possibly on the same values).
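The costly strategy described above amounts to roughly the following (a paraphrase for illustration with a simplified, hypothetical `_Pair`; the real implementation recursively applies the configured key/value equalities in its `hashCode`/`==`):

```dart
import 'dart:collection';

// Paraphrase of the slow path: turn map2 into a multiset of key/value
// pairs keyed by freshly allocated wrapper objects, then consume it with
// map1's entries. Every pair allocation triggers hashing on insert/lookup.
class _Pair {
  final Object? key, value;
  _Pair(this.key, this.value);

  @override
  int get hashCode => key.hashCode ^ value.hashCode;

  @override
  bool operator ==(Object other) =>
      other is _Pair && key == other.key && value == other.value;
}

bool unorderedMapEquals(
    Map<Object?, Object?> map1, Map<Object?, Object?> map2) {
  if (map1.length != map2.length) return false;
  var counts = HashMap<_Pair, int>();
  for (var entry in map2.entries) {
    var pair = _Pair(entry.key, entry.value);
    counts[pair] = (counts[pair] ?? 0) + 1;
  }
  for (var entry in map1.entries) {
    var pair = _Pair(entry.key, entry.value);
    var count = counts[pair] ?? 0;
    if (count == 0) return false;
    counts[pair] = count - 1;
  }
  return true;
}
```

Even in this simplified form, the comparison allocates two wrapper objects per entry and hashes every key and value at least twice, which is where the direct-lookup approaches above win.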
Other things to note:

- `MapEquality<K, V>` / ... classes are generic, but in reality they're used with `dynamic`. => Using generics requires performing `as`-type checks on parameters with covariant types (such as `Map<K, V>`). => Especially considering that the `DeepCollectionEquality` class ignores the reified generics (it considers e.g. `<String, dynamic>{}` and `<Object, dynamic>{}` to be equal).
- `HashMap` over `{}` may be slower for many cases.
- Caching `MapEquality` / ... objects instead of creating new ones for every equality operation.

/cc @lrhn @mraleph