VM LinkedHashMap has inefficient `keys.contains`, likely others.

dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.

https://dart.dev

BSD 3-Clause "New" or "Revised" License

10.25k stars 1.58k forks source link

VM LinkedHashMap has inefficient `keys.contains`, likely others. #52909

Open lrhn opened 1 year ago

lrhn commented 1 year ago

The Map protocol specifies that map.keys.contains should be as efficient as map.containsKey.

The current VM implementation performs a linear lookup on keys.contains, using the default Iterable implementation of contains.

See: https://dartpad.dev/?id=180ce0b005f5345f9285147c222e5032 for a benchmark comparing Map.containsKey to Map.keys.contains. Run in DartPad, dart2js shows results like:

map.containsKey: 98180.899 /ms (262143000 ops / 2670 ms)
map.keys.contains: 97705.181 /ms (262143000 ops / 2683 ms)

(with a variance bigger than the difference, so pretty much the same performance.)

The VM shows results like:

map.containsKey: 92076.923 /ms (262143000 ops / 2847 ms)
map.keys.contains: 15.294 /ms (31000 ops / 2027 ms)

Just a factor 6000 off. For a ~10000 element linear lookup, that's in the expected ballpark.

This is fairly easy to fix, it just needs to forward to containsKey. I'll likely do it myself.

I'll check if there are other missing efficient implementations.

Likely last can be wastly more efficient, but it isn't promised to be efficient to begin with. Still, some users may assume that it is efficient.
The entries iterable uses the default implementation, which iterates keys, then does lookup on the key to get the value. That can be more efficient when the key/value is known to be adjacent in the data list.

iamsahilsonawane commented 1 year ago

Hi @lrhn,

I'm curious about your discovery that the map.keys.contains method is much less efficient than the containsKey method. I understand that the map.keys method returns an Iterable, and that the containsKey method takes advantage of the fact that the Map knows how to quickly lookup keys.

My question is, would it be a good idea to change the implementation of the Iterable contains method to forward to the containsKey method? Because in any case, it is just an Iterable, and according to what I think, the implementation for the contains method will be changed for this case if somehow overridden, would that be a good option?

I'm curious to know your thoughts on this. Thanks!

lrhn commented 1 year ago

Yes, making thekeys iterable's contains forward to the map's containsKey is the most likely fix here.

iamsahilsonawane commented 1 year ago

I understand, but wouldn't that hinder the functionality of the Iterable itself?

Again just curious

iamsahilsonawane commented 1 year ago

I mean as a user, when I am accessing the iterable keys, I am for sure aware of it being an iterable and not map and consider using containsKey instead if required?

rakudrama commented 1 year ago

A better iterator for entries would definitely be beneficial. This should show up in our benchmarks.

I would like us to be heading towards a place where something like for (final (k, v) in map.pairs) has no per-entry allocations.

An intermediate step might be for for (final MapEntry(:key, :value) in map.entries) to allocate a MapEntry in get:current, so that could be inlined into the destructuring. That would require playing loose with identity of MapEntry objects and making out-of-iteration calls to get:current throw instead of returning null in unsound mode.

map.keys.first can be slow in some rare situations for the reasons set.first can be slow (https://github.com/dart-lang/sdk/issues/31470), but there is no reason map.{keys,values,entries}.{first,last} should be worse than set.{first,last}. Just be advised that it will be harder to fix #31470 if the tombstone scanning loop is cloned by hand into specialized iterators and first/last methods.

lrhn commented 1 year ago

I think remove now does cause compaction at 50% tombstones, but you can probably still keep first being O(n/2). We could optimize and cache the first possible non-tombstone, a "usedStart" like we have "usedSize" to remember the last place we need to look in the backing array. Whenever we iterate, and the "usedStart" is a tombstone, we remember the new first element. Heck, we can make the data array be used as a cyclic buffer, so if you keep doing set.add(set.removeFirst()), you can stay in the same array, without needing to grow it.

a-siva commented 3 months ago

@lrhn are you still considering fixing it? Does it need to be 'area-vm' issue or 'area-library' and is P2 the correct priority for it.

lrhn commented 3 months ago

Let's assume I won't be the one fixing it, since I haven't done so yet

rakudrama commented 1 week ago

Not only is the keys.contains inefficient, it is incorrect. It must use Map.containsKey, which can use a different equality (e.g, identity and custom maps).