Closed leafpetersen closed 5 years ago
I'm less convinced that this is the best option, than I am about spreading iterables.
The entries
iterator can easily require allocation for each entry (doesn't have to, but likely will).
However, the alternatives are not amazing either:
keys
means having to do a map[key]
lookup afterwards. Whether that's more or less efficient than allocating entry objects is a matter of optimization.forEach
means introducing a closure, oldMap.forEach((k, v) { newMap[k] = v; }
.map.addAll
will do something generic as well (default implementation uses 1.)addAllTo
method to maps would require passing a map to the method, and exposing the map that is currently being built in an unfinished state..forEach
.Leaving it unspecified makes the language fragile. A class that implements some Map
functionality, but not all, might work on one platform and fail on another. The only user recommendation would be to only ever use platform maps with spreads. I wouldn't want that, so 6. is out.
I considered iterating entries as the best of the other alternatives, with forEach
as the next in line.
The forEach
operation is really the correct approach - let the map iterate itself, then do something for each key/value pair. The only concern is that it requires creating a closure.
On the VM, the entries
version does seem to be ~20% slower than forEach
, which is slightly faster than keys
/addAll
.
With dart2js/d8, everything is dirt slow, but entries
is ~half the speed of forEach
, which is again a little faster than keys
/addAll
.
So, from the benchmarks alone, we should be using forEach
.
I'm not particularly happy about making language design decisions based on the current speed of implementations. We will be stuck with the specification no matter what we otherwise optimize.
If we get tuples in the language, then we might want to change the specification to use a tuple-iterator instead of entries
. That might (should) be more efficient, but that's speculation.
Still, I'm willing to change to forEach
. The implementations might be able to do something efficient with the closure since they are in complete control of its body.
We'd have to make sure that a user forEach
that leaks the incoming function cannot use it after the forEach
call has completed (we don't want anyone changing our map after it's created).
So, ...e
becomes something like:
var sourceMap = e;
var tmp = targetMap;
sourceMap.forEach((k, v) {
tmp[k] = v;
});
tmp = null; // Prevent later calls from changing `targetMap`.
A quick benchmark (now polymorphic):
import"dart:collection";
void copy1(Map<String, int> from, Map<String, int> to) {
for (var entry in from.entries) {
to[entry.key] = entry.value;
}
}
void copy2(Map<String, int> from, Map<String, int> to) {
for (var key in from.keys) {
to[key] = from[key];
}
}
void copy3(Map<String, int> from, Map<String, int> to) {
var tmp = to;
from.forEach((key, value) {
tmp[key] = value;
});
tmp = null;
}
void copy4(Map<String, int> from, Map<String, int> to) {
to.addAll(from);
}
main() {
for (int i = 0; i < 5; i++) {
bench("entries", copy1);
bench("keys", copy2);
bench("forEach", copy3);
bench("addAll", copy4);
}
}
int id(int x)=>x;
var maps = List.generate(100, (n) {
var map = Map<String, int>.fromIterable(Iterable.generate(n * 10), key: (n) =>"#$n");
if (n % 4 == 1) map = SplayTreeMap<String, int>.from(map);
if (n % 4 == 2) map = HashMap<String, int>.from(map);
return map;
});
void bench(String name, void Function(Map<String, int>, Map<String, int>) action) {
var e = 0;
var c = 0;
var sw = Stopwatch()..start();
do {
for (var from in maps) {
var to = <String, int>{};
action(from, to);
c += from.length;
}
e = sw.elapsedMilliseconds;
} while (e < 2000);
print("$name: ${c/e} entries/ms");
}
Ok. I'm not deeply dug in on this, but I do think it's a wasted opportunity if we ship something in a form that's unnecessarily slow, and I don't see any user facing reasons to choose one or the other. I'm certainly sympathetic to the difficulty of extrapolating future performance from current performance, but I'm not sure that justifies just throwing up our hands and flipping a coin.
For what it's worth, @rakudrama expressed some preference for using .addAll
to allow for possible future double dispatch implementation tricks.
When we decide on the semantics on spreads, it would be nice to leave a tiny bit of wiggle room for improved performance. Granted, we must give developers specific guarantees about what's going on, but we could have situations where we want to do something in a slightly different way.
One example is when a map is created from an existing map, and then adjusted:
Map<K, V> m1 = e; // Implementation may not be know to compilers.
var m2 = {...m1, plus: some, more: stuff};
This might be a quite common situation. The point is that during the phase where the map elements from m1
are added to m2
, we know more than we usually do: For each key fromm1
, it is guaranteed that it is not already present in m2
, and hence we do not need to look it up for the purpose of overwriting rather than adding a new map element.
Different implementations of Map
may make this a bigger or smaller optimization, but the point is that we'd just need to iterate over the map elements (m1.forEach
would do that), and insertion could use some "secret" version of []=
for which it is guaranteed by the compiler that it's a fresh key for m2
.
A similar situation arises for sets, as mentioned here: https://github.com/dart-lang/language/issues/208#issuecomment-461319946.
@leafpetersen
There is the option of saying, for ...e
, executed to collect map entries for a Map<K, V>
in a list, that:
Evaluate e to a value o. If
o
does not implementMap<K2, V2>
for some typesK2
andV2
, it is a run-time error. For each pair of key and value entry ofo
, accessed in key iteration order through theMap
interface members (for example usingentries.iterator
,keys.iterator
and operator[]
,forEach
, or any similar available iteration strategy):
- if the run-time type of key is not a subtype of K, it's a type error.
- if the run-time type of value is not a subtype of V, it's a type error.
- Otherwise add (key, value) to entries.
This is basically the approach I ruled out, because it's impossible to predict which members of the user supplied map are used to do the iteration. Even if we say that, we will use some operation (likely forEach
) to do the operation, and then someone might write code that only implements forEach
for some utility Map
that is only intended to be spread. If we change the behavior later, then we break that code. If we are not ready and willing to make that break, then the implementation is the specification, and we might as well make the specification match what we do.
So, are we ready and willing to break partially implemented maps? (And I want that in writing, in public, so we can point to it if users complain). If so, sure, let's be unspecified. Otherwise, we should specify one approach that users can depend on (and that seems to be forEach
).
If we allow any way to iterate, the implementation can just always use addAll
, because then anything addAll
does will be correct.
@eernstg
The "fresh key" is a red herring. It is possible to have a map with two different keys that are equal according to ==
.
Example:
Map<Duration, int> m1 = Map.identity()..[Duration(seconds: 1)] = 1 ..[Duration(seconds: 1)] = 2;
print(m1.length); // 2
var m2 = [...m1];
print(m2.length); // 1
print(m2[Duration(seconds: 1)]); // 2
Unless you know the internal workings of the map you are spreading, you can't assume anything.
If you do know the internal workings, it's a platform map and then we can do anything anyway, including adding an _addTo<K, V>(LinkedHashMap<K, V> target)
that works directly on the underlying hash table.
@lrhn wrote
The "fresh key" is a red herring.
Right, it is possible to have a set representation that contains two elements which are (currently) equal, and a map can have two equal keys. That's basically a bug in the realization of those two concepts, because they ought to maintain uniqueness, but we can't avoid it unless we introduce a strong notion of values (immutable entities, with deep equality).
Anyway, it still works with built-in types like integers and strings, and even in that case it might be nice to reserve enough wiggle room to be able to optimize whatever widespread special case we might encounter. (And, as discussed elsewhere, using addAll
might actually be quite helpful for that!)
As an opposing view, users are not likely to get their understanding of language features directly from the language specification, so maybe we will get a better understanding if we define the operations in terms of building blocks that users do know.
In this case, we could define list, set and map literals to be equivalent to creating an empty list, set or map, and then perform sequences of add
and addAll
operations. A single element calls add
(or []=
for maps). A spread element calls addAll
. The control flow operations recursively end up with elements or spreads.
If we do that, then we might be able to generalize literals to other kinds of collections, so you can write:
var queue = Queue()..{1, 2, ... something, if (whatnot) 42};
to initialize a new instance of Queue
.
(Sadly, it won't work for Uint8List
since it has no add
method).
Or something.
In any case, we would just need to optimizie the addAll
of GrowableList
, LinkedHashSet
and LinkedHashMap
to improve performance, and why woudln't we do that anyway.
Using .addAll
is just a different way of either:
.addAll
does now foreverIf we do either of those, I'd rather that we just either define it in terms of behavior (since otherwise we're stopping .addAll
from changing how it iterates, or explicitly make it implementation defined.
My general take is that using the entries iterator provides no real user benefit, analytically is unlikely to ever provide us with better performance than other approaches, and empirically seems slower.
I'm ok with making it implementation defined, specifying only that we use iteration order, but this generally doesn't seem in line with how we do things in Dart, and de facto it will be a breaking change in the implementations to change how they do it.
I'm ok with using .forEach
, but I'm slightly concerned about code size given that closure representations are not cheap in all platforms.
I think overall, my preference is to define this in terms of iteration over .keys
. It's the code that I would generally write myself, it empirically (based on your measurements) seems better than using .entries
and is close to as good as .forEach
.
I wrote a little benchmark to try to get some numbers for what the various protocols that a spread could desugar look like. So far, I've only run this on the VM, but I can try other platforms. Here's the results:
addEntries 240.19 spreads/ms
iterate entries into map 235.81 spreads/ms 0.98x
iterate keys into map 256.98 spreads/ms 1.07x
forEach into map 522.40 spreads/ms 2.17x
just iterate entries 355.70 spreads/ms
just iterate keys 433.60 spreads/ms
just forEach 2171.16 spreads/ms
(entries insertion overhead) 699.65 spreads/ms
(keys insertion overhead) 630.87 spreads/ms
(forEach insertion overhead) 687.93 spreads/ms
The "iterate entries into map" line is what the proposal mandates. That benchmark covers both iterating over the from map and inserting the result into the to map. Arguably, the latter isn't interesting because any proposal has to do that somehow. So the "just" lines do the work of iterating over the from map but then don't actually insert into another map. In theory, the differences between the "just" rows are more important since that shows the relative difference of each strategy ignoring the fixed overhead of inserting into the new map. The "overhead" lines show the difference, which should be the time just to insert the entries.
The "addEntries" one is a little special. It calls to.addEntries(from)
directly, which means we can't split out the iterating from the inserting.
It does look like the overhead is about constant, which is a good sanity check. From the numbers it seems like forEach() is the big winner. Iterating over the keys is a little faster, but not much.
But... it depends on how you slice the data. This benchmark is deliberately polymorphic and spreads LinkedHashMaps, HashMaps, and SplayTreeMaps. If I split that out into three separate monomorphic benchmarks:
SplayTreeMap:
addEntries 134.04 spreads/ms
iterate entries into map 132.28 spreads/ms 0.99x
iterate keys into map 139.46 spreads/ms 1.04x
forEach into map 434.28 spreads/ms 3.24x
just iterate entries 167.20 spreads/ms
just iterate keys 177.24 spreads/ms
just forEach 1325.38 spreads/ms
(entries insertion overhead) 633.34 spreads/ms
(keys insertion overhead) 654.26 spreads/ms
(forEach insertion overhead) 645.92 spreads/ms
LinkedHashMap:
addEntries 428.99 spreads/ms
iterate entries into map 445.20 spreads/ms 1.04x
iterate keys into map 546.54 spreads/ms 1.27x
forEach into map 573.46 spreads/ms 1.34x
just iterate entries 1220.35 spreads/ms
just iterate keys 2134.47 spreads/ms
just forEach 3107.40 spreads/ms
(entries insertion overhead) 700.89 spreads/ms
(keys insertion overhead) 734.65 spreads/ms
(forEach insertion overhead) 703.23 spreads/ms
HashMap:
addEntries 445.36 spreads/ms
iterate entries into map 428.71 spreads/ms 0.96x
iterate keys into map 500.67 spreads/ms 1.12x
forEach into map 589.28 spreads/ms 1.32x
just iterate entries 1130.50 spreads/ms
just iterate keys 2168.90 spreads/ms
just forEach 3840.61 spreads/ms
(entries insertion overhead) 690.61 spreads/ms
(keys insertion overhead) 650.94 spreads/ms
(forEach insertion overhead) 696.08 spreads/ms
Then it looks like the three classes have very different performance profiles for their APIs. forEach() is really fast on SplayTreeMap compared to iterating over the keys or entries. But for the other classes, it doesn't make that much of a difference. This isn't too surprising if you look at the implementation of SplayTreeMap.
Note also that this benchmark uses a mixture of map sizes from 0 up to 100. If we leaned towards certain sizes, that may also skew the numbers.
It does seem like .entries
has little going for it. If nothing else, I think .keys
and []
is both a little faster and more idiomatic. forEach()
isn't as much of a consistent win as it first appeared, especially given how rarely used SplayTreeMap is in practice.
One way to think about this is that we should decide if we think internal or external iteration is the right protocol. Internal iteration is "easier" for the iteratee. It's just given a closure and it has total control over how it walks itself and invokes the closure. It doesn't need to return control back to the thing doing the spread until all of the iteration is done. That means it can effectively use the top of the callstack as its own local state storage.
With any kind of external iteration, the iteratee has to return to the iterator. That means in order to pick up where it left off, that state has be reified (and thus usually allocated in Dart) somewhere. In return for this, the iterator gets more flexibility. It can choose to stop iterating, resume later, etc.
In the case of spread, we don't need any of that flexibility. Every spread always wants to iterate over the spreadee exactly once, to completion. So there's an argument is that internal iteration is the principled choice.
That implies using forEach()
for maps and probably also for Iterables being spread into lists and sets. On the other issue (https://github.com/dart-lang/language/issues/208), my benchmark shows that forEach()
is faster for Iterables, except on the VM. Maybe there's some overhead for calling a closure. Note that unlike the other options on that issue, we could do this for all Iterables being spread, not just those that implement List
Cases where the thing you are spreading is a List are interesting. In that special case, the only state you need for an external iterator is an integer, so it tends to be quite fast.
On the other hand, the for-in
statement always uses external iteration, so making the for
element do a different protocol is maybe unsymmetric. If we ever wanted to later add support for break
in for
elements or anything else that would cause us to take control away from the spreadee, we will hate ourselves for choosing internal iteration.
Thoughts?
I'm leaning towards internal iteration for spreads, for the reasons mentioned. The collection being spread is the only one who can truly know how best to iterate it. Anything we specify will be sub-optional in some cases, unless we specify nothing (and then pick internal iteration as implementation).
That said, we have iteration in the language with sync*
and for-in
. Any time we see a combination of those, we should be able to inline very eagerly and get zero-overhead effectively-internal iteration.
The big problem with that is that nobody uses sync*
for writing their collection's iterator
, and we can't statically detect that it happens. That is sad.
An issue with internal iteration is that the only abstraction for executable code we have is closures, so we have to allocate a closure when using forEach
. We also get a number of polymorphic closure invocations, and we leak this function to user code, which may choose to keep it alive, so we have to program defensively.
Maybe the platforms can do something cheaper than a full closure for these things. Maybe not.
So, in summary:
forEach
for maps, or leave it unspecified, because forEach
might not always be the fastest.addAll
because then it might generalize to user collections too. That's speculative.Can we live with breaking user code if we change an unspecified behavior?
If not, then we need to specify it in the smallest details (and likely cheat). Then I'd just use iteration, for consistency. If we say that a list must be iterated using index operations, we increase the code size for every iteration of an Iterable
that might be a List
(if we want to inline) or at least we introduce an extra type check. I don't like that.
In practice, all lists will be platform lists, so just silently special-casing those will likely be enough.
Here's numbers for the other main platforms. dart2js is crushingly slow if you don't do -O4
. I believe most deployed apps use that setting too, so doing that here:
addEntries 88.07 spreads/ms
iterate entries into map 87.59 spreads/ms 0.99x
iterate keys into map 89.72 spreads/ms 1.02x
forEach into map 208.70 spreads/ms 2.37x
just iterate entries 151.42 spreads/ms
just iterate keys 158.42 spreads/ms
just forEach 1920.00 spreads/ms
addEntries 186.05 spreads/ms
iterate entries into map 200.00 spreads/ms 1.07x
iterate keys into map 210.53 spreads/ms 1.13x
forEach into map 216.22 spreads/ms 1.16x
just iterate entries 2000.00 spreads/ms
just iterate keys 2666.67 spreads/ms
just forEach 8000.00 spreads/ms
addEntries 45.07 spreads/ms
iterate entries into map 47.90 spreads/ms 1.06x
iterate keys into map 48.63 spreads/ms 1.08x
forEach into map 190.48 spreads/ms 4.23x
just iterate entries 58.39 spreads/ms
just iterate keys 61.07 spreads/ms
just forEach 1777.78 spreads/ms
addEntries 195.12 spreads/ms
iterate entries into map 190.48 spreads/ms 0.98x
iterate keys into map 213.33 spreads/ms 1.09x
forEach into map 200.00 spreads/ms 1.02x
just iterate entries 2000.00 spreads/ms
just iterate keys 2666.67 spreads/ms
just forEach 1777.78 spreads/ms
addEntries 177.29 spreads/ms
iterate entries into map 174.85 spreads/ms 0.99x
iterate keys into map 193.09 spreads/ms 1.09x
forEach into map 452.55 spreads/ms 2.55x
just iterate entries 253.42 spreads/ms
just iterate keys 286.82 spreads/ms
just forEach 1665.05 spreads/ms
addEntries 367.50 spreads/ms
iterate entries into map 359.07 spreads/ms 0.98x
iterate keys into map 428.72 spreads/ms 1.17x
forEach into map 455.08 spreads/ms 1.24x
just iterate entries 839.98 spreads/ms
just iterate keys 1344.65 spreads/ms
just forEach 1841.41 spreads/ms
addEntries 93.36 spreads/ms
iterate entries into map 86.44 spreads/ms 0.93x
iterate keys into map 96.69 spreads/ms 1.04x
forEach into map 387.36 spreads/ms 4.15x
just iterate entries 111.62 spreads/ms
just iterate keys 120.15 spreads/ms
just forEach 1132.74 spreads/ms
addEntries 356.83 spreads/ms
iterate entries into map 340.96 spreads/ms 0.96x
iterate keys into map 430.36 spreads/ms 1.21x
forEach into map 513.33 spreads/ms 1.44x
just iterate entries 811.36 spreads/ms
just iterate keys 1479.02 spreads/ms
just forEach 3797.77 spreads/ms
I'm going to go through the list benchmark too and try to consolidate this into some single set of numbers that are hopefully actionable.
dart2js is crushingly slow if you don't do
-O4
. I believe most deployed apps use that setting too.
I'm not sure if flutter webs apps will be able to use -O4
due to our usage of Type and runtimeTime objects. @yjbanov have you experimented with any of the dart2js optimization levels?
@munificent Have you looked at memory usage difference between various methods? I'd be curious what objects end up allocated on the heap during iteration. In forEach
case we allocate at least a closure. Others seem to need at least an Iterator
. However, if the compiler knows the exact type of the map it may be able to iterate without allocating anything at all.
@jonahwilliams I want to say we use -O4
in our prototype already, but I'll have to double-check.
@munificent Have you looked at memory usage difference between various methods?
I haven't, but I don't think memory usage should differ that much. In general, we expect most spreads will be fairly small. On large spreads, you may churn through a bunch of MapEntry objects, but modern GCs handle short-lived objects pretty well.
I did some more benchmarking and investigation and wrote up my thoughts. We discussed this more in the language meeting. The decision is to keep specifying this in terms of entries
.
As always, if an implementation detects that the object being spread is a known built-in type (which is almost always the case for maps), it's free to do some lower-level access to the map entries since it knows that there won't be any user-visible side effects.
The current specification for #46 defines spread of a Map in terms of the Iterable protocol. This seems unnecessarily inefficient, as compared to just iterating over the keys (avoiding allocating the
MapEntry
objects).Even better would be to call an
addAllTo
method passing the the receiver map as an argument, thereby avoiding allocating the key iterator as well, but that would be a breaking change to the map API.cc @lrhn @munificent @rakudrama