Should spread of a `Map` use the Iterable protocol or direct iteration?

The current specification for #46 defines spread of a Map in terms of the Iterable protocol. This seems unnecessarily inefficient, as compared to just iterating over the keys (avoiding allocating the MapEntry objects).

Even better would be to call an addAllTo method passing the the receiver map as an argument, thereby avoiding allocating the key iterator as well, but that would be a breaking change to the map API.

cc @lrhn @munificent @rakudrama

I'm less convinced that this is the best option, than I am about spreading iterables. The entries iterator can easily require allocation for each entry (doesn't have to, but likely will).

However, the alternatives are not amazing either:

Iterating the keys means having to do a map[key] lookup afterwards. Whether that's more or less efficient than allocating entry objects is a matter of optimization.
Using forEach means introducing a closure, oldMap.forEach((k, v) { newMap[k] = v; }.
Using map.addAll will do something generic as well (default implementation uses 1.)
Adding an addAllTo method to maps would require passing a map to the method, and exposing the map that is currently being built in an unfinished state..
Add another new method to maps that just allows adding a key/value pair. Effectively equivalent to forEach.
Leave it unspecified how the spread operation gets access to map key/value pairs.

Leaving it unspecified makes the language fragile. A class that implements some Map functionality, but not all, might work on one platform and fail on another. The only user recommendation would be to only ever use platform maps with spreads. I wouldn't want that, so 6. is out.

I considered iterating entries as the best of the other alternatives, with forEach as the next in line. The forEach operation is really the correct approach - let the map iterate itself, then do something for each key/value pair. The only concern is that it requires creating a closure.

On the VM, the entries version does seem to be ~20% slower than forEach, which is slightly faster than keys/addAll. With dart2js/d8, everything is dirt slow, but entries is ~half the speed of forEach, which is again a little faster than keys/addAll. So, from the benchmarks alone, we should be using forEach.

I'm not particularly happy about making language design decisions based on the current speed of implementations. We will be stuck with the specification no matter what we otherwise optimize. If we get tuples in the language, then we might want to change the specification to use a tuple-iterator instead of entries. That might (should) be more efficient, but that's speculation.

Still, I'm willing to change to forEach. The implementations might be able to do something efficient with the closure since they are in complete control of its body. We'd have to make sure that a user forEach that leaks the incoming function cannot use it after the forEach call has completed (we don't want anyone changing our map after it's created). So, ...e becomes something like:

var sourceMap = e;
var tmp = targetMap;
sourceMap.forEach((k, v) {
  tmp[k] = v;
});
tmp = null;   // Prevent later calls from changing `targetMap`.

A quick benchmark (now polymorphic):

import"dart:collection";
void copy1(Map<String, int> from, Map<String, int> to) {
  for (var entry in from.entries) {
    to[entry.key] = entry.value;
  }
}
void copy2(Map<String, int> from, Map<String, int> to) {
  for (var key in from.keys) {
    to[key] = from[key];
  }
}
void copy3(Map<String, int> from, Map<String, int> to) {
  var tmp = to;
  from.forEach((key, value) {
    tmp[key] = value;
  });
  tmp = null;
}
void copy4(Map<String, int> from, Map<String, int> to) {
  to.addAll(from);
}

main() {
  for (int i = 0; i < 5; i++) {
    bench("entries", copy1);
    bench("keys", copy2);
    bench("forEach", copy3);
    bench("addAll", copy4);
  }
}

int id(int x)=>x;
var maps = List.generate(100, (n) {
  var map = Map<String, int>.fromIterable(Iterable.generate(n * 10), key: (n) =>"#$n");
  if (n % 4 == 1) map = SplayTreeMap<String, int>.from(map);
  if (n % 4 == 2) map = HashMap<String, int>.from(map);
  return map;
});

void bench(String name, void Function(Map<String, int>, Map<String, int>) action) {
  var e = 0;
  var c = 0;
  var sw = Stopwatch()..start();
  do {
    for (var from in maps) {
      var to = <String, int>{};
      action(from, to);
      c += from.length;
    }
    e = sw.elapsedMilliseconds;
  } while (e < 2000);
  print("$name: ${c/e} entries/ms");
}

Ok. I'm not deeply dug in on this, but I do think it's a wasted opportunity if we ship something in a form that's unnecessarily slow, and I don't see any user facing reasons to choose one or the other. I'm certainly sympathetic to the difficulty of extrapolating future performance from current performance, but I'm not sure that justifies just throwing up our hands and flipping a coin.

For what it's worth, @rakudrama expressed some preference for using .addAll to allow for possible future double dispatch implementation tricks.

When we decide on the semantics on spreads, it would be nice to leave a tiny bit of wiggle room for improved performance. Granted, we must give developers specific guarantees about what's going on, but we could have situations where we want to do something in a slightly different way.

One example is when a map is created from an existing map, and then adjusted:

Map<K, V> m1 = e; // Implementation may not be know to compilers.
var m2 = {...m1, plus: some, more: stuff};

This might be a quite common situation. The point is that during the phase where the map elements from m1 are added to m2, we know more than we usually do: For each key fromm1, it is guaranteed that it is not already present in m2, and hence we do not need to look it up for the purpose of overwriting rather than adding a new map element.

Different implementations of Map may make this a bigger or smaller optimization, but the point is that we'd just need to iterate over the map elements (m1.forEach would do that), and insertion could use some "secret" version of []= for which it is guaranteed by the compiler that it's a fresh key for m2.

A similar situation arises for sets, as mentioned here: https://github.com/dart-lang/language/issues/208#issuecomment-461319946.

@leafpetersen There is the option of saying, for ...e, executed to collect map entries for a Map<K, V> in a list, that:

Evaluate e to a value o. If o does not implement Map<K2, V2> for some types K2 and V2, it is a run-time error. For each pair of key and value entry of o, accessed in key iteration order through the Map interface members (for example using entries.iterator, keys.iterator and operator [], forEach, or any similar available iteration strategy):

if the run-time type of key is not a subtype of K, it's a type error.

if the run-time type of value is not a subtype of V, it's a type error.

Otherwise add (key, value) to entries.

This is basically the approach I ruled out, because it's impossible to predict which members of the user supplied map are used to do the iteration. Even if we say that, we will use some operation (likely forEach) to do the operation, and then someone might write code that only implements forEach for some utility Map that is only intended to be spread. If we change the behavior later, then we break that code. If we are not ready and willing to make that break, then the implementation is the specification, and we might as well make the specification match what we do.

So, are we ready and willing to break partially implemented maps? (And I want that in writing, in public, so we can point to it if users complain). If so, sure, let's be unspecified. Otherwise, we should specify one approach that users can depend on (and that seems to be forEach).

If we allow any way to iterate, the implementation can just always use addAll, because then anything addAll does will be correct.

@eernstg The "fresh key" is a red herring. It is possible to have a map with two different keys that are equal according to ==. Example:

Map<Duration, int> m1 = Map.identity()..[Duration(seconds: 1)] = 1 ..[Duration(seconds: 1)] = 2;
print(m1.length); // 2
var m2 = [...m1];
print(m2.length); // 1
print(m2[Duration(seconds: 1)]); // 2

Unless you know the internal workings of the map you are spreading, you can't assume anything. If you do know the internal workings, it's a platform map and then we can do anything anyway, including adding an _addTo<K, V>(LinkedHashMap<K, V> target) that works directly on the underlying hash table.

@lrhn wrote

The "fresh key" is a red herring.

Right, it is possible to have a set representation that contains two elements which are (currently) equal, and a map can have two equal keys. That's basically a bug in the realization of those two concepts, because they ought to maintain uniqueness, but we can't avoid it unless we introduce a strong notion of values (immutable entities, with deep equality).

Anyway, it still works with built-in types like integers and strings, and even in that case it might be nice to reserve enough wiggle room to be able to optimize whatever widespread special case we might encounter. (And, as discussed elsewhere, using addAll might actually be quite helpful for that!)

As an opposing view, users are not likely to get their understanding of language features directly from the language specification, so maybe we will get a better understanding if we define the operations in terms of building blocks that users do know.

In this case, we could define list, set and map literals to be equivalent to creating an empty list, set or map, and then perform sequences of add and addAll operations. A single element calls add (or []= for maps). A spread element calls addAll. The control flow operations recursively end up with elements or spreads.

If we do that, then we might be able to generalize literals to other kinds of collections, so you can write:

var queue = Queue()..{1, 2, ... something, if (whatnot) 42};

to initialize a new instance of Queue. (Sadly, it won't work for Uint8List since it has no add method). Or something.

In any case, we would just need to optimizie the addAll of GrowableList, LinkedHashSet and LinkedHashMap to improve performance, and why woudln't we do that anyway.

Using .addAll is just a different way of either:

Making it defined to do whatever .addAll does now forever
Or making it implementation defined

If we do either of those, I'd rather that we just either define it in terms of behavior (since otherwise we're stopping .addAll from changing how it iterates, or explicitly make it implementation defined.

My general take is that using the entries iterator provides no real user benefit, analytically is unlikely to ever provide us with better performance than other approaches, and empirically seems slower.

I'm ok with making it implementation defined, specifying only that we use iteration order, but this generally doesn't seem in line with how we do things in Dart, and de facto it will be a breaking change in the implementations to change how they do it.

I'm ok with using .forEach, but I'm slightly concerned about code size given that closure representations are not cheap in all platforms.

I think overall, my preference is to define this in terms of iteration over .keys. It's the code that I would generally write myself, it empirically (based on your measurements) seems better than using .entries and is close to as good as .forEach.

I wrote a little benchmark to try to get some numbers for what the various protocols that a spread could desugar look like. So far, I've only run this on the VM, but I can try other platforms. Here's the results:

                    addEntries     240.19 spreads/ms
      iterate entries into map     235.81 spreads/ms  0.98x
         iterate keys into map     256.98 spreads/ms  1.07x
              forEach into map     522.40 spreads/ms  2.17x

          just iterate entries     355.70 spreads/ms
             just iterate keys     433.60 spreads/ms
                  just forEach    2171.16 spreads/ms

  (entries insertion overhead)     699.65 spreads/ms
     (keys insertion overhead)     630.87 spreads/ms
  (forEach insertion overhead)     687.93 spreads/ms

The "iterate entries into map" line is what the proposal mandates. That benchmark covers both iterating over the from map and inserting the result into the to map. Arguably, the latter isn't interesting because any proposal has to do that somehow. So the "just" lines do the work of iterating over the from map but then don't actually insert into another map. In theory, the differences between the "just" rows are more important since that shows the relative difference of each strategy ignoring the fixed overhead of inserting into the new map. The "overhead" lines show the difference, which should be the time just to insert the entries.

The "addEntries" one is a little special. It calls to.addEntries(from) directly, which means we can't split out the iterating from the inserting.

It does look like the overhead is about constant, which is a good sanity check. From the numbers it seems like forEach() is the big winner. Iterating over the keys is a little faster, but not much.

But... it depends on how you slice the data. This benchmark is deliberately polymorphic and spreads LinkedHashMaps, HashMaps, and SplayTreeMaps. If I split that out into three separate monomorphic benchmarks:

SplayTreeMap:

                    addEntries     134.04 spreads/ms
      iterate entries into map     132.28 spreads/ms  0.99x
         iterate keys into map     139.46 spreads/ms  1.04x
              forEach into map     434.28 spreads/ms  3.24x

          just iterate entries     167.20 spreads/ms
             just iterate keys     177.24 spreads/ms
                  just forEach    1325.38 spreads/ms

  (entries insertion overhead)     633.34 spreads/ms
     (keys insertion overhead)     654.26 spreads/ms
  (forEach insertion overhead)     645.92 spreads/ms

LinkedHashMap:

                    addEntries     428.99 spreads/ms
      iterate entries into map     445.20 spreads/ms  1.04x
         iterate keys into map     546.54 spreads/ms  1.27x
              forEach into map     573.46 spreads/ms  1.34x

          just iterate entries    1220.35 spreads/ms
             just iterate keys    2134.47 spreads/ms
                  just forEach    3107.40 spreads/ms

  (entries insertion overhead)     700.89 spreads/ms
     (keys insertion overhead)     734.65 spreads/ms
  (forEach insertion overhead)     703.23 spreads/ms

HashMap:

                    addEntries     445.36 spreads/ms
      iterate entries into map     428.71 spreads/ms  0.96x
         iterate keys into map     500.67 spreads/ms  1.12x
              forEach into map     589.28 spreads/ms  1.32x

          just iterate entries    1130.50 spreads/ms
             just iterate keys    2168.90 spreads/ms
                  just forEach    3840.61 spreads/ms

  (entries insertion overhead)     690.61 spreads/ms
     (keys insertion overhead)     650.94 spreads/ms
  (forEach insertion overhead)     696.08 spreads/ms

Then it looks like the three classes have very different performance profiles for their APIs. forEach() is really fast on SplayTreeMap compared to iterating over the keys or entries. But for the other classes, it doesn't make that much of a difference. This isn't too surprising if you look at the implementation of SplayTreeMap.

Note also that this benchmark uses a mixture of map sizes from 0 up to 100. If we leaned towards certain sizes, that may also skew the numbers.

It does seem like .entries has little going for it. If nothing else, I think .keys and [] is both a little faster and more idiomatic. forEach() isn't as much of a consistent win as it first appeared, especially given how rarely used SplayTreeMap is in practice.

One way to think about this is that we should decide if we think internal or external iteration is the right protocol. Internal iteration is "easier" for the iteratee. It's just given a closure and it has total control over how it walks itself and invokes the closure. It doesn't need to return control back to the thing doing the spread until all of the iteration is done. That means it can effectively use the top of the callstack as its own local state storage.

With any kind of external iteration, the iteratee has to return to the iterator. That means in order to pick up where it left off, that state has be reified (and thus usually allocated in Dart) somewhere. In return for this, the iterator gets more flexibility. It can choose to stop iterating, resume later, etc.

In the case of spread, we don't need any of that flexibility. Every spread always wants to iterate over the spreadee exactly once, to completion. So there's an argument is that internal iteration is the principled choice.

That implies using forEach() for maps and probably also for Iterables being spread into lists and sets. On the other issue (https://github.com/dart-lang/language/issues/208), my benchmark shows that forEach() is faster for Iterables, except on the VM. Maybe there's some overhead for calling a closure. Note that unlike the other options on that issue, we could do this for all Iterables being spread, not just those that implement List.

Cases where the thing you are spreading is a List are interesting. In that special case, the only state you need for an external iterator is an integer, so it tends to be quite fast.

On the other hand, the for-in statement always uses external iteration, so making the for element do a different protocol is maybe unsymmetric. If we ever wanted to later add support for break in for elements or anything else that would cause us to take control away from the spreadee, we will hate ourselves for choosing internal iteration.

Thoughts?

I'm leaning towards internal iteration for spreads, for the reasons mentioned. The collection being spread is the only one who can truly know how best to iterate it. Anything we specify will be sub-optional in some cases, unless we specify nothing (and then pick internal iteration as implementation).

That said, we have iteration in the language with sync* and for-in. Any time we see a combination of those, we should be able to inline very eagerly and get zero-overhead effectively-internal iteration. The big problem with that is that nobody uses sync* for writing their collection's iterator, and we can't statically detect that it happens. That is sad.

An issue with internal iteration is that the only abstraction for executable code we have is closures, so we have to allocate a closure when using forEach. We also get a number of polymorphic closure invocations, and we leak this function to user code, which may choose to keep it alive, so we have to program defensively. Maybe the platforms can do something cheaper than a full closure for these things. Maybe not.

So, in summary:

We can go for consistency: Use iterable iteration everywhere.
We can go for speed: Use forEach for maps, or leave it unspecified, because forEach might not always be the fastest.
We can go for generality: Use addAll because then it might generalize to user collections too. That's speculative.
We can cheat: Specify iterable iteration, then recognize platform types and do something else for those.

Can we live with breaking user code if we change an unspecified behavior? If not, then we need to specify it in the smallest details (and likely cheat). Then I'd just use iteration, for consistency. If we say that a list must be iterated using index operations, we increase the code size for every iteration of an Iterable that might be a List (if we want to inline) or at least we introduce an extra type check. I don't like that.

In practice, all lists will be platform lists, so just silently special-casing those will likely be enough.

Here's numbers for the other main platforms. dart2js is crushingly slow if you don't do -O4. I believe most deployed apps use that setting too, so doing that here:

dart2js -O4, all three map types

                    addEntries      88.07 spreads/ms
      iterate entries into map      87.59 spreads/ms  0.99x
         iterate keys into map      89.72 spreads/ms  1.02x
              forEach into map     208.70 spreads/ms  2.37x

          just iterate entries     151.42 spreads/ms
             just iterate keys     158.42 spreads/ms
                  just forEach    1920.00 spreads/ms

dart2js -O4, just LinkedHashMap

                    addEntries     186.05 spreads/ms
      iterate entries into map     200.00 spreads/ms  1.07x
         iterate keys into map     210.53 spreads/ms  1.13x
              forEach into map     216.22 spreads/ms  1.16x

          just iterate entries    2000.00 spreads/ms
             just iterate keys    2666.67 spreads/ms
                  just forEach    8000.00 spreads/ms

dart2js -O4, just SplayTreeMap

                    addEntries      45.07 spreads/ms
      iterate entries into map      47.90 spreads/ms  1.06x
         iterate keys into map      48.63 spreads/ms  1.08x
              forEach into map     190.48 spreads/ms  4.23x

          just iterate entries      58.39 spreads/ms
             just iterate keys      61.07 spreads/ms
                  just forEach    1777.78 spreads/ms

dart2js -O4, just HashMap

                    addEntries     195.12 spreads/ms
      iterate entries into map     190.48 spreads/ms  0.98x
         iterate keys into map     213.33 spreads/ms  1.09x
              forEach into map     200.00 spreads/ms  1.02x

          just iterate entries    2000.00 spreads/ms
             just iterate keys    2666.67 spreads/ms
                  just forEach    1777.78 spreads/ms

AoT

AoT, all three map types

                    addEntries     177.29 spreads/ms
      iterate entries into map     174.85 spreads/ms  0.99x
         iterate keys into map     193.09 spreads/ms  1.09x
              forEach into map     452.55 spreads/ms  2.55x

          just iterate entries     253.42 spreads/ms
             just iterate keys     286.82 spreads/ms
                  just forEach    1665.05 spreads/ms

AoT, just LinkedHashMap

                    addEntries     367.50 spreads/ms
      iterate entries into map     359.07 spreads/ms  0.98x
         iterate keys into map     428.72 spreads/ms  1.17x
              forEach into map     455.08 spreads/ms  1.24x

          just iterate entries     839.98 spreads/ms
             just iterate keys    1344.65 spreads/ms
                  just forEach    1841.41 spreads/ms

AoT, just SplayTreeMap

                    addEntries      93.36 spreads/ms
      iterate entries into map      86.44 spreads/ms  0.93x
         iterate keys into map      96.69 spreads/ms  1.04x
              forEach into map     387.36 spreads/ms  4.15x

          just iterate entries     111.62 spreads/ms
             just iterate keys     120.15 spreads/ms
                  just forEach    1132.74 spreads/ms

AoT, just HashMap

                    addEntries     356.83 spreads/ms
      iterate entries into map     340.96 spreads/ms  0.96x
         iterate keys into map     430.36 spreads/ms  1.21x
              forEach into map     513.33 spreads/ms  1.44x

          just iterate entries     811.36 spreads/ms
             just iterate keys    1479.02 spreads/ms
                  just forEach    3797.77 spreads/ms

I'm going to go through the list benchmark too and try to consolidate this into some single set of numbers that are hopefully actionable.

dart2js is crushingly slow if you don't do -O4. I believe most deployed apps use that setting too.

I'm not sure if flutter webs apps will be able to use -O4 due to our usage of Type and runtimeTime objects. @yjbanov have you experimented with any of the dart2js optimization levels?

@munificent Have you looked at memory usage difference between various methods? I'd be curious what objects end up allocated on the heap during iteration. In forEach case we allocate at least a closure. Others seem to need at least an Iterator. However, if the compiler knows the exact type of the map it may be able to iterate without allocating anything at all.

@jonahwilliams I want to say we use -O4 in our prototype already, but I'll have to double-check.

@munificent Have you looked at memory usage difference between various methods?

I haven't, but I don't think memory usage should differ that much. In general, we expect most spreads will be fairly small. On large spreads, you may churn through a bunch of MapEntry objects, but modern GCs handle short-lived objects pretty well.

I did some more benchmarking and investigation and wrote up my thoughts. We discussed this more in the language meeting. The decision is to keep specifying this in terms of entries.

As always, if an implementation detects that the object being spread is a known built-in type (which is almost always the case for maps), it's free to do some lower-level access to the map entries since it knows that there won't be any user-visible side effects.

dart-lang / language