lorentey closed this 1 year ago
Tasks with difference scores larger than 1.05:
Score Sum Improvements Regressions Name
23.10 23.10 23.10(#76) 1.000(#0) PersistentDictionary<Int, Int> sequential iteration, indices (*)
21.88 21.88 21.88(#76) 1.000(#0) PersistentDictionary<Int, Int> indexing subscript (*)
21.36 21.36 21.36(#76) 1.000(#0) PersistentDictionary<Int, Int>.Keys sequential iteration (*)
21.34 21.34 21.34(#76) 1.000(#0) PersistentDictionary<Int, Int>.Values sequential iteration (*)
8.054 8.054 8.054(#76) 1.000(#0) PersistentDictionary<Int, Int> unsuccessful index(forKey:) (*)
5.866 5.866 5.866(#76) 1.000(#0) PersistentDictionary<Int, Int> successful index(forKey:) (*)
1.821 1.821 1.821(#76) 1.000(#0) PersistentDictionary<Int, Int> random removals (missing keys) (*)
1.817 1.817 1.817(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, noop setter (*)
1.798 1.798 1.798(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, remove missing (*)
1.751 1.751 1.751(#76) 1.000(#0) PersistentDictionary<Int, Int> updateValue(_:forKey:), existing (*)
1.736 1.736 1.736(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, remove existing, unique (*)
1.732 1.732 1.732(#76) 1.000(#0) PersistentDictionary<Int, Int> random removals (existing keys) (*)
1.720 1.720 1.720(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, set existing (*)
1.554 1.554 1.554(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, _modify (*)
1.538 1.538 1.538(#76) 1.000(#0) PersistentDictionary<Int, Int> defaulted subscript, _modify existing (*)
1.481 1.481 1.496(#74) 0.9846(#2) PersistentDictionary<Int, Int> defaulted subscript, _modify missing (*)
1.458 1.458 1.471(#74) 0.9867(#2) PersistentDictionary<Int, Int> updateValue(_:forKey:), insert (*)
1.240 1.240 1.240(#76) 1.000(#0) PersistentDictionary<Int, Int> sequential iteration (*)
1.186 1.186 1.187(#74) 0.9987(#2) PersistentDictionary<Int, Int> defaulted subscript, unsuccessful lookups (*)
1.172 1.172 1.175(#75) 0.9966(#1) PersistentDictionary<Int, Int> defaulted subscript, successful lookups (*)
1.172 1.172 1.172(#75) 0.9996(#1) PersistentDictionary<Int, Int> subscript, unsuccessful lookups (*)
1.162 1.162 1.162(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, successful lookups (*)
1.138 1.138 1.179(#64) 0.9590(#12) PersistentDictionary<Int, Int> init(uniqueKeysWithValues:) (*)
1.137 1.137 1.177(#64) 0.9595(#12) PersistentDictionary<Int, Int> subscript, insert, unique (*)
1.114 1.114 1.122(#54) 0.9921(#22) PersistentDictionary<Int, Int> subscript, insert, shared (*)
Performance highlight: indexing operations benefit greatly from moving subtree counts into parent nodes:
The iterator is largely unchanged (I expect I'll soon have to slow it down somewhat, as the current implementation isn't legal Swift) -- but for some reason the current (dummy) `Keys` and `Values` implementations got a remarkable improvement.
Removals and updates got a 1.5x-1.7x speed boost from the storage layout & access improvements:
Insertions to unique storage exhibit some unusual changes, resulting in some rather unique charts. Explaining this is a fun puzzle -- what I think is happening is that the new code has better allocation/resizing behavior.
The repeating pattern at 32x size increments is simply due to the size of each node (32 buckets) -- each time we increase the input size by 32x, we are effectively adding a new layer of nodes to the hash tree.
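The 32x relationship above can be made concrete with a tiny sketch. The helper below is purely illustrative (it is not part of the package): it shows that with 32 buckets per node, each 32x growth in item count adds roughly one more level to the hash tree.

```swift
// Hypothetical helper, not part of swift-collections: estimates how many
// tree levels a perfectly filled 32-ary hash tree needs for a given count.
func expectedTreeDepth(itemCount: Int, bucketsPerNode: Int = 32) -> Int {
    precondition(itemCount > 0 && bucketsPerNode > 1)
    var depth = 1
    var capacity = bucketsPerNode
    while capacity < itemCount {
        capacity *= bucketsPerNode  // each extra level multiplies capacity by 32
        depth += 1
    }
    return depth
}

// 32 items fit in one node; 33 items force a second level; 1024 still fit in two.
print(expectedTreeDepth(itemCount: 32))    // 1
print(expectedTreeDepth(itemCount: 33))    // 2
print(expectedTreeDepth(itemCount: 1024))  // 2
```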
The old code used to have two separate buffers for items & children, so when a collision happened, upgrading an item to a child node sometimes required a bunch of reallocations to resize them. The new code uses a shared buffer for both regions, and the new child can simply make use of the space that was previously occupied by the old item, eliminating most of these. This probably explains why the humps in the `before` curve turn into valleys on `after` -- I suspect these humps/valleys might correspond to sizes at which the tree is busy introducing new nodes. If so, the stable spots in between are probably periods where the tree tends to just add new items to preexisting nodes.
A closer analysis would probably be interesting here, but for now I'm happy with these results.
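To illustrate the shared-buffer point above, here is a deliberately simplified toy model (the names and layout are hypothetical, not the package's actual storage code): items and child references share one fixed-capacity allocation, so swapping a colliding item out for a new child node only needs a reallocation when the node is completely full.

```swift
// Toy model of a single-buffer node: items conceptually occupy the front
// region and child references the back region of one shared allocation.
struct ToyNode<Key: Hashable, Value> {
    var capacity: Int
    var items: [(Key, Value)] = []   // "front" region
    var children: [ToyNode] = []     // "back" region

    var usedSlots: Int { items.count + children.count }

    // Upgrade a colliding item into a child node holding both entries.
    // Returns true only if the shared buffer would have to be reallocated,
    // i.e. only when the node had no free slot at all.
    mutating func upgradeItemToChild(at i: Int,
                                     colliding newItem: (Key, Value)) -> Bool {
        let needsRealloc = usedSlots + 1 > capacity
        var child = ToyNode(capacity: capacity)
        child.items = [items.remove(at: i), newItem]  // old slot is freed...
        children.append(child)                        // ...and reused here
        return needsRealloc
    }
}

var node = ToyNode<Int, Int>(capacity: 4)
node.items = [(1, 1), (2, 2)]
let reallocated = node.upgradeItemToChild(at: 0, colliding: (33, 3))
print(reallocated)  // false: the freed item slot absorbed the new child
```

With two separate buffers, the same operation would shrink one buffer and grow the other, which is where the extra reallocations in the old code came from.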
Simple key lookups were already quite fast, so they only see (relatively) minor 16-18% improvements, perhaps due to reduced retain/release traffic. (The new code is quite careful to avoid copying node references.)
I feel this is getting close to optimal, although perhaps there is still some room for marginal improvements by rolling `find` back into the lookup implementation.
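For readers unfamiliar with the lookup shape being discussed, here is a minimal sketch (with invented names, and plain dictionaries standing in for the real bitmap-compressed storage): each level of the tree consumes 5 bits of the hash to select one of the node's 32 buckets, descending until it finds the item or runs out of children.

```swift
// Hypothetical sketch of hash-trie lookup; real nodes use bitmaps and a
// shared ManagedBuffer, not Swift dictionaries.
struct SketchNode<Key: Hashable, Value> {
    var items: [Key: Value] = []          // entries stored in this node
    var children: [Int: SketchNode] = []  // subtrees, keyed by bucket index
}

func lookup<Key: Hashable, Value>(_ key: Key,
                                  in root: SketchNode<Key, Value>) -> Value? {
    var node = root
    var hash = key.hashValue
    while true {
        if let value = node.items[key] { return value }
        let bucket = hash & 0x1F          // low 5 bits pick one of 32 buckets
        guard let child = node.children[bucket] else { return nil }
        node = child
        hash >>= 5                        // next level uses the next 5 bits
    }
}

var root = SketchNode<Int, String>()
root.items[1] = "one"
print(lookup(1, in: root) ?? "nil")  // one
print(lookup(2, in: root) ?? "nil")  // nil
```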
Finally, insertions/removals into shared nodes are somewhat of a mixed bag -- we have minor 1-5% slowdowns at low sizes, but quite significant wins at the high end.
(These charts need to be taken with a grain of salt -- the new results have far greater variance because I finally remembered to re-run them several times to get new hash seeds. I collected the old results over (maybe) one or two separate runs, so they mostly ended up creating and re-creating the exact same tree. I don't think it matters much, though: the effects of the random hash seeds only seem to be significant up to about 64 or so items -- beyond that the volume of data tends to smooth out the noise. That holds until around 128k-256k items, when some caching or context switching artifacts start to (consistently) destabilize measurements again.)
Out of curiosity, how do these charts compare to the corresponding non-persistent Swift collections?
Additionally, are you ultimately going to add or update the recorded benchmark charts under Documentation (such as here)?
> Out of curiosity, how do these charts compare to the corresponding non-persistent Swift collections?
It's a bit too early to advertise such comparisons at this point, or to draw huge conclusions from them. These will be more meaningful when we've fleshed out the implementation a bit more.
(If you're eager to get a picture of the current state, the benchmark library named `PersistentDictionary` generates such reference comparisons, so you can get such data. You just need to run `./Utils/run-benchmarks.sh library run --library "./Benchmarks/Libraries/PersistentDictionary.json" results.json`, then render them with the `library render` command.)
> Additionally, are you ultimately going to add or update the recorded benchmark charts under Documentation (such as here)?
Yes, I expect we'll have a similar arrangement for these new types. It would also be nice to integrate these into the API docs to a certain extent at least -- DocC integration would let us do that (and more).
> Insertions to unique storage exhibit some unusual changes, resulting in some rather unique charts. Explaining this is a fun puzzle -- what I think is happening is that the new code has better allocation/resizing behavior.

@msteindorfer kindly noted that the behavior is far more plausibly explained by the allocation behavior getting horizontally shifted across the two implementations (plus a 1.5-2x general speedup). That makes far more sense! 😅
> The iterator is largely unchanged (I expect I'll have to soon slow it down somewhat, as the current implementation isn't legal Swift) -- but for some reason the current (dummy) `Keys` and `Values` implementations got a remarkable improvement.
Huh, TIL the stdlib defines a custom `map` implementation for `Collection` that uses indexing instead of the iterator. That's rather silly!
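For context, an index-based `map` has roughly this shape -- this is a free-standing illustration of the pattern, not the stdlib's actual source:

```swift
// Illustrative sketch of a map over a Collection driven by indices rather
// than by an iterator's next(); the name is invented for this example.
func indexBasedMap<C: Collection, T>(_ c: C,
                                     _ transform: (C.Element) -> T) -> [T] {
    var result: [T] = []
    result.reserveCapacity(c.count)
    var i = c.startIndex
    while i != c.endIndex {
        result.append(transform(c[i]))  // subscript access, not next()
        c.formIndex(after: &i)
    }
    return result
}

print(indexBasedMap([1, 2, 3]) { $0 * 2 })  // [2, 4, 6]
```

For a tree-backed collection like this one, subscripting an index can be much more expensive than advancing an iterator, which is why this stdlib choice matters here.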
Latest commits apply some improvements uncovered during code review.
@swift-ci test
This is the first in a series of subsequent PRs with large-scale logic changes. This first one concentrates on the representation of node storage, and the patterns of accessing it:

- Use `ManagedBuffer` to represent node storage. This has useful performance benefits, but it significantly complicates access to node data. We no longer have direct access to the underlying raw storage: everything needs to go through closures passed to `ManagedBuffer.withUnsafeMutablePointers`.
- Introduce `struct _Node` and `struct _Node.UnsafeHandle` to represent a (strong) reference to a hash tree node, and a convenient handle for node storage, respectively. Use these instead of directly accessing the ManagedBuffer object, whenever feasible. (`_Node` holds the implementation of all the actual tree operations -- it provides mutating methods to manipulate a (sub)tree, providing a workable illusion of value semantics on the level of individual subtrees. `UnsafeHandle` provides both read-only and mutable views into storage inside a node -- mutating operations require (and assume) that the corresponding `_Node` holds a uniquely held reference to the storage object.)
- Move subtree counts into the `_Node` struct. (This results in a cool ~20x speedup for jumping around in the tree by integer offsets.)

The implementation of dictionary operations remains conceptually unchanged for now (although the details look superficially different now) -- future updates will start tweaking things in earnest.
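The ~20x speedup from cached subtree counts comes down to complexity: with per-subtree counts available at each node, an offset-based jump can skip entire subtrees instead of walking every element. A simplified, hypothetical sketch of the idea (not the package's actual types):

```swift
// Toy node caching the total item count of its subtree. With these counts,
// finding the element at a given offset visits O(depth) nodes instead of
// stepping through O(offset) elements.
struct CountedNode {
    var localItems: [Int]       // items stored directly in this node
    var children: [CountedNode] // subtrees
    var count: Int              // cached: localItems.count + children's counts

    func element(atOffset offset: Int) -> Int {
        precondition(offset >= 0 && offset < count)
        if offset < localItems.count { return localItems[offset] }
        var remaining = offset - localItems.count
        for child in children {
            if remaining < child.count {
                return child.element(atOffset: remaining)
            }
            remaining -= child.count  // skip this whole subtree in one step
        }
        fatalError("inconsistent cached counts")
    }
}

let left = CountedNode(localItems: [10, 11], children: [], count: 2)
let right = CountedNode(localItems: [20], children: [], count: 1)
let root = CountedNode(localItems: [1], children: [left, right], count: 4)
print(root.element(atOffset: 3))  // 20
```

Without the cached `count` on each child, locating offset 3 would require iterating through every preceding element.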