lorentey closed this 1 year ago
Tasks with difference scores larger than 1.05:
Score Sum Improvements Regressions Name
23.10 23.10 23.10(#76) 1.000(#0) PersistentDictionary<Int, Int> sequential iteration, indices (*)
21.88 21.88 21.88(#76) 1.000(#0) PersistentDictionary<Int, Int> indexing subscript (*)
21.36 21.36 21.36(#76) 1.000(#0) PersistentDictionary<Int, Int>.Keys sequential iteration (*)
21.34 21.34 21.34(#76) 1.000(#0) PersistentDictionary<Int, Int>.Values sequential iteration (*)
8.054 8.054 8.054(#76) 1.000(#0) PersistentDictionary<Int, Int> unsuccessful index(forKey:) (*)
5.866 5.866 5.866(#76) 1.000(#0) PersistentDictionary<Int, Int> successful index(forKey:) (*)
1.821 1.821 1.821(#76) 1.000(#0) PersistentDictionary<Int, Int> random removals (missing keys) (*)
1.817 1.817 1.817(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, noop setter (*)
1.798 1.798 1.798(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, remove missing (*)
1.751 1.751 1.751(#76) 1.000(#0) PersistentDictionary<Int, Int> updateValue(_:forKey:), existing (*)
1.736 1.736 1.736(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, remove existing, unique (*)
1.732 1.732 1.732(#76) 1.000(#0) PersistentDictionary<Int, Int> random removals (existing keys) (*)
1.720 1.720 1.720(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, set existing (*)
1.554 1.554 1.554(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, _modify (*)
1.538 1.538 1.538(#76) 1.000(#0) PersistentDictionary<Int, Int> defaulted subscript, _modify existing (*)
1.481 1.481 1.496(#74) 0.9846(#2) PersistentDictionary<Int, Int> defaulted subscript, _modify missing (*)
1.458 1.458 1.471(#74) 0.9867(#2) PersistentDictionary<Int, Int> updateValue(_:forKey:), insert (*)
1.240 1.240 1.240(#76) 1.000(#0) PersistentDictionary<Int, Int> sequential iteration (*)
1.186 1.186 1.187(#74) 0.9987(#2) PersistentDictionary<Int, Int> defaulted subscript, unsuccessful lookups (*)
1.172 1.172 1.175(#75) 0.9966(#1) PersistentDictionary<Int, Int> defaulted subscript, successful lookups (*)
1.172 1.172 1.172(#75) 0.9996(#1) PersistentDictionary<Int, Int> subscript, unsuccessful lookups (*)
1.162 1.162 1.162(#76) 1.000(#0) PersistentDictionary<Int, Int> subscript, successful lookups (*)
1.138 1.138 1.179(#64) 0.9590(#12) PersistentDictionary<Int, Int> init(uniqueKeysWithValues:) (*)
1.137 1.137 1.177(#64) 0.9595(#12) PersistentDictionary<Int, Int> subscript, insert, unique (*)
1.114 1.114 1.122(#54) 0.9921(#22) PersistentDictionary<Int, Int> subscript, insert, shared (*)
Performance highlight: indexing operations benefit greatly from moving subtree counts into parent nodes:
The iterator is largely unchanged (I expect I'll soon have to slow it down somewhat, as the current implementation isn't legal Swift) -- but for some reason the current (dummy) `Keys` and `Values` implementations got a remarkable improvement.
Removals and updates got a 1.5x-1.7x speed boost from the storage layout & access improvements:
Insertions to unique storage exhibit some unusual changes, resulting in some rather unique charts. Explaining this is a fun puzzle -- what I think is happening is that the new code has better allocation/resizing behavior.
The repeating pattern at 32x size increments is simply due to the size of each node (32 buckets) -- each time we increase the input size by 32x, we are effectively adding a new layer of nodes to the hash tree.
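The 32x relationship above can be made concrete with a tiny sketch. The helper below is purely illustrative (it is not part of the package): it shows that with 32 buckets per node, each 32x growth in item count adds roughly one more level to the hash tree.

```swift
// Hypothetical helper, not part of swift-collections: estimates how many
// tree levels a perfectly filled 32-ary hash tree needs for a given count.
func expectedTreeDepth(itemCount: Int, bucketsPerNode: Int = 32) -> Int {
    precondition(itemCount > 0 && bucketsPerNode > 1)
    var depth = 1
    var capacity = bucketsPerNode
    while capacity < itemCount {
        capacity *= bucketsPerNode  // each extra level multiplies capacity by 32
        depth += 1
    }
    return depth
}

// 32 items fit in one node; 33 items force a second level; 1024 still fit in two.
print(expectedTreeDepth(itemCount: 32))    // 1
print(expectedTreeDepth(itemCount: 33))    // 2
print(expectedTreeDepth(itemCount: 1024))  // 2
```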
The old code used to have two separate buffers for items & children, so when a collision happened, upgrading an item to a child node sometimes required a bunch of reallocations to resize them. The new code uses a shared buffer for both regions, and the new child can simply make use of the space that was previously occupied by the old item, eliminating most of these. This probably explains why the humps in the `before` curve turn into valleys on `after` -- I suspect these humps/valleys might correspond to sizes at which the tree is busy introducing new nodes. If so, the stable spots in between are probably periods where the tree tends to just add new items to preexisting nodes.
A closer analysis would probably be interesting here, but for now I'm happy with these results.
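To illustrate the shared-buffer point above, here is a deliberately simplified toy model (the names and layout are hypothetical, not the package's actual storage code): items and child references share one fixed-capacity allocation, so swapping a colliding item out for a new child node only needs a reallocation when the node is completely full.

```swift
// Toy model of a single-buffer node: items conceptually occupy the front
// region and child references the back region of one shared allocation.
struct ToyNode<Key: Hashable, Value> {
    var capacity: Int
    var items: [(Key, Value)] = []   // "front" region
    var children: [ToyNode] = []     // "back" region

    var usedSlots: Int { items.count + children.count }

    // Upgrade a colliding item into a child node holding both entries.
    // Returns true only if the shared buffer would have to be reallocated,
    // i.e. only when the node had no free slot at all.
    mutating func upgradeItemToChild(at i: Int,
                                     colliding newItem: (Key, Value)) -> Bool {
        let needsRealloc = usedSlots + 1 > capacity
        var child = ToyNode(capacity: capacity)
        child.items = [items.remove(at: i), newItem]  // old slot is freed...
        children.append(child)                        // ...and reused here
        return needsRealloc
    }
}

var node = ToyNode<Int, Int>(capacity: 4)
node.items = [(1, 1), (2, 2)]
let reallocated = node.upgradeItemToChild(at: 0, colliding: (33, 3))
print(reallocated)  // false: the freed item slot absorbed the new child
```

With two separate buffers, the same operation would shrink one buffer and grow the other, which is where the extra reallocations in the old code came from.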
Simple key lookups were already quite fast, so they only see (relatively) minor 16-18% improvements, perhaps due to reduced retain/release traffic. (The new code is quite careful to avoid copying node references.)
I feel this is getting close to optimal, although perhaps there is still some room for marginal improvements by rolling `find` back into the lookup implementation.
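For readers unfamiliar with the lookup shape being discussed, here is a minimal sketch (with invented names, and plain dictionaries standing in for the real bitmap-compressed storage): each level of the tree consumes 5 bits of the hash to select one of the node's 32 buckets, descending until it finds the item or runs out of children.

```swift
// Hypothetical sketch of hash-trie lookup; real nodes use bitmaps and a
// shared ManagedBuffer, not Swift dictionaries.
struct SketchNode<Key: Hashable, Value> {
    var items: [Key: Value] = []          // entries stored in this node
    var children: [Int: SketchNode] = []  // subtrees, keyed by bucket index
}

func lookup<Key: Hashable, Value>(_ key: Key,
                                  in root: SketchNode<Key, Value>) -> Value? {
    var node = root
    var hash = key.hashValue
    while true {
        if let value = node.items[key] { return value }
        let bucket = hash & 0x1F          // low 5 bits pick one of 32 buckets
        guard let child = node.children[bucket] else { return nil }
        node = child
        hash >>= 5                        // next level uses the next 5 bits
    }
}

var root = SketchNode<Int, String>()
root.items[1] = "one"
print(lookup(1, in: root) ?? "nil")  // one
print(lookup(2, in: root) ?? "nil")  // nil
```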
Finally, insertions/removals into shared nodes are somewhat of a mixed bag -- we have minor 1-5% slowdowns at low sizes, but quite significant wins at the high end.
(These charts need to be taken with a grain of salt -- the new results have far greater variance because I finally remembered to re-run them several times to get new hash seeds. I collected the old results over (maybe) one or two separate runs, so they mostly ended up creating and re-creating the exact same tree. I don't think it matters much, though: the effects of the random hash seeds only seem to be significant up to about 64 or so items -- beyond that the volume of data tends to smooth out the noise. That holds until around 128k-256k items, when some caching or context switching artifacts start to (consistently) destabilize measurements again.)
Out of curiosity, how do these charts compare to the corresponding non-persistent Swift collections?
Additionally, are you ultimately going to add or update the recorded benchmark charts under Documentation (such as here)?
> Out of curiosity, how do these charts compare to the corresponding non-persistent Swift collections?
It's a bit too early to advertise such comparisons at this point, or to draw huge conclusions from them. These will be more meaningful when we've fleshed out the implementation a bit more.
(If you're eager to get a picture of the current state, the benchmark library named `PersistentDictionary` generates such reference comparisons, so you can get such data. You just need to run `./Utils/run-benchmarks.sh library run --library "./Benchmarks/Libraries/PersistentDictionary.json" results.json`, then render them with the `library render` command.)
> Additionally, are you ultimately going to add or update the recorded benchmark charts under Documentation (such as here)?
Yes, I expect we'll have a similar arrangement for these new types. It would also be nice to integrate these into the API docs to a certain extent at least -- DocC integration would let us do that (and more).
> Insertions to unique storage exhibit some unusual changes, resulting in some rather unique charts. Explaining this is a fun puzzle -- what I think is happening is that the new code has better allocation/resizing behavior.

@msteindorfer kindly noted that the behavior is far more plausibly explained by the allocation behavior getting horizontally shifted across the two implementations (plus a 1.5-2x general speedup). That makes far more sense! 😅
> The iterator is largely unchanged (I expect I'll have to soon slow it down somewhat, as the current implementation isn't legal Swift) -- but for some reason the current (dummy) `Keys` and `Values` implementations got a remarkable improvement.
Huh, TIL the stdlib defines a custom `map` implementation for `Collection` that uses indexing instead of the iterator. That's rather silly!
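For context, an index-based `map` has roughly this shape -- this is a free-standing illustration of the pattern, not the stdlib's actual source:

```swift
// Illustrative sketch of a map over a Collection driven by indices rather
// than by an iterator's next(); the name is invented for this example.
func indexBasedMap<C: Collection, T>(_ c: C,
                                     _ transform: (C.Element) -> T) -> [T] {
    var result: [T] = []
    result.reserveCapacity(c.count)
    var i = c.startIndex
    while i != c.endIndex {
        result.append(transform(c[i]))  // subscript access, not next()
        c.formIndex(after: &i)
    }
    return result
}

print(indexBasedMap([1, 2, 3]) { $0 * 2 })  // [2, 4, 6]
```

For a tree-backed collection like this one, subscripting an index can be much more expensive than advancing an iterator, which is why this stdlib choice matters here.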
Latest commits apply some improvements uncovered during code review.
@swift-ci test
This is the first in a series of subsequent PRs with large-scale logic changes. This first one concentrates on the representation of node storage, and the patterns of accessing it:

- Use `ManagedBuffer` to represent node storage. This has useful performance benefits, but it significantly complicates access to node data. We no longer have direct access to the underlying raw storage: everything needs to go through closures passed to `ManagedBuffer.withUnsafeMutablePointers`.
- Introduce `struct _Node` and `struct _Node.UnsafeHandle` to represent a (strong) reference to a hash tree node, and a convenient handle for node storage, respectively. Use these instead of directly accessing the ManagedBuffer object, whenever feasible. (`_Node` holds the implementation of all the actual tree operations -- it provides mutating methods to manipulate a (sub)tree, providing a workable illusion of value semantics on the level of individual subtrees. `UnsafeHandle` provides both read-only and mutable views into storage inside a node -- mutating operations require (and assume) that the corresponding `_Node` holds a uniquely held reference to the storage object.)
- Move subtree counts into the `_Node` struct. (This results in a cool ~20x speedup for jumping around in the tree by integer offsets.)

The implementation of dictionary operations remains conceptually unchanged for now (although the details look superficially different now) -- future updates will start tweaking things in earnest.
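The ~20x speedup from cached subtree counts comes down to complexity: with per-subtree counts available at each node, an offset-based jump can skip entire subtrees instead of walking every element. A simplified, hypothetical sketch of the idea (not the package's actual types):

```swift
// Toy node caching the total item count of its subtree. With these counts,
// finding the element at a given offset visits O(depth) nodes instead of
// stepping through O(offset) elements.
struct CountedNode {
    var localItems: [Int]       // items stored directly in this node
    var children: [CountedNode] // subtrees
    var count: Int              // cached: localItems.count + children's counts

    func element(atOffset offset: Int) -> Int {
        precondition(offset >= 0 && offset < count)
        if offset < localItems.count { return localItems[offset] }
        var remaining = offset - localItems.count
        for child in children {
            if remaining < child.count {
                return child.element(atOffset: remaining)
            }
            remaining -= child.count  // skip this whole subtree in one step
        }
        fatalError("inconsistent cached counts")
    }
}

let left = CountedNode(localItems: [10, 11], children: [], count: 2)
let right = CountedNode(localItems: [20], children: [], count: 1)
let root = CountedNode(localItems: [1], children: [left, right], count: 4)
print(root.element(atOffset: 3))  // 20
```

Without the cached `count` on each child, locating offset 3 would require iterating through every preceding element.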