Closed GoPavel closed 1 week ago
One quick observation: subset and equality seem very expensive. I wonder if that's because isSubsetHelper still uses an iterator, which allocates. Maybe just use recursion again?
Perhaps there's an even faster way, but I don't know enough about red-black trees to suggest one. If the representation is canonical, maybe you can exploit that. But I'm not sure whether it is, or whether it depends on the order of insertions/deletions. Do you know?
It's not canonical, so we can't just do a linear one-to-one comparison. I've experimented with a couple of alternatives, and I think it works pretty well now. I've managed to use the order of nodes in the tree to optimize it a bit.
I have updated the benchmark results in the description; for a comparison of the versions before and after the last commit, see https://github.com/serokell/motoko-base/pull/37.
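To illustrate the point about non-canonical shapes, here is a minimal Python sketch (not the PR's Motoko code). It uses a plain persistent BST without red-black rebalancing, an assumption made only to keep it short; the shape still depends on insertion order. `Node`, `insert`, `from_list`, `size`, `is_subset` and `equal` are hypothetical names. The recursive `is_subset` follows the scheme used by OCaml's standard `Set.subset`, which may differ from what serokell#37 actually does:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    left: "Optional[Node]"
    value: int
    right: "Optional[Node]"


def insert(t, x):
    """Persistent BST insert. Rebalancing is omitted (assumption of this sketch)."""
    if t is None:
        return Node(None, x, None)
    if x < t.value:
        return Node(insert(t.left, x), t.value, t.right)
    if x > t.value:
        return Node(t.left, t.value, insert(t.right, x))
    return t  # already present


def from_list(xs):
    t = None
    for x in xs:
        t = insert(t, x)
    return t


def size(t):
    return 0 if t is None else size(t.left) + 1 + size(t.right)


def is_subset(s1, s2):
    """True if every element of s1 is in s2. Plain recursion, no iterator."""
    if s1 is None:
        return True
    if s2 is None:
        return False  # early exit: s1 still has elements, s2 has none
    if s1.value == s2.value:
        return is_subset(s1.left, s2.left) and is_subset(s1.right, s2.right)
    if s1.value < s2.value:
        # s1.left and s1.value are all below s2.value, so they must be in s2.left
        return (is_subset(Node(s1.left, s1.value, None), s2.left)
                and is_subset(s1.right, s2))
    # symmetric case: s1.value and s1.right must be in s2.right
    return (is_subset(Node(None, s1.value, s1.right), s2.right)
            and is_subset(s1.left, s2))


def equal(s1, s2):
    return size(s1) == size(s2) and is_subset(s1, s2)


a = from_list([1, 2, 3])  # right-leaning chain
b = from_list([2, 1, 3])  # 2 at the root
print(a == b)       # False: same elements, different shapes
print(equal(a, b))  # True: equal as sets
```

The `and` short-circuits, so the comparison stops at the first missing element instead of walking both trees to the end.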
That's much better indeed! Thank you!
Also, I've optimized `intersect` for `Set` and fixed doc comments: https://github.com/serokell/motoko-base/pull/39
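The PR description summarizes this optimization as using the order of output values to build the resulting tree faster. A rough Python sketch of that idea follows; it is an illustration under assumptions, not the Motoko code from serokell#39. Red-black coloring is left out, and `Node`, `contains`, `in_order`, `build_balanced`, `height` and `intersect` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    left: "Optional[Node]"
    value: int
    right: "Optional[Node]"


def contains(t, x):
    while t is not None:
        if x == t.value:
            return True
        t = t.left if x < t.value else t.right
    return False


def in_order(t):
    """Yield the values of t in ascending order."""
    if t is not None:
        yield from in_order(t.left)
        yield t.value
        yield from in_order(t.right)


def build_balanced(xs, lo=0, hi=None):
    """Build a height-balanced tree from the sorted slice xs[lo:hi] in linear time."""
    if hi is None:
        hi = len(xs)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(build_balanced(xs, lo, mid), xs[mid], build_balanced(xs, mid + 1, hi))


def height(t):
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def intersect(s1, s2):
    # The filtered values come out already sorted, so the result can be built
    # directly from the list instead of by repeated insertion.
    common = [x for x in in_order(s1) if contains(s2, x)]
    return build_balanced(common)


s1 = build_balanced([1, 2, 3, 4, 5, 6])
s2 = build_balanced([2, 4, 6, 8])
r = intersect(s1, s2)
print(list(in_order(r)), height(r))  # [2, 4, 6] 2
```

Building from a sorted list avoids the logarithmic cost of each insert, which is where the speed-up in this sketch comes from.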
Squashed and fast-forward merged with commit 1961fab3ba2d1459ea6bae29c1c997328e03328f
(see https://dfinity.slack.com/archives/D07MAEML9RD/p1731527989433379)
This is an MR for the 3rd milestone of Serokell's grant for improving Motoko's base library.
The main goal of the PR is to introduce a new functional implementation of the set data structure to the `base` library. It also brings a few changes to the new functional map that was added in #664, #654.
General changes:
- `PersistentOrderedMap` to `OrderedMap` (same for the `OrderedSet`)
Functional Map changes:
New functionality:
- `any`/`all` functions
- `contains` function
- `minEntry`/`maxEntry`
Optimizations:
- `size` in the Map, benchmark results
Fixup:
- `entriesRev()`, remove `iter()`
NEW functional Set:
The new data structure implements an ordered set interface using red-black trees, just like the new functional map from Milestones 1-2.
API implemented:
- `put`, `delete`, `contains`, `fromIter`, etc
- `map`, `mapFilter`, `foldLeft`, `foldRight`
- `union`, `intersect`, `diff`, `isSubset`, `equal`
- (… `OrderedMap`): `min`/`max`, `all`/`some`
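As an illustration of how the fold-style operations in this list relate to each other, the following Python sketch expresses `map` and `mapFilter` through `foldLeft`. It is a toy under stated assumptions, not the Motoko implementation: the tree is an unbalanced persistent BST, `None` stands in for Motoko's `null`, and all function names (`insert`, `from_list`, `fold_left`, `to_list`, `map_set`, `map_filter`) are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    left: "Optional[Node]"
    value: int
    right: "Optional[Node]"


def insert(t, x):
    """Persistent BST insert. Rebalancing is omitted (assumption of this sketch)."""
    if t is None:
        return Node(None, x, None)
    if x < t.value:
        return Node(insert(t.left, x), t.value, t.right)
    if x > t.value:
        return Node(t.left, t.value, insert(t.right, x))
    return t


def from_list(xs):
    t = None
    for x in xs:
        t = insert(t, x)
    return t


def fold_left(t, acc, f):
    """Fold over the values of t in ascending order."""
    if t is None:
        return acc
    acc = fold_left(t.left, acc, f)
    acc = f(acc, t.value)
    return fold_left(t.right, acc, f)


def to_list(t):
    return fold_left(t, [], lambda acc, x: acc + [x])


def map_set(t, f):
    # Mapped values may collide or change order, so they are re-inserted one by one.
    return fold_left(t, None, lambda acc, x: insert(acc, f(x)))


def map_filter(t, f):
    # f returns None to drop an element, or the new value to keep.
    def step(acc, x):
        y = f(x)
        return acc if y is None else insert(acc, y)
    return fold_left(t, None, step)


s = from_list([3, 1, 2, 4])
print(to_list(map_set(s, lambda x: x // 2)))                              # [0, 1, 2]
print(to_list(map_filter(s, lambda x: x * 10 if x % 2 == 0 else None)))  # [20, 40]
```

Note that mapping a set can shrink it: here `2 // 2` and `3 // 2` both give `1`, so four inputs yield three outputs.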
Maintenance support:
Applied optimizations:
- `map`/`filterMap` through `foldLeft`
- `foldLeft`
- `OrderedMap` instead of sharing it, benchmark results
- `intersect` optimization: use the order of output values to build the resulting tree faster, see https://github.com/serokell/motoko-base/pull/39
- `isSubset`, `equal` optimization: use early exit and the order of subtrees to reduce intermediate tree height, see https://github.com/serokell/motoko-base/pull/37
Rejected optimizations:
- (… `intersect`, `union`, `diff`) from Nipkow's book. However, the experiment shows that a naive implementation with a simple size heuristic performs better. The benchmark results compare 3 versions:
  - (… `O(min(n,m) log(max(n,m)))`, which is very close to Nipkow's version). Sizes of sets are also stored, but only in the root.
  The last one outperforms the others and keeps the tree slim in terms of byte size. Thus, we have picked it.
Final benchmark results:
- Collection benchmarks
- set API
- new set API
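For readers unfamiliar with the size heuristic mentioned under the rejected optimizations, a minimal Python sketch of one plausible reading is given here: keep the element count only in a root wrapper, iterate over the smaller set, and insert into the larger one. This is an assumption about the approach, not the PR's Motoko code; `OrderedSet`, `put`, `from_values` and `union` are hypothetical names, and the tree is left unbalanced, so the `O(min(n,m) log(max(n,m)))` bound would only hold with real red-black balancing:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    left: "Optional[Node]"
    value: int
    right: "Optional[Node]"


@dataclass(frozen=True)
class OrderedSet:
    size: int             # stored once at the root, not in every node
    root: Optional[Node]


EMPTY = OrderedSet(0, None)


def contains(t, x):
    while t is not None:
        if x == t.value:
            return True
        t = t.left if x < t.value else t.right
    return False


def insert(t, x):
    """Persistent BST insert. Rebalancing is omitted (assumption of this sketch)."""
    if t is None:
        return Node(None, x, None)
    if x < t.value:
        return Node(insert(t.left, x), t.value, t.right)
    if x > t.value:
        return Node(t.left, t.value, insert(t.right, x))
    return t


def in_order(t):
    if t is not None:
        yield from in_order(t.left)
        yield t.value
        yield from in_order(t.right)


def put(s, x):
    if contains(s.root, x):
        return s  # size unchanged
    return OrderedSet(s.size + 1, insert(s.root, x))


def from_values(xs):
    s = EMPTY
    for x in xs:
        s = put(s, x)
    return s


def union(a, b):
    # Size heuristic: walk the smaller set and insert into the larger one.
    small, large = (a, b) if a.size <= b.size else (b, a)
    result = large
    for x in in_order(small.root):
        result = put(result, x)
    return result


a = from_values([5, 1, 9, 3, 7])
b = from_values([3, 4])
u = union(a, b)
print(u.size, list(in_order(u.root)))  # 6 [1, 3, 4, 5, 7, 9]
```

Because only the wrapper carries `size`, the nodes stay as small as in a plain tree, which matches the description's remark about keeping the tree slim in terms of byte size.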