arximboldi / immer

Postmodern immutable and persistent data structures for C++ — value semantics at scale
https://sinusoid.es/immer
Boost Software License 1.0
2.48k stars 180 forks source link

optimised small collections #22

Open jjl opened 6 years ago

jjl commented 6 years ago

One of the improvements made in clojure 1.8 was incorporating some work done by Zach Tellman to create 'unrolled' variants of maps and vectors. Essentially, it was a piece of clojure that generated an optimised java class for each size of vector and map up to N (I think N was five or six) so that that costs of creating and querying small vectors and maps was faster. When an operation would make them too big, they turn into the more algorithmically optimal type.

We already have an array and guidance to use it for up to ~100 items but we don't have anything for maps and sets. There is also associated code change when moving from arrays to vector / flex_vector. It would be nice to have a uniform API for this like clojure.

A reasonable first stab at it might just be a small_map that maintained an array of pairs and simply iterated through it instead of doing a hash lookup.

Obviously I haven't done any benchmarking at all and this was work done to clojure's collections which may not directly carry over to immer, but food for thought nonetheless. If there is a performance improvement to be made on small collections, it could make considerable differences in a dynamic language environment where maps are often used for argument passing. I know dynamic languages are an explicit target for you, so probably worth a look?

arximboldi commented 6 years ago

I am aware of that optimization. Because of static typing it might be counter-productive to implement it in C++ (type-erasing the underlying impl means adding overhead that I suspect would in most cases shadow the potential gains). I'll keep this open though in case we find in the future particular workloads where this would make a difference.

arximboldi commented 6 years ago

And thanks for the suggestion!

jjl commented 6 years ago

I see your point with the auto-promotion, but I think a small map type may be beneficial. You know your benchmark results better than me, of course.

arximboldi commented 6 years ago

I see your point with the auto-promotion, but I think a small map type may be beneficial.

Yes, you are right. It should be relatively easy to hack on top of immer::array and see what benchmarks show. I leave this open for later---next step is map/set transients.