basho / riak_dt

Convergent replicated datatypes in Erlang
Apache License 2.0
353 stars 70 forks source link

Swap orddict for dict in orswot and map #110

Closed russelldb closed 9 years ago

russelldb commented 9 years ago

After a report on IRC of Maps with 13k set elements inside taking up to 80seconds to merge, I ran some benchmarks comparing dict and orddict (as I had a suspicion this was the bulk of problem.)

This PR swaps orddict for dict. There may well be many more optimisations to do, but as the attached graph, and tables below show, this is a mighty big speed up.

This plot is insert times for dict vs orddict in an orswot. insert times into a set

Merge times, orswot, fully probably* fully disjoint sets with probably\ 20 actors orddict vs dict (times are microseconds)

[_Probably fully disjoint as crypto:rand_bytes(30) was used to generate elements, they could clash I guess. *_Probably 20 actors as actor chosen at random from a list of 20 randomly generated 8 byte actor ids, but if you have fewer than 20 elements, unlikely you have 20 actors, eh?]

Elements orddict dict
100000 1455538575 745421
10000 9621000 39468
1000 99874 3286
100 1467 428
10 54 64
1 11 24

This plot compares merge times for the code that used lots of simple set operations, vs. an uglier, single fold over the unioned keyspace of the map.

merge5_times

lenary commented 9 years ago

I thought this would bite us. Someone has a branch with hashdict stuff (from elixir). Sean will know about it, though maybe it was my branch. I haven't seen graphs/tables, so I don't know the comparison, but worth checking.

russelldb commented 9 years ago

Those branches will be stale. I asked about HashDict, not ready for primetime. Hopefully this will do for now.

Wish you'd said earlier that you thought this was an issue.

russelldb commented 9 years ago

Still todo

seancribbs commented 9 years ago

:+1: 44383b1

seancribbs commented 9 years ago

@borshop merge