Open drewkerrigan opened 9 years ago
cc: @wbrown
@drewkerrigan any more info on how to trigger it?
I’d really love to get a riak_test case for this. Current status: the test is already getting ridiculous¹, but that doesn’t help; riak stays stable. The next thing to try is probably adding network delays.
¹) https://github.com/llelf/riak_test/blob/RCB/tests/rcb.erl#L47
Cross-post from: https://github.com/basho/yokozuna/issues/389
The specific riak_core_broadcast ordsets:add_element/2 queue length problem occurs when setting a custom bucket property (allow_mult=false in this case) multiple times on every node in a large (32 nodes in this case) cluster. The result seems to be a race condition that ends up multiplying the actual change to the bucket metadata. While not related to Yokozuna, this is still a problem that is not difficult to trigger in large clusters.
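A minimal sketch of the trigger described above, meant to be pasted into a `riak attach` shell on one node. The bucket name, the repeat count, and the assumption that the broadcast server is registered locally as `riak_core_broadcast` are illustrative and not taken from the report; only the property (`allow_mult=false`) and the "multiple times on every node" pattern come from the description.

```erlang
%% Hypothetical reproduction sketch; the bucket name and repeat count are made up.
Bucket = <<"rcb_test">>,
Repeats = 10,
Nodes = [node() | nodes()],

%% Set the same custom bucket property several times on every node in the cluster.
[rpc:call(N, riak_core_bucket, set_bucket, [Bucket, [{allow_mult, false}]])
 || N <- Nodes, _ <- lists:seq(1, Repeats)],

%% Then look at the broadcast server's mailbox on this node.
%% Assumes the process is registered as riak_core_broadcast.
case whereis(riak_core_broadcast) of
    undefined -> not_running;
    Pid       -> erlang:process_info(Pid, message_queue_len)
end.
```

If the race is hit, the reported `message_queue_len` should keep growing instead of draining back toward zero; on a small or lightly loaded cluster it may well never reproduce.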
Here is a dump of erlang:process_info(Pid, messages) on one of the problem pids: