Open jtuple opened 10 years ago
Any idea on how much work? May have to punt on this post-2.0.
i vote punt. yokozuna doesn't support either.
Punted to 2.1
What are the implications of the break? How will this impact customers who grow and shrink clusters on a regular basis?
@randysecrist
How will this impact customers who grow and shrink clusters on a regular basis?
This doesn't affect shrinking or growing a cluster by adding/removing nodes. It only affects changing the ring size, which is still an experimental feature, afaik.
Just commenting to note that this is still an issue in the latest code. I discovered this problem completely by accident when I noticed extremely high CPU usage coming from riak_ensemble on my dev setup. Long story short, it turned out that I had previously been running at a larger ring size, and hadn't cleared out the ensemble data when I wiped my cluster data. So it continued trying to start ensembles for non-existent vnodes, which ended up getting stuck in loops of continuously trying to monitor a vnode process and then getting the {'DOWN',...} message and starting over again.
Currently, riak_ensemble does not support deleting ensembles -- only creating them. We should add support for deleting ensembles to riak_ensemble, as well as update riak_kv_ensembles and riak_kv_ensemble_backend to delete ensembles if the ring size shrinks. Without this change, a cluster using strong consistency will break if dynamic ring resizing is used to shrink the ring. On the other hand, growing the ring should be safe -- although, we should test this.

We either need to fix this before shipping 2.0, or decide to not support ring resizing for strongly consistent clusters until a later release.
/cc basho/riak#536
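To make the shrink case concrete, here is a rough Python sketch (not riak_kv code) of which ensembles would become stale after a resize. It relies on the ring dividing a 2^160 key space evenly into ring_size partitions. Everything else is an assumption made for illustration: the ("kv", index, n_val) ensemble identifier shape, the one-ensemble-per-partition-per-n_val model, and the function names partition_indices, required_ensembles, and stale_ensembles.

```python
RING_TOP = 2 ** 160  # size of the hash space the ring is divided over


def partition_indices(ring_size):
    """Return the set of partition indices for a ring of the given size."""
    if ring_size <= 0 or ring_size & (ring_size - 1):
        raise ValueError("ring size must be a positive power of two")
    step = RING_TOP // ring_size
    return {i * step for i in range(ring_size)}


def required_ensembles(ring_size, n_vals):
    """Assumed model: one ensemble per (partition index, n_val) pair.

    The ("kv", index, n_val) tuple shape is hypothetical.
    """
    return {("kv", idx, n)
            for idx in partition_indices(ring_size)
            for n in n_vals}


def stale_ensembles(old_size, new_size, n_vals=(3,)):
    """Ensembles needed under the old ring size but not under the new one."""
    return (required_ensembles(old_size, n_vals)
            - required_ensembles(new_size, n_vals))


if __name__ == "__main__":
    print(len(stale_ensembles(128, 64)))  # shrink: ensembles left behind
    print(len(stale_ensembles(64, 128)))  # grow: nothing left behind
```

Under this model, shrinking from 128 to 64 partitions leaves 64 ensembles per n_val pointing at partitions that no longer exist, and these are the ones that would need deleting. Growing leaves none behind, since every old partition index is still present in the larger ring. That matches the expectation above that growing should be safe, though it is no substitute for testing it.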