aerospike / aerospike-client-java

Aerospike Java Client Library
Other
236 stars 212 forks source link

Prevent the possibility of skipping an active node in getRandomNode when accessing concurrently #210

Closed reugn closed 2 years ago

reugn commented 2 years ago

This is accurately the proposal of this PR.

BrianNichols commented 2 years ago

The PR is functionally accurate, but includes one extra unnecessary index++ increment. It will still be accurate if this extra index++ increment is removed.

khaf commented 2 years ago

I'd say the older algorithm would distribute the load better considering we look for a random node. We should probably use the XORShift and just select the first index randomly to do it even better.

BrianNichols commented 2 years ago

I think the method name getRandomNode() is a little misleading because the real intention is to distribute single node requests evenly across the cluster. getRoundRobinNode() would have been more accurate.

I don't have a strong opinion on the algorithms because they are identical in the most common case where all nodes are active in the cluster. Nodes are only set to inactive just prior to be being removed, so the lag is in milliseconds. Falling into this inactive node logic is extremely rare.

The active node flag's main use is in partition maps where the lag between inactive nodes and other nodes claiming partition ownership of the inactive node's partitions is much greater.