On batch reads only AS_POLICY_REPLICA_SEQUENCE will actually find new nodes for requests, AS_POLICY_REPLICA_ANY will try the same node and fail again.

xarnfe01 commented 2 years ago

https://github.com/aerospike/aerospike-client-c/commit/f71b469da21364c9e2524d444e1cac8383686cf1

Aerospike had added a "batch split retry" mode to account for the fact that when it organized a batch based on what was on the same node on the first try, they may not all be on the same node on the second try.

Unfortunately this only happens for replica == AS_POLICY_REPLICA_SEQUENCE || replica == AS_POLICY_REPLICA_PREFER_RACK

https://github.com/aerospike/aerospike-client-c/blob/6.1.2/src/main/aerospike/aerospike_batch.c#L2818

We found this problem on Nov 2019 and we moved to using AS_POLICY_REPLICA_SEQUENCE instead of AS_POLICY_REPLICA_ANY for batch reads internally at Threatmetrix.

Split batch reads retries were added on CLIENT-1056 https://github.com/aerospike/aerospike-client-c/commit/f71b469da21364c9e2524d444e1cac8383686cf1 and the commit msg specifies only for SEQUENCE. Any idea why?

What is the reason to NOT do batch split retries and recompute where the batch work should be sent in the case of AS_POLICY_REPLICA_ANY?

I believe the same behaviour is happening on master branch after code inspection.

We use AS in AP mode.

Kind regards.

(our internal ticket for reference OTIN-8976)

BrianNichols commented 2 years ago

The original thought was that only SEQUENCE/PREFER_RACK would fully benefit from a batch read split retry. AS_POLICY_REPLICA_ANY would likely result in 50% of the keys being redirected to other nodes while SEQUENCE usually redirects 100% of the keys to other nodes.

Knowing this limitation, do you think it's still important for batch split retries to allow AS_POLICY_REPLICA_ANY?

xarnfe01 commented 2 years ago

In short, yes.

Not sure where the 50% figure comes from.

AS_POLICY_REPLICA_ANY | Distribute reads across nodes containing key's master and replicated partition in round-robin fashion. Currently restricted to master and one prole.

I would expect this to mean that in a NORMAL read, with RF=2, the read that would go on partition P will go to a different node than the previous one went (alternating, round robin).

If one server goes down, then in a NORMAL, if the first read failed, I would expect the next one to round robin to the next node that contains the partition (master or prole). If that node did not contain the data, then it would surely proxy to the right node.

I believe you are saying is that the behaviour of ANY is NOT round-robin on a per key READ basis, but it uses a global iterator, and thus the effect on a retry is that any of the 2 nodes could be chosen.

    case AS_POLICY_REPLICA_ANY: {
        // Alternate between master and prole for reads with global iterator.
        uint32_t r = as_faa_uint32(&g_randomizer, 1);
        use_master = (r & 1);
        return get_sequence_node(cluster, p, use_master);
    }

This is not ideal, as during a retry you don't want to talk to the same node in case that node had a problem.

Still during a retry (BATCH or NORMAL), we know that if heartbeat timeout (timeout*interval) have passed, that we should have recomputed the partition table. We are currently timing the retries to happen after that condition. This means that it would make sense for the client to recompute to which servers the client will talk to and thus do a batch read split retry even in the case of AS_POLICY_REPLICA_ANY.

Moving from AS_POLICY_REPLICA_ANY to AS_POLICY_REPLICA_SEQUENCE is a posibility and we did that for BATCH, but for small cluster sizes and/or read patterns, it can result in a less balanced load.

Kind regards, Ferran

xarnfe01 commented 2 years ago

Given that in as_command_execute we toggle master with cmd->master = !cmd->master;

if we were to use cmd->use_master in AS_POLICY_REPLICA_ANY in the same way that it is used in AS_POLICY_REPLICA_SEQUENCE in our call to get_sequence_node, but we initialized master randomly in the case of AS_POLICY_REPLICA_ANY, we would have a true round-robin iteration on a per read basis which is what the documentation seems to imply.

cmd->master is currently initalized in as_command_start_timer

Kind regards, Ferran

BrianNichols commented 2 years ago

I will try to summarize your suggested changes to AS_POLICY_REPLICA_ANY for batch commands.

1) Obtain a random bool that indicates if all keys in the batch should start at master (or prole).

2) Assign keys to nodes based on the bool chosen in 1).

3) Issue batch node commands. If any node commands fail, invert the bool and split the failed keys into new node commands to according to the new bool value.

4) Repeat 3) until all batches succeed or max_retries exceeded or timeout.

Is this correct?

These changes are tractable and would make AS_POLICY_REPLICA_ANY practical for batch split retries.

xarnfe01 commented 2 years ago

Thisis perfectly correct from the point of view of batch commands.

I would also make 1 and 2 and the part of inverting the bool in 3 also for non-batch commands (normal reads). This would make AS_POLICY_REPLICA_ANY more useful in the case of retries of normal reads as it would ensure that the retries do NOT go to the previously failed server.

Kinds regards, Ferran

BrianNichols commented 2 years ago

C client 6.2.0 is released: https://download.aerospike.com/download/client/c/notes.html#6.2.0

AS_POLICY_REPLICA_ANY is now treated like AS_POLICY_REPLICA_SEQUENCE on retry for batch and single record commands. AS_POLICY_REPLICA_ANY still distributes round robin on the first command attempt.

aerospike / aerospike-client-c

On batch reads only AS_POLICY_REPLICA_SEQUENCE will actually find new nodes for requests, AS_POLICY_REPLICA_ANY will try the same node and fail again. #130