aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0

3.x.x version aerospike client has high BatchGet latency #348

Closed xqzhang2015 closed 3 years ago

xqzhang2015 commented 3 years ago

We are currently using the 2.12.0 client. After piloting a 3.x.x or 4.x.x client, the BatchGet operation became very slow. The network latency is about 38ms.

BTW, I'm not sure about other operations such as Get/Put, because it's a PROD env. For a DC with a low-latency network, the 3.x.x client works well, or at least not too badly.

Is there any special parameter to configure when upgrading to 3.x.x or a newer version?

        policy := &as.ClientPolicy{
            AuthMode:                    as.AuthModeInternal,
            Timeout:                     time.Second,
            IdleTimeout:                 55 * time.Second,
            LoginTimeout:                10 * time.Second,
            ConnectionQueueSize:         1024,
            LimitConnectionsToQueueSize: true,
            FailIfNotConnected:          false,
            TendInterval:                time.Second,
            IgnoreOtherSubnetAliases:    false,
        }

khaf commented 3 years ago

This is definitely not supposed to happen. How many keys do you have in your batch request? Do you think you can provide a code snippet that could reproduce the issue so that I can investigate?

xqzhang2015 commented 3 years ago

~5 keys in a batch request, all in the same namespace but in different sets (~3 sets). It seems to be closely related to network latency. When accessing an Aerospike cluster in the same data center, the network latency is ~1ms and BatchGet looks good, or at least not too bad.

I will try to provide such a snippet, but since the issue is highly network-dependent, I'm not sure it will help.

xqzhang2015 commented 3 years ago

After checking more debug logs in my env, I see that BatchGet with only 1 key works well (~40ms). Every BatchGet with multiple keys (same namespace, ~3 sets) performs badly (roughly network_latency * number of keys?). Maybe the batch operations are not done concurrently? @khaf FYI

xqzhang2015 commented 3 years ago

Maybe I have found the cause:

  1. BatchPolicy.ConcurrentNodes controls the concurrency of BatchGet. The default value of 1 makes the batch requests run sequentially.
  2. In 2.x.x, with the default value of 1, the batch requests actually behave as if ConcurrentNodes=0.
  3. In 3.x.x or later, with the default value of 1, the batch op behaves as the policy description says. After setting ConcurrentNodes=0, the BatchGet latency is good too (similar to the network latency).

xqzhang2015 commented 3 years ago

Comparing the 3.0.0 and 2.12.0 source code, it seems that BatchPolicy.ConcurrentNodes has no effect on BatchGet in 2.12.0 and only starts to control BatchGet in 3.0.0. Right?
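For anyone hitting the same thing, the workaround looks roughly like the fragment below. It is a sketch, not a complete program: it assumes `client` is an already-connected `*as.Client` and `keys` is a `[]*as.Key` built elsewhere, with the same `as` import alias as the ClientPolicy snippet above.

```go
// Fragment only: `client` and `keys` are assumed to exist already.
batchPolicy := as.NewBatchPolicy()

// 1 (the default) sends the per-node batch requests one after another;
// 0 sends them to all nodes in parallel.
batchPolicy.ConcurrentNodes = 0

records, err := client.BatchGet(batchPolicy, keys)
if err != nil {
	// handle the error as appropriate for the caller
}
_ = records
```

Parallel requests use more connections per batch call, so the ConnectionQueueSize in the ClientPolicy may need to be checked against the expected load.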

khaf commented 3 years ago

Ah, that should be it. In older versions, batch requests were always concurrent and ConcurrentNodes was disregarded. Please use the latest client if you are going to upgrade; there have been a lot of changes, including support for newer servers, which you would miss with the older ones.

xqzhang2015 commented 3 years ago

@khaf thanks for your confirmation and proposal.

khaf commented 3 years ago

If you are not in a rush, v5 is coming soon with some breaking changes and the latest server features, including a major rework of the error system, which may be of interest to you.