aerospike / aerospike-client-java

Aerospike Java Client Library

Aerospike client fails operations with exception when Aerospike server node goes down #120

Closed: Aloren closed this issue 5 years ago

Aloren commented 5 years ago

According to the documentation, the first step in the upgrade procedure is to stop the Aerospike service. When the service is stopped, the client gets java.net.ConnectException: Connection refused errors. The client already polls the server for updates every second. Would it be possible to first remove the node from the list of available nodes, wait until the number of active connections to that node drops to 0, and only then shut the node down? Another possibility is for the client to catch ConnectException and retry the request against another node. WDYT?

mtendjou commented 5 years ago

We were about to update this page to point to a new feature that addresses this. Will do so soon, and here are details about this feature (quiesce).

Aloren commented 5 years ago

Great news! Thank you! 💃

yarosman commented 2 years ago

Hello. Is there a solution for avoiding this exception in the Java client?

mtendjou commented 2 years ago

For unexpected events (outside of maintenance) where a server node goes down, the client policy has parameters to control retries. Some details are in this knowledge base article. For read transactions, you can typically retry immediately against another replica (assuming a replication factor of 2 or more). For write transactions, you have to allow some time (depending on the server-side configuration) for a new master to be elected as the cluster re-forms after a node loss.
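The immediate read retry against another replica can be sketched as follows. This is a minimal, stdlib-only illustration and not the Aerospike client's actual implementation: FailoverRetry, NodeCall, and callWithFailover are hypothetical names, and nodes are represented as plain strings. It assumes that a java.net.ConnectException means the connection was never established, so the request did not reach that node.

```java
import java.net.ConnectException;
import java.util.List;

public class FailoverRetry {

    /** One request attempt against a specific node (hypothetical abstraction). */
    @FunctionalInterface
    public interface NodeCall<T> {
        T call(String node) throws Exception;
    }

    /**
     * Tries each node in order and returns the first successful result.
     * Only ConnectException triggers a move to the next node; any other
     * exception is propagated to the caller unchanged.
     */
    public static <T> T callWithFailover(List<String> nodes, NodeCall<T> call) throws Exception {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        ConnectException last = null;
        for (String node : nodes) {
            try {
                return call.call(node);
            } catch (ConnectException e) {
                // Assumption: the connection was refused before any request
                // bytes were sent, so trying another node is safe.
                last = e;
            }
        }
        // Every node refused the connection; surface the last failure.
        throw last;
    }
}
```

A caller would pass the replicas for the key in preference order. Writes need more care than this, because the new master may not be elected yet when the retry fires.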

yarosman commented 2 years ago

@mtendjou hello, I read this article before. My main question is why ConnectionRefused isn't retryable by default. And why is maxRetries for WritePolicy 0 by default if only socket_timeout, AEROSPIKE_ERR_CONNECTION, and AEROSPIKE_ERR_TIMEOUT are allowed to be retried?

mtendjou commented 2 years ago

Write transactions can time out after having been processed on the server side, so it may not be safe to retry them (unless the in-doubt flag is false, in which case the client knows the server could not have received the transaction). So, by default, retries are disabled for write transactions. I am not sure about the details of ConnectionRefused, but if it is something that can succeed on a retry, it should be retryable. The configuration is not about which types of errors the client retries but about which types of transactions should be retried.
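The in-doubt rule above can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not the client's own code: InDoubtRetry, WriteFailedException, isInDoubt, and writeWithRetry are hypothetical names standing in for whatever exception type and flag the real client exposes. The wrapper retries a write only when the failure is known not to be in doubt, and pauses between attempts to give the cluster time to re-form.

```java
import java.util.concurrent.Callable;

public class InDoubtRetry {

    /** Hypothetical stand-in for a client exception that carries an in-doubt flag. */
    public static class WriteFailedException extends Exception {
        private final boolean inDoubt;

        public WriteFailedException(String message, boolean inDoubt) {
            super(message);
            this.inDoubt = inDoubt;
        }

        /** True when the server may already have applied the write. */
        public boolean isInDoubt() {
            return inDoubt;
        }
    }

    /**
     * Runs the write and retries up to maxRetries times, sleeping between
     * attempts. An in-doubt failure is rethrown immediately, because a
     * retry could apply the write twice.
     */
    public static <T> T writeWithRetry(Callable<T> write, int maxRetries, long sleepMillis)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return write.call();
            } catch (WriteFailedException e) {
                if (e.isInDoubt() || retries >= maxRetries) {
                    throw e;
                }
                retries++;
                // Give the cluster time to elect a new master before retrying.
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

With maxRetries set to 0 the wrapper behaves like the default described in the thread: the first failure is returned to the caller, who decides what to do.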