Closed Aloren closed 5 years ago
We were about to update this page to point to a new feature that addresses this. Will do so soon, and here are details about this feature (quiesce).
Great news! Thank you! 💃
Hello. Is there a solution for avoiding this exception in the java-client?
For unexpected events (outside of maintenance) where a server node goes down, the client policy has parameters to control retries. Some details are in this knowledge base article. For read transactions, you can typically retry immediately against another replica (assuming a replication factor of 2 or more). For write transactions, you would have to allow a bit of time (depending on the server-side configuration) for a new master to be elected as the cluster reforms after a node loss.
@mtendjou hello, I read this article before, and my main question is why ConnectionRefused isn't retryable by default. And why is maxRetries for WritePolicy 0 by default if only socket_timeout, AEROSPIKE_ERR_CONNECTION, and AEROSPIKE_ERR_TIMEOUT are eligible for retries?
Write transactions can time out after having been processed on the server side, so it may not be safe to retry them (unless the in doubt flag is false, in which case the client knows the server couldn't have received the transaction). So, by default, retry on write transactions is turned off. I am not sure about the details of ConnectionRefused, but if it is something that can succeed on a retry, it should be retryable. The configuration is not about what type of errors the client would retry but about what type of transactions should be retried.
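To illustrate the in doubt reasoning above, here is a minimal sketch of a write wrapper that retries only when a failure is known not to have reached the server. It uses only the JDK. `WriteFailedException`, `WriteOp`, and `writeWithRetry` are hypothetical names invented for this example and are not the Aerospike client's API.

```java
// Hypothetical stand-in for a client exception that carries an "in doubt" flag.
class WriteFailedException extends Exception {
    final boolean inDoubt;

    WriteFailedException(String message, boolean inDoubt) {
        super(message);
        this.inDoubt = inDoubt;
    }
}

class InDoubtRetrySketch {
    // Hypothetical write operation; a real one would call the client.
    interface WriteOp {
        void run() throws WriteFailedException;
    }

    // Runs the write and retries only when the failure is NOT in doubt,
    // i.e. the server could not have applied it. Returns the number of attempts made.
    static int writeWithRetry(WriteOp op, int maxRetries) throws WriteFailedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                op.run();
                return attempts;
            } catch (WriteFailedException e) {
                // In doubt: the write may already have been applied, so give up.
                // Also give up once the retry budget is spent.
                if (e.inDoubt || attempts > maxRetries) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated write: the first two attempts fail before reaching the server.
        int attempts = writeWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new WriteFailedException("connection refused", false);
            }
        }, 3);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

An in doubt failure is rethrown on the first attempt, leaving the caller to decide whether the write is safe to repeat (for example, because it is idempotent).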
According to the documentation, the first step in the Aerospike upgrade procedure is to stop the Aerospike service. So when the Aerospike service is stopped, the client gets java.net.ConnectException: Connection refused errors. The Aerospike client already polls the server for updates every second. Is it possible to first remove the node from the list of available nodes, wait until the number of active connections to the node drops to 0, and only then shut down the node? Another possibility is for the client to catch ConnectException and retry the request against another node. WDYT?
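The second idea (catch ConnectException and retry against another node) could look roughly like the sketch below. It is a JDK-only illustration at the application level, not how the Aerospike client works internally. `NodeRequest`, `sendWithFailover`, and the node addresses are hypothetical.

```java
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

class FailoverSketch {
    // Hypothetical request sent to a single node, identified here by a plain string.
    interface NodeRequest<T> {
        T send(String node) throws ConnectException;
    }

    // Tries each node in order and moves on when a node refuses the connection.
    // Rethrows the last ConnectException if every node refused.
    static <T> T sendWithFailover(List<String> nodes, NodeRequest<T> request)
            throws ConnectException {
        ConnectException last = null;
        for (String node : nodes) {
            try {
                return request.send(node);
            } catch (ConnectException e) {
                last = e; // connection refused: the request never reached this node
            }
        }
        if (last == null) {
            throw new ConnectException("no nodes configured");
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = Arrays.asList("node-a:3000", "node-b:3000");
        // Simulate node-a being stopped for an upgrade.
        String reply = sendWithFailover(nodes, node -> {
            if (node.startsWith("node-a")) {
                throw new ConnectException("Connection refused");
            }
            return "ok from " + node;
        });
        System.out.println(reply); // prints "ok from node-b:3000"
    }
}
```

This pattern fits reads best. As noted earlier in the thread, a write may still need to wait for a new master to be elected before a retry against another node can succeed.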