Nordix / hiredis-cluster

C client library for Valkey/Redis Cluster. This project is used and sponsored by Ericsson. It is a fork of the now unmaintained hiredis-vip.
BSD 3-Clause "New" or "Revised" License

Use the async api when a connection error triggers a slot update #144

Closed bjosv closed 1 year ago

bjosv commented 1 year ago

When a command response indicates a communication error, the slot map is updated. This update is now performed using the async API to avoid blocking calls to connect when querying a cluster node.

This fixes the problem with hanging connects that block the event handling system.

Changed behaviors

Previously, communication errors were counted. When the client had received more errors than the configured max_retry_count value, the Redis configuration "cluster-node-timeout" was fetched from a cluster node. This configured value was then used to determine when to perform a slotmap update. When an additional error was received after this time-to-wait, the slotmap update procedure started. This procedure used blocking calls on a new TCP connection.

After this PR, a communication error triggers the slotmap update procedure directly. A connected node is preferred: a random index among all known nodes is picked, and a connected node found close to that index is selected. The random index should give a more even distribution of selected nodes. If no connected node is found while iterating up to this index, the remaining nodes are also checked until a connected node is found. If no connected node is found at all, a node close to the picked index for which a connection attempt has not been made within the throttle time is selected. The commands are sent using the async API to avoid blocking sends (or connects). While the slotmap update procedure runs, and until one second after it has finished, other commands that trigger communication errors or timeouts will not start additional slotmap updates, i.e. the slotmap update is throttled.
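The selection and throttling rules described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `node_info`, `update_state`, `select_update_node`, `update_allowed` and the `CONNECT_THROTTLE_MS` value are hypothetical names and assumptions, and the randomly picked index is passed in by the caller so the logic stays deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node record; not the library's own struct. */
typedef struct {
    int connected;                   /* non-zero if an async connection is up */
    int64_t last_connect_attempt_ms; /* 0 means no attempt has been made yet */
} node_info;

/* Assumed value; the description only refers to a "throttle-time". */
#define CONNECT_THROTTLE_MS 1000
/* One second after the update has finished, as described above. */
#define UPDATE_THROTTLE_MS 1000

/* Hypothetical state used to throttle the slotmap update itself. */
typedef struct {
    int in_progress;     /* non-zero while an update is running */
    int64_t finished_ms; /* when the last update finished, 0 = never */
} update_state;

static int connect_attempt_allowed(const node_info *n, int64_t now_ms) {
    return n->last_connect_attempt_ms == 0 ||
           now_ms - n->last_connect_attempt_ms >= CONNECT_THROTTLE_MS;
}

/* Returns the index of the node to query, or -1 if no node is eligible.
 * `picked` is the randomly chosen index among the known nodes. */
int select_update_node(const node_info *nodes, size_t count,
                       size_t picked, int64_t now_ms) {
    assert(nodes != NULL || count == 0);
    int connected = -1; /* most recent connected node seen so far */
    int fallback = -1;  /* unconnected node we are allowed to connect to */
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].connected) {
            connected = (int)i;
        } else if ((fallback < 0 || i <= picked) &&
                   connect_attempt_allowed(&nodes[i], now_ms)) {
            /* Keep the last eligible node up to `picked`, else the first after. */
            fallback = (int)i;
        }
        /* From the picked index onwards, stop at the first connected node. */
        if (i >= picked && connected >= 0)
            return connected;
    }
    return fallback;
}

/* A new slotmap update may start only if none is running and at least
 * UPDATE_THROTTLE_MS has passed since the previous one finished. */
int update_allowed(const update_state *s, int64_t now_ms) {
    if (s->in_progress)
        return 0;
    return s->finished_ms == 0 ||
           now_ms - s->finished_ms >= UPDATE_THROTTLE_MS;
}
```

Taking `picked` as a parameter rather than calling a random function inside the selection keeps the scan itself reproducible, which may also make the randomness easier to control in tests.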

Other: Using the async API during MOVED handling should be implemented as well, but that will be done in a separate PR.

Fixes #142

bjosv commented 1 year ago

Ouch, the randomness makes the simulated-redis tests a bit harder to handle..

zuiderkwast commented 1 year ago

> Ouch, the randomness makes the simulated-redis tests a bit harder to handle..

Right :-) We can inject some fake-randomness just for testing, e.g. using an ifdef or a known fixed random seed.
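Both suggestions could look something like the sketch below. It is only an illustration of the idea: `TEST_FIXED_RANDOM`, `pick_node_index` and `seed_node_picker` are hypothetical names, not identifiers from the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Picks the index used as the starting point for node selection.
 * TEST_FIXED_RANDOM is a hypothetical build-time macro for test builds. */
size_t pick_node_index(size_t count) {
    assert(count > 0);
#ifdef TEST_FIXED_RANDOM
    return 0; /* fixed choice so simulated tests see a stable node order */
#else
    return (size_t)rand() % count;
#endif
}

/* Alternative: keep rand() but let tests seed it with a known value,
 * which makes the sequence repeatable for a given C library. */
void seed_node_picker(unsigned int seed) {
    srand(seed);
}
```

With the ifdef variant a test build always starts at the same node, while the seeded variant still exercises the random path but repeats the same sequence on every run.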