Nordix / hiredis-cluster

C client library for Valkey/Redis Cluster. This project is used and sponsored by Ericsson. It is a fork of the now unmaintained hiredis-vip.
BSD 3-Clause "New" or "Revised" License
86 stars 42 forks source link

Event loop occasionally hangs after redisClusterAsyncDisconnect under high connection error conditions #219

Closed plainbanana closed 2 weeks ago

plainbanana commented 3 months ago

Hello hiredis-cluster team,

We have encountered a situation where the event loop hangs after issuing redisClusterAsyncDisconnect in conditions where connection errors are likely to occur.

Here is a reproducible example on the current master branch. The program hangs only when .heavyoperation is set to true. Sorry for the rough example. https://gist.github.com/plainbanana/0f287c6afec4527dba566d01e98aca63

I may be wrong, but I have looked into possible causes:

  1. A debug sleep 1 added in redisClusterAsyncCommandToNode timed out.
  2. This timeout triggered updateSlotMapAsync, adding a CLUSTER NODES command.
  3. A heavy operation (e.g. sleep 1) was executed in the commandCallback.
  4. redisClusterAsyncDisconnect was executed in the commandCallback.
  5. Due to the timeout of the CLUSTER NODES command (?), a retry was attempted in clusterNodesReplyCallback, which established a new connection and added another CLUSTER NODES command.
  6. The hiredis redisProcessCallbacks did not recognize REDIS_DISCONNECTING, leading to a hang.

Is there a way to avoid the hang and successfully terminate the connections in such a scenario? In my humble opinion, it might be better not to initiate a new connection in (5) if redisClusterAsyncDisconnect has already been issued. I would appreciate your input.

Thank you.

bjosv commented 3 months ago

In my humble opinion, it might be better not to initiate a new connection in (5) if redisClusterAsyncDisconnect has already been issued.

Yes, it sounds reasonable that hiredis-cluster shouldn't initiate new connections when a user wants to disconnect. When disconnecting the flag REDIS_DISCONNECTING is set in hiredis which blocks new commands on existing nodes. But we should probably add a similar flag in hiredis-cluster to avoid creating new connections to nodes.

bjosv commented 2 weeks ago

Closed by #225