Nordix / hiredis-cluster

C client library for Valkey/Redis Cluster. This project is used and sponsored by Ericsson. It is a fork of the now unmaintained hiredis-vip.
BSD 3-Clause "New" or "Revised" License

Redis command failures & command timeouts after adding a new Master #115

Open SS-TruMinds opened 1 year ago

SS-TruMinds commented 1 year ago

Hello,

We have a 3-Node cluster on which we perform CRUD operations from our application using async hiredis-cluster API calls.

If we add a 4th master node to this cluster while the test is still running, we start seeing redisClusterAsyncCommand failures for some of the new commands, and callbacks with NULL replies for some of the commands that were already issued and were awaiting replies.

It could be that hiredis-cluster is not in sync with the redistributed hash slots after addition of the new Master.
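For readers following along, here is a minimal, self-contained sketch of how a key maps to a hash slot in Redis Cluster (CRC16/XMODEM of the key, or of its `{hash tag}` if present, modulo 16384). It shows why a client's cached slot-to-node map can go stale once slots are moved to a new master. The function names `crc16_xmodem` and `key_slot` are illustrative and are not part of the hiredis-cluster API.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC16/XMODEM (polynomial 0x1021, initial value 0), computed bit by bit. */
uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: returns the cluster slot (0..16383) for a key.
 * If the key contains a non-empty "{...}" section, only that part is hashed. */
unsigned int key_slot(const char *key, size_t keylen) {
    size_t s, e;

    /* Find the first '{'. */
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen)
        return crc16_xmodem(key, keylen) & 0x3FFF;

    /* Find the first '}' after it. */
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No closing brace, or empty tag "{}": hash the whole key. */
    if (e == keylen || e == s + 1)
        return crc16_xmodem(key, keylen) & 0x3FFF;

    return crc16_xmodem(key + s + 1, e - s - 1) & 0x3FFF;
}
```

The slot of a key never changes; what changes when a master is added is which node owns that slot. A client with an outdated map sends the command to the old owner and has to follow a MOVED or ASK redirect.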

Kindly let us know if this scenario is supported, and works fine for you.

Thank you for the support.

zuiderkwast commented 1 year ago

Thanks for the report! It's appreciated. Your testing helps us improve this library.

We'll try to reproduce this and solve this, hopefully soon. If you have a test case that can reproduce this issue, we're happy if you can share it.

bjosv commented 1 year ago

@SS-TruMinds Which version of Redis are you running towards?

SS-TruMinds commented 1 year ago

> @SS-TruMinds Which version of Redis are you running towards?

Hello, we are using Redis v7.0.4

SS-TruMinds commented 1 year ago

> Thanks for the report! It's appreciated. Your testing helps us improve this library.
>
> We'll try to reproduce this and solve this, hopefully soon. If you have a test case that can reproduce this issue, we're happy if you can share it.

Thank you.

We have a 3-node Redis cluster running in a Kubernetes namespace, and an application running in another namespace that performs CRUD operations on some keys. The issue starts after we add a 4th node while the test is still running.

We have also observed multiple connect callbacks from this new node. There are no disconnect callbacks in between, just connect callbacks one after the other.

Thank you.

bjosv commented 1 year ago

@SS-TruMinds Regarding the observed multiple connect callbacks, do you know if the status argument in the callbacks is REDIS_OK or REDIS_ERR? One scenario might be that the last callback is REDIS_OK, while the first ones are REDIS_ERR, which won't trigger a disconnect callback. The disconnect callback should only be called after a successful connect.
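As a rough illustration of what to check, here is a small self-contained sketch that tallies connect results by status and flags a disconnect that was not preceded by a successful connect. It does not link against hiredis: `conn_stats`, `on_connect` and `on_disconnect` are hypothetical helpers, and the `REDIS_OK`/`REDIS_ERR` fallbacks mirror the values in hiredis. In a real application these helpers would be called from the callbacks registered for the cluster context, which receive the context and an `int status`.

```c
#include <stdio.h>

/* Fallback definitions so the sketch builds without hiredis headers. */
#ifndef REDIS_OK
#define REDIS_OK 0
#endif
#ifndef REDIS_ERR
#define REDIS_ERR -1
#endif

/* Hypothetical per-node bookkeeping, not part of hiredis-cluster. */
typedef struct {
    int connect_ok;   /* connect callbacks with status REDIS_OK  */
    int connect_err;  /* connect callbacks with status REDIS_ERR */
    int disconnects;  /* disconnect callbacks seen               */
    int connected;    /* 1 while a successful connect is active  */
} conn_stats;

/* Call from the connect callback with its status argument. */
void on_connect(conn_stats *st, int status) {
    if (status == REDIS_OK) {
        st->connect_ok++;
        st->connected = 1;
    } else {
        /* A failed attempt: no disconnect callback is expected for it. */
        st->connect_err++;
    }
    printf("connect status=%d (ok=%d err=%d)\n",
           status, st->connect_ok, st->connect_err);
}

/* Call from the disconnect callback.
 * Returns 1 if it followed a successful connect, 0 if it was unexpected. */
int on_disconnect(conn_stats *st, int status) {
    int expected = st->connected;
    st->disconnects++;
    st->connected = 0;
    printf("disconnect status=%d%s\n",
           status, expected ? "" : " (no prior successful connect)");
    return expected;
}
```

Logging the counters per node makes it easy to tell repeated failed attempts (only `connect_err` grows) from a connection that is established and then dropped (`connect_ok` and `disconnects` grow together).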

SS-TruMinds commented 1 year ago

We ran this again, and did notice disconnect callbacks too after the connect callbacks. It seems the connection is not stable, and we keep seeing connects followed by disconnects. In redis-cli, the CLUSTER INFO command shows cluster_state:ok.

SS-TruMinds commented 1 year ago

> @SS-TruMinds Regarding the observed multiple connect callbacks, do you know if the status argument in the callbacks is REDIS_OK or REDIS_ERR? One scenario might be that the last callback is REDIS_OK, while the first ones are REDIS_ERR, which won't trigger a disconnect callback. The disconnect callback should only be called after a successful connect.

We ran this again and we do see disconnects after connects. It seems the client keeps losing its connection to the newly added master and keeps trying to connect again and again. The cluster state in redis-cli is 'ok'.

bjosv commented 1 year ago

I have found an issue with missing callbacks when the ASKING command is not sent due to a disconnect. This might be related, but it's not obvious to me. It will be fixed.

zuiderkwast commented 1 year ago

Solved by #120?