Open leonliao opened 3 months ago
I see. In your tests, if you delete the slave Pod of a shard, a new Pod will be created to join the shard. However, due to the time it takes for DNS to become effective, the redis-cli add-node command may result in an error.
Yes. I think all addons that use FQDNs to identify nodes should be reviewed to check whether they have the same issue.
Sure, this is a great suggestion for us.
Describe the bug
For Redis Cluster, the current addons/redis/redis-cluster-scripts/redis-cluster-server-start.sh uses the FQDN to add a node to the cluster. After a Redis node pod is rebuilt or created, joining the cluster can fail, either because the cached DNS entry is refreshed only after the cluster add-node command runs, or because the new FQDN's DNS entry becomes resolvable only after the command runs.
To Reproduce
Simulate a pod leaving the cluster and rejoining:
redis-cli --cluster del-node $current_node_ip_and_port $current_node_cluster_id
This simulates addons/redis/redis-cluster-scripts/redis-cluster-replica-member-leave.sh.
The rejoin can then fail in either of two cases:
A stale DNS cache entry points the FQDN to the old pod
DNS takes effect only after the add-node command
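One possible mitigation for both cases is to wait until the FQDN resolves to the new pod's IP before running add-node. The following is a minimal sketch, not the addon's actual code: the function names resolve_fqdn and wait_for_dns are hypothetical, it assumes getent is available in the container image, and it assumes the script already knows the pod's current IP (for example from an environment variable).

```shell
# Hypothetical helpers; these names do not come from the addon scripts.

# Print the first address that the FQDN currently resolves to.
# Assumes getent exists in the image; prints nothing if resolution fails.
resolve_fqdn() {
  getent hosts "$1" | awk 'NR==1 {print $1}'
}

# Poll until the FQDN resolves to the expected IP, or give up.
# Args: fqdn expected_ip [max_attempts, default 30] [delay_seconds, default 1]
# The function body is a subshell, so its variables stay local.
wait_for_dns() (
  fqdn="$1"
  expected_ip="$2"
  max_attempts="${3:-30}"
  delay="${4:-1}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A stale cache entry returns the old pod's IP; a missing entry returns "".
    # Both cases fail this comparison and trigger another attempt.
    if [ "$(resolve_fqdn "$fqdn")" = "$expected_ip" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
)
```

A start script could call wait_for_dns "$pod_fqdn" "$pod_ip" and only run redis-cli --cluster add-node when it returns 0 (pod_fqdn and pod_ip are placeholder variable names). Comparing against the expected IP, rather than only checking that the name resolves, is what covers the stale-cache case. Note that this checks resolution from the joining pod only; other cluster nodes may still hold a stale entry.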