StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

Cluster - Connecting to 0.0.0.0:6379 for some reason #2580

Open ghost opened 11 months ago

ghost commented 11 months ago

Hello there,

I am connecting to a cluster with 1 master and 2 replicas. The code is pretty simple:

```csharp
var configuration = new ConfigurationOptions();
configuration.EndPoints.Add("10.254.61.60", 6379);
configuration.EndPoints.Add("10.254.61.61", 6379);
configuration.EndPoints.Add("10.254.61.62", 6379);
configuration.Ssl = true;
configuration.User = "admin";
configuration.Password = "password";
configuration.CertificateValidation += (object sender, X509Certificate? certificate, X509Chain? chain, SslPolicyErrors sslPolicyErrors) => { return true; };
// this call takes ~5 seconds
ConnectionMultiplexer redis = ConnectionMultiplexer.Connect(configuration, Console.Out);
```

But the connection takes up to 5 seconds, because for some reason the client is trying to connect to a 0.0.0.0:6379 endpoint and it takes it 4.7 seconds to fail. I have no idea why it would try to do that. There is no error in the logs nor any explanation I could find. logs.txt

P.S. the cert verification is required because we are using certs signed by our internal CA which my local PC doesn't trust. The app running in production doesn't have that line but does have the same issue.
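As an aside, a common alternative to unconditionally returning `true` is to re-validate the chain against the internal CA explicitly, so only certificates issued by that CA are accepted. A minimal sketch (the `internal-ca.crt` path is a hypothetical placeholder; `X509ChainTrustMode.CustomRootTrust` requires .NET 5 or later):

```csharp
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Sketch: trust only certificates chaining to the internal CA, instead of
// accepting everything. "internal-ca.crt" is a hypothetical file path.
var internalCa = new X509Certificate2("internal-ca.crt");

configuration.CertificateValidation += (sender, certificate, chain, errors) =>
{
    if (errors == SslPolicyErrors.None) return true;   // already trusted by the OS
    if (certificate is null) return false;

    // Re-validate the chain using the internal CA as a custom trust anchor.
    using var customChain = new X509Chain();
    customChain.ChainPolicy.TrustMode = X509ChainTrustMode.CustomRootTrust;
    customChain.ChainPolicy.CustomTrustStore.Add(internalCa);
    return customChain.Build(new X509Certificate2(certificate));
};
```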

mgravell commented 11 months ago

Is it possible for you to connect to one of the servers with redis-cli and issue the command cluster nodes? I'm very curious what the server is responding with there.

ghost commented 11 months ago

Master:

```
127.0.0.1:6379> cluster nodes
7af72812710708717f3dee924254ac77a8f0cbc3 0.0.0.0:6379@16379 myself,master - 0 0 0 connected 0-16383
```

Replica:

```
127.0.0.1:6379> cluster nodes
7af72812710708717f3dee924254ac77a8f0cbc3 10.254.61.60:6379@16379 master - 0 1698221169052 0 connected 0-16383
127c5d1020c09098e010e19d96294b1dad1fff30 0.0.0.0:6379@16379 myself,slave 7af72812710708717f3dee924254ac77a8f0cbc3 0 0 0 connected
```

Replica:

```
127.0.0.1:6379> cluster nodes
7af72812710708717f3dee924254ac77a8f0cbc3 10.254.61.60:6379@16379 master - 0 1698221165064 0 connected 0-16383
f97c9da47d911ac99f56b11170a8d325e3edf3a3 0.0.0.0:6379@16379 myself,slave 7af72812710708717f3dee924254ac77a8f0cbc3 0 0 0 connected
```

Well, I am baffled. The master doesn't know about the replicas, and the replicas only know about the master. I am not sure what went wrong.

dergyitheron commented 11 months ago

Hey,

I'm a colleague of @Qualatea, trying to dig out more information as he's currently occupied.

Could this be related to the cluster-announce-ip 0.0.0.0 option being set on both the master and the replicas? I was digging through the config for more information, and this option is documented in the Docker/NAT support section.

We are running the master and replicas on VMs with static IPs and the ports directly exposed to clients, so would commenting out the cluster-announce options solve this? Or simply setting them to the actual static IP of each node?

Thanks.

David

slorello89 commented 11 months ago

cluster-announce-ip is the IP that a node announces to the rest of the cluster and to clients. It exists so that when Redis or a client does exactly what StackExchange.Redis is doing (asking the cluster about its topology), it can be pointed to the correct IP address. So yes, that's definitely why you're seeing this behavior. You should most likely be able to comment out that line (and make sure the other cluster-announce-* fields aren't configured oddly either, naturally).
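Concretely, on each node the redis.conf change would look something like this (a sketch using the static IP from the original post; each node should announce its own address):

```conf
# Either remove the wildcard announce address entirely...
# cluster-announce-ip 0.0.0.0

# ...or set it to this node's real static IP, so CLUSTER NODES
# reports a routable address instead of 0.0.0.0:
cluster-announce-ip 10.254.61.60
```

After changing the config and restarting the nodes, `CLUSTER NODES` should report the real addresses and the client will no longer attempt to connect to 0.0.0.0:6379.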

The classic case where it would be necessary is if you were running your cluster inside of a docker network and wanted to do some manual NAT for your Redis instance.
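For comparison, a hypothetical Docker/NAT setup where these options are genuinely needed might look like this (203.0.113.10 and the port numbers are example values, not from this issue):

```conf
# Container listens internally, but announces the host's address and
# the host ports mapped to the container's 6379/16379:
cluster-announce-ip 203.0.113.10
cluster-announce-port 7001
cluster-announce-bus-port 17001
```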

mgravell commented 11 months ago

I propose that we do add a minor tweak to explicitly exclude wildcard addresses (with an entry in the log), but: the library can't work with this misconfiguration, so the "fix" here is to not tell the servers to advertise that address.
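Such a guard might be sketched as follows (illustrative only, not the library's actual code; System.Net.IPAddress already exposes the wildcard constants):

```csharp
using System;
using System.Net;

// Sketch: treat wildcard addresses reported by CLUSTER NODES as
// non-routable, so the client could skip (and log) them.
static bool IsWildcard(string host) =>
    IPAddress.TryParse(host, out var ip)
    && (ip.Equals(IPAddress.Any) || ip.Equals(IPAddress.IPv6Any));

Console.WriteLine(IsWildcard("0.0.0.0"));      // True  (IPv4 wildcard)
Console.WriteLine(IsWildcard("::"));           // True  (IPv6 wildcard)
Console.WriteLine(IsWildcard("10.254.61.60")); // False (routable address)
```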

NickCraver commented 9 months ago

@mgravell I like the thinking, but I don't think we should do that, simply because it might work even if it seems very wrong. If the response is local and the destination is too, that configuration may be working today, at least on Linux environments. It's not quite the same, but 0.0.0.0 ~= 127.0.0.1 in functionality and given containers...I can easily see that guard breaking a working scenario. Thoughts?