Enapter / charts: Enapter Helm Charts (MIT License)

KeyDB data won't replicate #2

Closed · rhzs closed this 4 years ago

rhzs commented 4 years ago

Hi,

I tried the default setup on GKE v16 with Helm v3: helm install keydb enapter/keydb

Then I ran a Redis client:

$ kubectl run -it redis-cli --image=redis --restart=Never /bin/bash
root@redis-cli:/data# redis-cli -c -p 6379 -h 10.117.44.3
10.117.44.3:6379> set foo bar
OK
10.117.44.3:6379> get foo
-> Redirected to slot [12182] located at 10.117.44.7:6379
"bar"
10.117.44.7:6379> quit
root@redis-cli:/data# redis-cli -c -p 6379 -h 10.8.2.11
10.8.2.11:6379> get foo
(nil)          ----> THIS IS SUPPOSED TO RETURN "bar" in a multi-master environment

Any idea why I can't get the data on the second pod at 10.8.2.11?

Antiarchitect commented 4 years ago

keydb-managed-0:/data# redis-cli
Message of the day:
  We want to hear from you! Help make KeyDB better with a quick 5 question survey: https://www.surveymonkey.com/r/Y9XNS93

127.0.0.1:6379> get foo
(nil)
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379>
keydb-managed-1:/data# redis-cli
Message of the day:
  We want to hear from you! Help make KeyDB better with a quick 5 question survey: https://www.surveymonkey.com/r/Y9XNS93

127.0.0.1:6379> get foo
"bar"
127.0.0.1:6379> set foo baz
OK
127.0.0.1:6379>
keydb-managed-0:/data# redis-cli
Message of the day:
  We want to hear from you! Help make KeyDB better with a quick 5 question survey: https://www.surveymonkey.com/r/Y9XNS93

127.0.0.1:6379> get foo
"baz"
127.0.0.1:6379>

Cannot reproduce. Can you provide more logs? And please check that each node can reach the others on port 6379.

And please provide the Replication section of the info output: 127.0.0.1:6379> info
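
For example, something along these lines (pod names assumed from your install above):

kubectl exec -it keydb-0 -- redis-cli -h keydb-1.keydb.default.svc.cluster.local -p 6379 ping
kubectl exec -it keydb-0 -- redis-cli info replication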

rhzs commented 4 years ago

@Antiarchitect

In GKE, DNS won't resolve the pod address at keydb-1.keydb.default.svc.cluster.local. It can only resolve the service keydb.default.svc.cluster.local.

I tried adding a Kubernetes headless service and using keydb-1.keydb.keydb-headless.default or keydb-1.keydb-headless.default, but that didn't work either.

(screenshots of the failed DNS lookups attached)

It can only connect if I use the Pod IP directly.

(screenshot attached)

Any thoughts?

Antiarchitect commented 4 years ago

According to the official docs (https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id), everything should work if the cluster DNS domain is cluster.local. The StatefulSet's .spec.serviceName field is the headless service name. I will try to check the situation on GKE in some time and let you know.
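
A minimal sketch of what that means (illustrative names, not the chart's exact template):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: keydb
spec:
  clusterIP: None   # headless: this is what creates per-pod DNS records
  selector:
    app: keydb
  ports:
  - name: server
    port: 6379
EOF

With spec.serviceName: keydb on the StatefulSet, each pod then resolves as keydb-<ordinal>.keydb.<namespace>.svc.cluster.local.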

rhzs commented 4 years ago

My GKE version is 1.14.10-gke.17. I have also tried debugging with nslookup; I hope this is useful.

(nslookup screenshot attached)
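
For reference, checks of this kind (illustrative; the exact output is in the screenshot):

kubectl run -it dns-test --image=busybox --restart=Never -- nslookup keydb.default.svc.cluster.local
kubectl run -it dns-test2 --image=busybox --restart=Never -- nslookup keydb-1.keydb.default.svc.cluster.local
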
Antiarchitect commented 4 years ago

It's actually strange, because the pod identifies itself correctly:

root@keydb-0:/data# ping keydb-0
PING keydb-0.keydb.default.svc.cluster.local (10.12.1.3) 56(84) bytes of data.
64 bytes from keydb-0.keydb.default.svc.cluster.local (10.12.1.3): icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from keydb-0.keydb.default.svc.cluster.local (10.12.1.3): icmp_seq=2 ttl=64 time=0.031 ms
64 bytes from keydb-0.keydb.default.svc.cluster.local (10.12.1.3): icmp_seq=3 ttl=64 time=0.033 ms
64 bytes from keydb-0.keydb.default.svc.cluster.local (10.12.1.3): icmp_seq=4 ttl=64 time=0.033 ms
64 bytes from keydb-0.keydb.default.svc.cluster.local (10.12.1.3): icmp_seq=5 ttl=64 time=0.033 ms

rhzs commented 4 years ago

I have no idea; keydb-0.keydb.default.svc.cluster.local works for me too.

(screenshot attached)

Antiarchitect commented 4 years ago

That's because there is no actual DNS resolution happening. This name is hardcoded into /etc/hosts.
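
You can verify it from inside the pod (the entry below is illustrative; the IP is taken from the ping output above, since Kubernetes writes the pod's own hostname into its managed hosts file):

root@keydb-0:/data# grep keydb-0 /etc/hosts
10.12.1.3       keydb-0.keydb.default.svc.cluster.local keydb-0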

rhzs commented 4 years ago

Ok, what's the permanent solution?

(screenshot attached)

Antiarchitect commented 4 years ago

I've found the problem: my service wasn't headless. Please try 0.6.0 (you should delete the previous service first, as the clusterIP field is immutable). Thank you for your bug report; it was very helpful. I don't know how it worked before in my environment.
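
Roughly (release name assumed from your install):

kubectl delete service keydb
helm repo update
helm upgrade keydb enapter/keydb --version 0.6.0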

rhzs commented 4 years ago

@Antiarchitect I can confirm it works. Perhaps you should update the docs as well: add how to connect to KeyDB using this chart and how to upgrade it.

(screenshot attached)

Thank you!

rhzs commented 4 years ago

@Antiarchitect I have one PR that improves the routing strategy using the headless service. See https://github.com/Enapter/charts/pull/3/files

Antiarchitect commented 4 years ago

I believe you can omit the namespace as well; pods within one namespace should resolve properly. I will investigate whether this is a good idea in general, but for now it looks good to me (because we can avoid the cluster DNS suffix). Thank you!
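
For example, from another pod in the same namespace the short form should resolve too, because the pod's DNS search path already covers the namespace (illustrative):

redis-cli -h keydb-1.keydb -p 6379 ping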