Grokzen / docker-redis-cluster

Dockerfile for Redis Cluster (redis 3.0+)
MIT License

Podman not supported #151

Closed gaeljw closed 1 year ago

gaeljw commented 1 year ago

Environment

Steps to Reproduce

  1. Run the container with podman instead of docker, exposing internal ports 7000 to 7006 as 17000 to 17006 externally.
  2. Once the cluster is running properly, try to connect to it from any client (Java / Lettuce in my case) using localhost:17000 to 17006 (the external address).
  3. The client gets a timeout error on the internal IP.

Expected Behavior

Client code should be able to connect to cluster with external address.

Observed Behavior

Error with client trying to reach internal address 10.x.x.x:7000 to 7006.

When running with docker, it works perfectly fine. Thus it might not be related to the image itself but I never encountered similar issues with podman.

I actually would expect it to fail with docker as well as there are no "cluster-announce-ip" settings defined.

Any suggestion of how/what to look into would be appreciated.

I may provide a more precise reproduction scenario if needed.

Grokzen commented 1 year ago

@gaeljw Two things come to mind that you can test.

First, go inside the container itself and try to connect with telnet (you might have to install those tools manually on the inside first) to the local nodes on their internal IPs. If telnet works, then try to install redis-cli separately (there is a standalone redis package for this in apt, I think) and connect with it from within the container.

Second, do basically the same thing from outside the container going in. This would at least determine whether the cluster itself is broken on the inside, or whether this is a network problem.
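The two checks above can also be scripted when telnet is not installed. This is a minimal Python sketch, assuming a plain TCP connect is a good enough stand-in for telnet; the function names `can_connect` and `report` are illustrative only and not part of any tool mentioned in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False


def report(addresses, timeout: float = 2.0) -> dict:
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {addr: can_connect(addr[0], addr[1], timeout) for addr in addresses}
```

Run from outside the container, something like `report([("localhost", 17000 + i) for i in range(7)])` would cover the external ports from the reproduction steps. Run inside the container, the same call with the internal IP and ports 7000 to 7006 would cover the first check.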

If you are able to do both of these things, then you most likely have an issue with lettuce not being compatible with my setup. Also note that if you run on a mac there are further issues with IP:port mapping that are already described in the README, so podman+mac might be even worse :D

My guess, though, is that your network works fine and the issue is one common to almost all redis clients. Redis internally maps the IP:port it uses to connect to and find its cluster nodes. This would mean it maps 127.0.0.1:7001 as the first node in the cluster. When your client connects to a cluster it asks CLUSTER INFO and CLUSTER SLOTS, and that gives back the internal cluster representation, which would be 127.0.0.1:7001. But when running from outside the container you "want" it to be at 17001, and the cluster doesn't understand that (and it should not really handle it). To combat this in my own library, redis-py-cluster, I added a remapping feature to the client where users can define for themselves what to replace with what in the response from the cluster, which solves the issue in the client, where it ultimately belongs. This is a well-known limitation when running all redis cluster nodes in a single docker container, or even when running multiple nodes on a single VM.
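To make the remapping idea concrete, here is a minimal Python sketch of a client-side address remapper. It is not the redis-py-cluster implementation. The helper name `make_port_remapper` is hypothetical, and the port mapping (7000 to 7006 published as 17000 to 17006 on localhost) is taken from the reproduction steps in this issue.

```python
from typing import Callable, Dict, Tuple

Address = Tuple[str, int]


def make_port_remapper(external_host: str,
                       port_map: Dict[int, int]) -> Callable[[Address], Address]:
    """Build a function that rewrites cluster-internal addresses.

    Any address whose port is a key of port_map is rewritten to
    (external_host, port_map[port]); other addresses pass through unchanged.
    """
    def remap(address: Address) -> Address:
        _host, port = address
        if port in port_map:
            return (external_host, port_map[port])
        return address

    return remap


# Assumed mapping from this issue: internal 7000..7006 -> external 17000..17006.
remap = make_port_remapper("localhost", {7000 + i: 17000 + i for i in range(7)})
```

With this in place, an address the cluster reports, such as `("10.0.2.100", 7002)`, becomes `("localhost", 17002)` before the client dials it. Newer redis-py releases appear to accept a callable of this shape through an `address_remap` argument on `RedisCluster`; treat that as something to verify against the documentation for your version.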

gaeljw commented 1 year ago

@Grokzen what I find super surprising is that the same image works fine with Docker but not with podman.

With podman:

# Running the container
podman run -p 60000:7000 -p 60001:7001 -p 60002:7002 -p 60003:7003 -p 60004:7004 -p 60005:7005 grokzen/redis-cluster:6.2.1

# Connecting to the cluster from the outside on wrong node fails to route to correct node
redis-cli -c -h localhost -p 60001
localhost:60001> get "a"
-> Redirected to slot [15495] located at 10.0.2.100:7002
# timeout...

# Works fine when targeting right node
redis-cli -c -h localhost -p 60002
localhost:60002> GET "a"
(nil)
localhost:60002>

With docker:

# Running the container
docker run -p 60000:7000 -p 60001:7001 -p 60002:7002 -p 60003:7003 -p 60004:7004 -p 60005:7005 grokzen/redis-cluster:6.2.1

# Connecting to the cluster from the outside on wrong node does work
redis-cli -c -h localhost -p 60001
localhost:60001> GET "a"
-> Redirected to slot [15495] located at 172.17.0.2:7002
(nil)
172.17.0.2:7002>

There's "some magic" happening with docker, isn't there?! :sweat_smile:
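For reference, the slot number is the same in both redirects because Redis Cluster derives it from the key alone: CRC16 (XMODEM variant) of the key modulo 16384, hashing only the part inside `{...}` when a non-empty hash tag is present. Only the address attached to the slot differs between podman and docker. The following is a minimal Python sketch of that calculation; the function names are illustrative and not taken from any client library.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_slot("a"))  # 15495, matching the redirects in the transcripts
```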

gaeljw commented 1 year ago

Well.. I guess this is proof of an "issue" with podman (or maybe a specific behaviour of docker).

The internal IP of the container is reachable from the outside when running with docker, but not with podman.

# Telnet on podman container internal IP
telnet 10.0.2.100 7002
Trying 10.0.2.100...
# Timeout...

# Telnet on docker container internal IP
telnet 172.17.0.2 7000                                 
Trying 172.17.0.2...
Connected to 172.17.0.2.
Escape character is '^]'.

Grokzen commented 1 year ago

@gaeljw Yeah, this is a common container issue with redis. It depends on how redis has coded its IP lookup and how that relates to your network setup within docker itself. Part of the issue comes from me bunching all the nodes up into one single container. In most cases redis runs one node per VM/physical server, so the redis cluster protocol has to talk to other machines and tracks the external IP through which each node is reached. redis-cli most of the time sits on one of these networks and can connect to each redis server more seamlessly. I have seen this hundreds of times and it is the exact same issue every time :)

What I would love for Redis to implement is support for DNS names as addresses when joining up cluster nodes, and have it return DNS names when a client runs the CLUSTER INFO / SLOTS commands, so that my DNS can solve it instead of having it hardcoded within the service and protocol itself. If I just have node1:7000 and node2:7001 as values returned from redis, I can resolve and especially remap these things to different values and let the network setup deal with this issue instead of doing it within Redis at all.

gaeljw commented 1 year ago

In the end I used the bitnami image with a single-node Redis cluster and an announce-ip.

For future readers, here's the command I used in the end:

podman run --rm -e ALLOW_EMPTY_PASSWORD=yes -e REDIS_CLUSTER_CREATOR=yes -e REDIS_NODES=my-redis --name my-redis -e REDIS_CLUSTER_ANNOUNCE_IP=127.0.0.1 -e REDIS_CLUSTER_ANNOUNCE_PORT=60000 -e REDIS_CLUSTER_DYNAMIC_IPS=no -p 60000:6379 bitnami/redis-cluster:7.0.8

And issuing the following command with redis-cli or any other client so that the single node handles all slots:

CLUSTER ADDSLOTSRANGE 0 16383
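After assigning slots, one way to confirm the single node really covers the whole keyspace is to look at the CLUSTER INFO reply. A Redis Cluster has 16384 hash slots (0 to 16383), and the reply includes `cluster_state` and `cluster_slots_assigned` fields. This is a minimal Python sketch that parses the raw reply text; the function names are illustrative, and fetching the reply from the server is left to whichever client is in use.

```python
TOTAL_SLOTS = 16384  # Redis Cluster hash slots are numbered 0..16383


def parse_cluster_info(raw: str) -> dict:
    """Parse the 'key:value' lines of a CLUSTER INFO reply into a dict."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def all_slots_assigned(info: dict) -> bool:
    """True when the cluster reports state ok and every slot is assigned."""
    assigned = int(info.get("cluster_slots_assigned", 0))
    return info.get("cluster_state") == "ok" and assigned == TOTAL_SLOTS
```

If `all_slots_assigned` returns False, the `cluster_slots_assigned` value shows how many slots are still missing from the range passed to CLUSTER ADDSLOTSRANGE.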