bitwalker / libcluster

Automatic cluster formation/healing for Elixir applications
MIT License

Connecting to host using ip fails but name works. #129

Closed. ghost closed this issue 4 years ago

ghost commented 4 years ago

Using the Cluster.Strategy.Kubernetes.DNS strategy (https://hexdocs.pm/libcluster/Cluster.Strategy.Kubernetes.DNS.html#content), I get the IPs with the default :a option, but my node list remains empty. In fact, I can connect to nodes by name but not by IP.

I wonder if this is an issue with Erlang or Kubernetes.

iex(app@app-7b77bc47b4-qtwlq)1> Node.list
[]
iex(app@app-7b77bc47b4-qtwlq)2> Node.connect(:"app@10.16.5.77")
false
iex(app@app-7b77bc47b4-qtwlq)3> Node.connect(:"app@10.16.4.175")
false
iex(app@app-7b77bc47b4-qtwlq)4> Node.connect(:"app@app-7b77bc47b4-qtwlq")
true

ghost commented 4 years ago

What I ended up doing:

Add an env variable to the Kubernetes config:

        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP

In env.sh.eex:

export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=<%= @release.name %>@$POD_IP

bitwalker commented 4 years ago

As you've probably discovered by now, a node has to be started with the same name that other nodes will use to connect to it. The best way to handle this is to have every node use its IP address as the host part of its name (e.g. <node>@10.16.5.77) rather than the hostname. In your example, the node must have been started using the hostname, which is why connecting via IP failed.

Put another way, if your node is named a@b and has IP 192.168.0.10, then connecting to it as a@192.168.0.10 won't work; you have to connect as a@b. This is why it is critical to use names that resolve properly from other nodes on the network.
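
The env.sh.eex workaround earlier in the thread could be made a little more defensive. The following is only a minimal sketch, not something posted in the thread: node_name_for is a hypothetical helper, and "app" stands in for the release name that <%= @release.name %> would supply. It assumes POD_IP is injected through the Kubernetes fieldRef shown in the workaround, and it skips the exports when POD_IP is empty, so the release does not start with a malformed name such as app@.

```shell
#!/bin/sh
# Hypothetical helper (not part of libcluster or Mix releases): build the
# long node name "<app>@<pod ip>" and fail when no IP is available.
node_name_for() {
  app="$1"
  ip="$2"
  if [ -z "$ip" ]; then
    echo "POD_IP is empty; cannot build an IP-based node name" >&2
    return 1
  fi
  printf '%s@%s\n' "$app" "$ip"
}

# In env.sh.eex the first argument would come from the release name;
# "app" is used here only as a placeholder.
if name="$(node_name_for app "${POD_IP:-}")"; then
  # Long names are required for an IP address in the host part.
  export RELEASE_DISTRIBUTION=name
  export RELEASE_NODE="$name"
fi
```

With POD_IP=10.16.5.77 this exports RELEASE_NODE=app@10.16.5.77, which matches the app@<ip> form used in the failed Node.connect calls at the top of the thread. Running Node.self() in a remote console is one way to confirm the name the node actually started with.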