andywhite37 opened 7 months ago
Hi, a node being unavailable (i.e. listed in `unavailableNodes`) means it failed `net_adm:ping`, i.e.:

```erlang
net_adm:ping('mongooseim@mongooseim-0.mongooseim.qwick-chat.svc.cluster.local').
pang
```
What should you check? Whether `mongooseim-0.mongooseim.qwick-chat.svc.cluster.local` resolves on `mongooseim@mongooseim-1.mongooseim.qwick-chat.svc.cluster.local`. `ping6` works, but the Erlang `inet` module could use its own logic :) Oh, and there is resolver logic in Erlang too:

```erlang
inet:gethostbyname('google.com', inet6).
{ok,{hostent,"google.com",[],inet6,16,
     [{10752,5200,16411,2062,0,0,0,8206}]}}
```
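The same kind of lookup can be cross-checked from the OS resolver, outside the Erlang VM, to see whether the two disagree. A small Python sketch (the hostnames and address families below are placeholders for illustration, not taken from this issue):

```python
import socket

def resolve(host, family):
    """Return the unique addresses `host` resolves to for the given family."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # no record for this family (e.g. no AAAA entry)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # AF_INET asks for A records, AF_INET6 for AAAA records --
    # a node name with only an A record cannot satisfy IPv6-only distribution.
    print(resolve("localhost", socket.AF_INET))
    print(resolve("localhost", socket.AF_INET6))
```

If the OS resolver returns an AAAA record but the Erlang resolver does not (or vice versa), that points at the resolver configuration rather than the distribution transport.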
To debug deeper, we would need to figure out how to configure Docker Desktop for Kubernetes with IPv6 only. Or the same on CircleCI ;)
Hi @andywhite37. I can confirm that the `inet6_tcp` option is supported. You can check it by editing `rel/files/vm.dist.args`, adding `-proto_dist inet6_tcp`, and running `cluster_commands_SUITE`, which checks clustering with Mnesia. You can check CETS clustering as well (I checked it and it worked too):

```shell
./tools/test-runner.sh --skip-small-tests --db redis pgsql --preset pgsql_mnesia --skip-cover --skip-stop-nodes -- cluster_commands
```

(this command needs Docker to start `postgres` and `redis` containers). All tests in this suite passed for me locally. I could connect the nodes manually, and with TLS as well. The difference from your setup seems to lie in the DNS resolution, as @arcusfelis suggested.
I think I'd ask you to do some debugging on your side. Run `mongooseimctl debug` on one of your nodes. Then, in the Erlang shell, try the following:

```erlang
inet:gethostname().
net_adm:names().
```

Please provide the results. Could you also tell me what `hostname` returns (without `-f`) and what it resolves to? My first guess would be that it's not possible to reach `epmd`.
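If `epmd` reachability is the suspect, its wire protocol is simple enough to poke by hand: a NAMES request is the single tag byte `110` (`'n'`) in a 2-byte big-endian length-prefixed frame, and the reply starts with a 4-byte port number followed by a textual listing of registered nodes. A hedged Python sketch (the host and the default port 4369 are assumptions here, not values from the issue):

```python
import socket
import struct

EPMD_PORT = 4369   # default epmd listen port
NAMES_REQ = 110    # 'n' -- ask epmd for its registered node names

def names_request() -> bytes:
    """Build an epmd NAMES request: 2-byte big-endian length, then the tag."""
    payload = bytes([NAMES_REQ])
    return struct.pack(">H", len(payload)) + payload

def query_names(host: str, port: int = EPMD_PORT, timeout: float = 2.0) -> str:
    """Connect to epmd on `host` and return the textual node listing."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(names_request())
        _port_echo = sock.recv(4)  # reply begins with a 4-byte EPMDPortNo
        text = b""
        while chunk := sock.recv(4096):
            text += chunk
    return text.decode("ascii", "replace")
```

If `query_names` times out when given a pod's IPv6 address, distribution cannot come up over that address either, regardless of `-proto_dist`.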
MongooseIM version: 6.1.0
Installed from: Docker
Erlang/OTP version: the version packaged with MongooseIM 6.1.0
I posted a previous issue #4127 about IPv6 support with `mongooseimctl`, but I'm feeling like the problem runs deeper. I have the servers starting up and connecting to an RDBMS correctly, and I have been able to exchange messages with the server using an XMPP client (Adium). I've tried exercising the XMPP port (5222), WebSockets, and GraphQL, and all of that seems to be working fine.

I have been struggling mightily to get MongooseIM clustering working in an IPv6-based network in Kubernetes, both with `mnesia` and the new `cets` support. I'm unfortunately not an Erlang developer, so I've been doing a lot of reading. My research led me to adding `-proto_dist inet6_tcp` in `vm.args`/`vm.dist.args`, but I haven't had much luck with this. This is what I currently have in `vm.dist.args` (I actually have these lines duplicated in `vm.args` too, just in case there are contexts that only use one or the other of the files):

When I inspect the TCP listeners on the containers, I see `epmd` listening on port `4369` on both the IPv4 and IPv6 interfaces. However, when the listener is started on port `9100`, it's only on the IPv4 interface, not IPv6. When it's running this way and I run `mongooseimctl`, I get the `nodedown` error, I believe because my hostnames resolve to IPv6 addresses, so they try to connect to ports 9100-9110 on the IPv6 address rather than IPv4.

As an experiment, we tried running

```shell
socat -dd TCP-LISTEN:9100,ipv6only,fork TCP4:127.0.0.1:9100
```

to set up an IPv6 listener that forwards to the IPv4 address on the same port (we did this for all the ports 9100-9110). That actually allows `mongooseimctl` to work and I can run commands, but this workaround doesn't seem to help `mnesia` and `cets` with clustering.

My suspicion is that `-proto_dist inet6_tcp` is not being respected somewhere (because whatever is starting the listener on `9100` is still just using IPv4), or some networking code is not using IPv6-compatible TCP options somewhere. I've looked through a lot of code in `MongooseIM` and `cets` for clues, but I don't have the background in Erlang distribution/networking to know exactly where to look or what to look for.

What I've verified so far:

- I can `ping6`/`telnet`/`nslookup` each of the FQDNs from each other.
- With `mnesia`, when I run `mongooseimctl mnesia info`, each node only lists itself in the running db nodes.
- With `cets` (using a newer Docker image), when I run `mongooseimctl cets systemInfo`, both of the nodes show up in `discoveredNodes`, but each node shows the other in `unavailableNodes`, which I believe means they are not able to ping each other. With my `socat` workaround in place, I can successfully ping each node using `mongooseim ping`.

`mongooseimctl cets systemInfo` output:

Questions

- Is `-proto_dist inet6_tcp` tested/known to work or not work with MongooseIM?
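For context, the usual way to pin both the distribution carrier and the listener port range is through `vm.args`/`vm.dist.args` kernel flags. A minimal sketch of what such a file typically contains, using the standard Erlang/OTP flags; this is an assumption for illustration, not the reporter's actual file, which isn't included above:

```
## Use IPv6 TCP as the Erlang distribution carrier
-proto_dist inet6_tcp

## Pin the distribution listener to the ports exposed in the k8s service
-kernel inet_dist_listen_min 9100
-kernel inet_dist_listen_max 9110
```

Whether the resulting distribution listener actually binds an IPv6 socket can then be confirmed inside the container, e.g. with `ss -tlnp`, which is exactly the check described above for port 9100.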