NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

joold: exits with errors 2, -2 and -3 and never starts #309

Closed telmich closed 4 years ago

telmich commented 4 years ago

When testing joold this weekend at Hack4Glarus, it was impossible to start it with any configuration file. I documented the various test cases on https://redmine.ungleich.ch/issues/7377, but the basic result is that even with the following setup, it does not start:

[14:33] replacement-router2.place5:~# joold 
joold error: -2
[14:33] replacement-router2.place5:~# cat netsocket.json 
{
        "multicast address": "FF02::DB8::1",
        "multicast port": "6464" 
}
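A file like this can be sanity-checked before joold is started. The following is a minimal Python sketch and not part of Jool: `check_netsocket` is a hypothetical helper, and the only things taken from the file above are the two key names. It leans on the standard `ipaddress` module, which rejects malformed IPv6 literals.

```python
import ipaddress
import json


def check_netsocket(text):
    """Return a list of problems found in a netsocket.json-style document.

    Hypothetical helper for illustration only; joold does its own parsing.
    """
    problems = []
    config = json.loads(text)

    raw_addr = config.get("multicast address", "")
    try:
        addr = ipaddress.ip_address(raw_addr)
        if not addr.is_multicast:
            problems.append(f"{raw_addr} is not a multicast address")
    except ValueError as err:
        problems.append(f"bad multicast address: {err}")

    raw_port = config.get("multicast port", "")
    try:
        port = int(raw_port)
        if not 0 < port < 65536:
            problems.append(f"port {port} is out of range")
    except (TypeError, ValueError):
        problems.append(f"bad multicast port: {raw_port!r}")

    return problems


if __name__ == "__main__":
    sample = '{"multicast address": "FF02::DB8::1", "multicast port": "6464"}'
    for problem in check_netsocket(sample):
        print(problem)
```

An empty list means the two fields at least parse; it says nothing about whether the kernel will accept the address in bind().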

I am using jool on alpine with the following versions:

[17:10] replacement-router1.place5:~# apk list -I | grep jool
jool-tools-bash-completion-4.0.6-r0 x86_64 {jool-tools} (GPL-2.0-only) [installed]
joold-4.0.6-r0 x86_64 {jool-tools} (GPL-2.0-only) [installed]
jool-modules-vanilla-4.19.80-r0 x86_64 {jool-modules-vanilla} (GPL-2.0-or-later) [installed]
jool-tools-openrc-4.0.6-r3 x86_64 {jool-tools} (GPL-2.0-only) [installed]
jool-tools-4.0.6-r0 x86_64 {jool-tools} (GPL-2.0-only) [installed]
[17:10] replacement-router1.place5:~# 

ydahhrk commented 4 years ago

ARGH

My bad. I forgot that joold is supposed to send its output to syslog, so the error codes you're getting on the terminal are actually afterthoughts.

$ tail -3 /var/log/syslog
Dec  2 10:22:39 Ishtaros joold: Opening file netsocket.json...
Dec  2 10:22:39 Ishtaros joold: Getting address info of FF02::DB8::1#6464...
Dec  2 10:22:39 Ishtaros joold: getaddrinfo() failed: Name or service not known

The problem at this point is that FF02::DB8::1 is not a valid IPv6 address (:: can appear at most once in an address). If I change that to FF02:DB8::1, I get

Dec  2 10:10:32 Ishtaros joold: Opening file netsocket.json...
Dec  2 10:10:32 Ishtaros joold: Getting address info of FF02:DB8::1#6464...
Dec  2 10:10:32 Ishtaros joold: Trying an address candidate...
Dec  2 10:10:32 Ishtaros joold: bind() failed: Invalid argument
Dec  2 10:10:32 Ishtaros joold: None of the candidates yielded a valid socket.

There's probably something wrong with the ff02:db8:: prefix. The scope is the 2 in ff02 (link-local), so maybe it can't use link-local for some reason, or perhaps it's complaining that the bits after the scope, which are reserved in the newer format, are nonzero (see the General multicast address format (new)). Unfortunately, there's not much I can do to improve this particular output, because Linux is the one throwing the error.
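To see which scope an address actually carries, the flags and scope can be read out of its second byte, which is where RFC 4291 places them. The sketch below is only an illustration (`multicast_fields` is a made-up name) and it does not explain the bind() failure by itself; it just shows how the two addresses in this thread differ.

```python
import ipaddress

# Scope values defined for IPv6 multicast addresses (RFC 4291, section 2.7).
SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def multicast_fields(text):
    """Return (flags, scope, scope_name) for an IPv6 multicast address."""
    addr = ipaddress.IPv6Address(text)
    if not addr.is_multicast:
        raise ValueError(f"{text} is not a multicast address")
    second_byte = addr.packed[1]   # first byte is always 0xff
    flags = second_byte >> 4       # high nibble
    scope = second_byte & 0x0F     # low nibble
    return flags, scope, SCOPES.get(scope, "reserved or unassigned")


if __name__ == "__main__":
    for candidate in ("ff02:db8::1", "ff08::db8:64:64"):
        print(candidate, multicast_fields(candidate))
```

For ff02:db8::1 this gives scope 2 (link-local), and for the tutorial's ff08::db8:64:64 it gives scope 8 (organization-local). The d in db8 belongs to the next 16-bit group, not to the scope field.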

I don't really remember the multicast address format nuances, but I tried the one that comes in the tutorial (ff08::db8:64:64), and I can see some progress:

Dec  2 11:20:27 Ishtaros joold: Opening file ./netsocket.json...
Dec  2 11:20:27 Ishtaros joold: Getting address info of FF08::DB8:64:64#6464...
Dec  2 11:20:27 Ishtaros joold: Trying an address candidate...
Dec  2 11:20:27 Ishtaros joold: The socket to the network was created.
Dec  2 11:20:27 Ishtaros joold: Configuring multicast options on the socket...
Dec  2 11:20:27 Ishtaros joold: We're now registered to the multicast group.
Dec  2 11:20:27 Ishtaros joold: Multicast loopback disabled.
Dec  2 11:20:27 Ishtaros joold: Jool's socket family doesn't seem to exist.#012(This probably means Jool hasn't been modprobed.)#012Netlink error message: Object not found

From here I can modprobe, configure the module and keep following the tutorial.

Don't close this bug yet; I want Jool to print a message telling the user to check syslog so this never happens again.
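In the meantime, joold's entries can be picked out of the log with a few lines of Python. This is a rough sketch that assumes the classic syslog line layout seen in the excerpts above. The log path differs between distributions, so the hypothetical `joold_messages` helper takes lines instead of a file name.

```python
import re

# Matches classic syslog lines such as
# "Dec  2 10:22:39 Ishtaros joold: Opening file netsocket.json..."
# An optional "[pid]" after the program name is tolerated.
LINE = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) (\S+) joold(?:\[\d+\])?: (.*)$")


def joold_messages(lines):
    """Yield (timestamp, message) for every joold entry in an iterable of lines."""
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match:
            yield match.group(1), match.group(3)


if __name__ == "__main__":
    sample = [
        "Dec  2 10:22:39 Ishtaros joold: Opening file netsocket.json...",
        "Dec  2 10:22:39 Ishtaros joold: getaddrinfo() failed: Name or service not known",
    ]
    for timestamp, message in joold_messages(sample):
        print(timestamp, message)
```

Passing an open file object works as well, since iterating over it yields one line at a time.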

ydahhrk commented 4 years ago

BTW: I really don't recommend active-active, since there is no locking. YMMV

ydahhrk commented 4 years ago

Ok, please close this if this seems good enough. Otherwise we can keep debugging.

ydahhrk commented 4 years ago

Jool 4.0.7 now prints a message to standard output reminding the user that the logs are sent to syslog.

In addition, from our mail exchange elsewhere it seems the problem has been resolved, so closing.

Feel free to reopen if a problem persists.