Closed rfloriot closed 4 years ago
I'm not saying that this is not Jool's fault (it likely is), but Jool is actually not the one crashing.
According to the stack trace, it's crashing on NAT code. Is your VM running a NAT? I'm asking because I'm guessing you probably didn't intend that, and the quickest workaround for now is to just remove it.
(I have run the code several times on Ubuntu 18.04 VMs, and never had any problems, so I don't think that the problem is the VM per se.)
Thank you for your report. I'll take a deeper look now.
Hello,
thank you for your message. Our VM isn't running NAT; the iptables nat table is empty. The hypervisor is KVM on CentOS 7, and the virtual network interface is "virtio".
I tried to blacklist the NAT modules for the test, without success: they are still loaded at boot.
I'll try tomorrow with another OS, such as Debian 9, inside the VM.
Have a nice day,
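One way to double-check whether NAT code is present in the kernel even when the rules look empty is to list the loaded modules. The following is only a sketch: it assumes the relevant modules have "nat" as a component of their name (as nf_nat and iptable_nat do), and the function name and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# List loaded kernel modules whose name has "nat" as an underscore-separated
# component, e.g. nf_nat or iptable_nat. /proc/modules is what lsmod reads.
list_nat_modules() {
    # $1: module list to read (defaults to /proc/modules; illustrative parameter)
    awk '{ print $1 }' "${1:-/proc/modules}" | grep -E '(^|_)nat(_|$)'
}
```

Calling `list_nat_modules` with no argument inspects the running kernel. Empty output means no module matched the pattern, which is not proof that no NAT code is built in.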
Can't reproduce :/
Could you export that VM to .ovf or .ova and send it to me so I can hammer it?
Hello, it's a bit sensitive to share this VM; sorry, we prefer not to. I tried with Debian 9, with the same result. Changing the network interface does not help either. I tried inside a simple VirtualBox and did not have the problem there (but there I was using interfaces directly connected to the host rather than a default route towards a router). (We use OpenNebula and KVM in production.)
We are continuing our IPv6-only Wi-Fi tests with a physical device for the NAT64, so for now this is not blocking.
Ok.
Can't look into this further right now, but I'll try to allocate some time next week.
Hello, I made some other tests inside VirtualBox today and it works well. Here is the topology I built:
PC2 is the NAT64 server, and PC1 is able to ping the 64:ff9b+PC3 address and also to curl its services. The crash we encounter thus seems to me more linked to OpenNebula/KVM or some specific emulated hardware there.
OK, found it! I had some lines related to NAT inside the rules.v4 iptables file (the file that is imported to populate iptables).
I had to delete those lines:
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
And all works well inside the VM.
Have a nice day
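For anyone applying the same workaround, the deletion described above can be sketched as a small filter. This assumes the file is in iptables-save format, where each table starts with a `*name` line and ends at the next `COMMIT`. The function name and the path argument are hypothetical; the thread only names the file "rules.v4".

```shell
#!/bin/sh
# Print an iptables-save style rules file with its "*nat" section removed.
# The section runs from the "*nat" line through the next "COMMIT" line.
strip_nat_table() {
    # $1: path to the rules file (hypothetical argument)
    sed '/^\*nat$/,/^COMMIT$/d' "$1"
}
```

A cautious way to use it would be `strip_nat_table rules.v4 > rules.v4.new`, then reviewing the result before replacing the original. Note that this only works around the crash; it does not address its cause.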
Sorry I couldn't help much :/
Actually, even if you found a workaround for it, there is no reason why it should crash when there are strange NAT rules around. It suggests that there is some problem elsewhere waiting to show its symptoms again.
Thank you for all your efforts and sorry for the trouble. But as for me, I shouldn't drop this until the root of the problem is known.
I will reopen this and re-prioritize.
Confirmed; this and #289 are the same bug. This has been fixed since 4.0.5.
Hello,
Jool works well on a physical device, but I can't make it work inside a VM: it crashes. In both cases, I use Jool 4.0.0 with Ubuntu 18.04 (kernel 4.15.0-45-generic).
In both cases I also configure an IPv4 address (let's say X.X.X.X) and an IPv6 address (Y::1/64, for example) for the NAT64, plus one /24 IPv4 pool (let's say Y.Y.Y.0/24 here).
The commands used are the following:
$ sudo /sbin/modprobe jool
$ sudo jool instance add UCL --iptables --pool6 64:ff9b::/96
$ sudo jool -i UCL pool4 add --tcp Y.Y.Y.0/24 10000-14000
$ sudo jool -i UCL pool4 add --udp Y.Y.Y.0/24 10000-14000
$ sudo jool -i UCL pool4 add --icmp Y.Y.Y.0/24 10000-14000
$ sudo ip6tables -t mangle -A PREROUTING --destination 64:ff9b::/96 -j JOOL --instance UCL
$ sudo iptables -t mangle -A PREROUTING --destination Y.Y.Y.0/24 -p tcp --dport 10000:14000 -j JOOL --instance UCL
$ sudo iptables -t mangle -A PREROUTING --destination Y.Y.Y.0/24 -p udp --dport 10000:14000 -j JOOL --instance UCL
$ sudo iptables -t mangle -A PREROUTING --destination Y.Y.Y.0/24 -p icmp -j JOOL --instance UCL
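The three pool4 commands above differ only in the protocol flag, so when reproducing the setup they can be generated in a loop. The sketch below only prints the commands rather than running them. INSTANCE, POOL4 and PORTS are illustrative shell variable names, not Jool settings, and Y.Y.Y.0/24 is the placeholder from the report.

```shell
#!/bin/sh
# Print (not run) the per-protocol pool4 commands from the report.
INSTANCE="UCL"
POOL4="Y.Y.Y.0/24"      # placeholder from the report; substitute a real prefix
PORTS="10000-14000"

print_pool4_commands() {
    for proto in tcp udp icmp; do
        echo "jool -i $INSTANCE pool4 add --$proto $POOL4 $PORTS"
    done
}
```

Once the printed lines look right, they could be run by hand with sudo, exactly as in the original command list.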
All commands are accepted in both cases, but in the case of the VM, the whole VM crashes as soon as the NAT64 receives its first client to serve (for instance, when I ping 64:ff9b::1 from a client device).
Jool's debug messages do not show anything special, as the VM stops abruptly.
From the hypervisor, I can find some error messages related to this specific VM, such as "BUG: unable to handle kernel NULL pointer dereference", "Kernel panic - not syncing: Fatal exception in interrupt" and "Unexpected reschedule of offline CPU#0".
You can find the full logs here: https://www.dropbox.com/s/ybj4d9arrklq04c/log.txt?dl=1
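To locate these messages quickly in a saved copy of the log, a simple filter may help. This is a minimal sketch that matches only the three messages quoted above. The function name and the file argument are hypothetical.

```shell
#!/bin/sh
# Print the panic-related lines from a saved kernel log, with line numbers.
find_panic_lines() {
    # $1: path to a local copy of the log (hypothetical argument)
    grep -n -E 'BUG: unable to handle kernel NULL pointer dereference|Kernel panic - not syncing|Unexpected reschedule of offline CPU' "$1"
}
```

The line numbers in the output make it easier to jump to the stack trace that usually follows the first "BUG:" line.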
Thank you for your help
Rémi Floriot