NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Jool crashes when used inside a VM (Jool 4.0.0) #279

Closed rfloriot closed 4 years ago

rfloriot commented 5 years ago

Hello,

Jool works well on a physical device, but I can't make it work inside a VM: it crashes. In both cases, I use Jool 4.0.0 on Ubuntu 18.04 (kernel 4.15.0-45-generic).

In both cases I also configure an IPv4 address (say X.X.X.X) and an IPv6 address (Y::1/64, for example) for the NAT64, plus one /24 IPv4 pool (Y.Y.Y.0/24 here).

The commands used are the following:

$ sudo /sbin/modprobe jool
$ sudo jool instance add UCL --iptables --pool6 64:ff9b::/96
$ sudo jool -i UCL pool4 add --tcp Y.Y.Y.0/24 10000-14000
$ sudo jool -i UCL pool4 add --udp Y.Y.Y.0/24 10000-14000
$ sudo jool -i UCL pool4 add --icmp Y.Y.Y.0/24 10000-14000

$ sudo ip6tables -t mangle -A PREROUTING --destination 64:ff9b::/96 -j JOOL --instance UCL
$ sudo iptables -t mangle -A PREROUTING --destination Y.Y.Y.0/24 -p tcp --dport 10000:14000 -j JOOL --instance UCL
$ sudo iptables -t mangle -A PREROUTING --destination Y.Y.Y.0/24 -p udp --dport 10000:14000 -j JOOL --instance UCL
$ sudo iptables -t mangle -A PREROUTING --destination Y.Y.Y.0/24 -p icmp -j JOOL --instance UCL

All commands are accepted in both cases, but in the VM case the whole VM crashes as soon as the NAT64 receives its first client to serve (for instance, when I ping 64:ff9b::1 from a client device).

Jool's debug messages do not show anything special, since the VM stops abruptly.

From the hypervisor, I can find some error messages related to this specific VM, such as:

"BUG: unable to handle kernel NULL pointer dereference"
"Kernel panic - not syncing: Fatal exception in interrupt"
"Unexpected reschedule of offline CPU#0"

You can find the full logs here: https://www.dropbox.com/s/ybj4d9arrklq04c/log.txt?dl=1
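
As a quick triage aid, the messages quoted above can be searched for in a saved copy of the hypervisor log. This is a minimal sketch and not something taken from the thread: the function name `find_panic_lines` is hypothetical, and the patterns are simply the three strings quoted in this comment.

```shell
# Hypothetical helper: print (with line numbers) every line of a saved
# log file that contains one of the kernel messages quoted in the report.
find_panic_lines() {
    grep -n -E 'NULL pointer dereference|Kernel panic - not syncing|Unexpected reschedule of offline CPU' "$1"
}
```

It would be called as `find_panic_lines log.txt`, where `log.txt` is whatever local file the log was saved to. The lines around the first match are usually where the stack trace starts.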

Thank you for your help

Rémi Floriot

ydahhrk commented 5 years ago

I'm not saying that this is not Jool's fault (it likely is), but Jool is actually not the one crashing.

According to the stack trace, it's crashing on NAT code. Is your VM running a NAT? I'm asking because I'm guessing you probably didn't intend that, and the quickest workaround for now is to just remove it.

(I have run the code several times on Ubuntu 18.04 VMs, and never had any problems, so I don't think that the problem is the VM per se.)

Thank you for your report. I'll take a deeper look now.

rfloriot commented 5 years ago

Hello,

Thank you for your message. Our VM isn't running NAT; iptables has no NAT rules. The hypervisor is KVM on CentOS 7. The virtual network interface is "virtio".

I tried to blacklist the NAT modules for the test, without success: they are still loaded at boot.
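
To check which NAT-related modules are actually loaded, the output of `lsmod` can be filtered by name. This is a minimal sketch, assuming the relevant modules have `nat` in their name (for example `iptable_nat` or `nf_nat`; exact names vary by kernel version). The function name `list_nat_modules` is hypothetical.

```shell
# Hypothetical helper: read lsmod-style output on stdin and print the
# names of the modules whose name contains "nat".
list_nat_modules() {
    # NR > 1 skips the "Module  Size  Used by" header line.
    awk 'NR > 1 && $1 ~ /nat/ { print $1 }'
}
```

Usage would be `lsmod | list_nat_modules`. Note that a `blacklist` line in a modprobe configuration file only disables alias-based autoloading; a module that is requested explicitly by name, or pulled in as a dependency of another module, is still loaded, which may be why the blacklist appeared to have no effect.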

I'll try tomorrow with another OS, such as Debian 9, inside the VM.

Have a nice day,

ydahhrk commented 5 years ago

Can't reproduce :/

Could you export that VM to .ovf or .ova and send it to me so I can hammer it?

rfloriot commented 5 years ago

Hello, it's a bit sensitive to share this VM; sorry, we prefer not to. I tried with Debian 9, with the same result. Changing the network interface does not help either. I also tried inside a simple VirtualBox setup and did not have the problem there (but there I was using interfaces directly connected to the host rather than a default route towards a router). (We use OpenNebula and KVM in production.)

We are continuing our tests of IPv6-only Wi-Fi with a physical device for the NAT64, so for now this is not blocking.

ydahhrk commented 5 years ago

Ok.

Can't look into this further right now, but I'll try to allocate some time next week.

rfloriot commented 5 years ago

Hello, I ran some other tests inside VirtualBox today and it works well. Here is the topology I built:

[Image: topology_nat64]

PC2 is the NAT64 server, and PC1 is able to ping PC3's address under the 64:ff9b::/96 prefix and also to curl its services. The crash we encounter therefore seems to me more linked to OpenNebula/KVM or to some specific emulated hardware there.

rfloriot commented 5 years ago

OK, found it! I had some lines related to NAT inside the rules.v4 iptables file (the file that is imported to populate iptables).

I had to delete those lines:

*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT

Now everything works well inside the VM.
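
For anyone on a Jool version that still has this bug, the workaround above can be sketched as a filter over the rules file. This is only a sketch under stated assumptions: the function name `strip_nat_table` is hypothetical, and it assumes the file uses the usual iptables-save layout, where each table starts with a `*name` line and ends with `COMMIT`.

```shell
# Hypothetical helper: print an iptables-save style rules file with the
# whole "*nat ... COMMIT" section removed. The input file is not modified.
strip_nat_table() {
    # Delete every line from "*nat" up to and including the next "COMMIT".
    sed '/^\*nat$/,/^COMMIT$/d' "$1"
}
```

A cautious way to use it is to write the result to a new file, for example `strip_nat_table rules.v4 > rules.v4.new`, and review it before replacing the original. Keep in mind that this also drops any real NAT rules in that section, so it only makes sense on a machine that does not need the nat table.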

Have a nice day

ydahhrk commented 5 years ago

Sorry I couldn't help much :/

ydahhrk commented 5 years ago

Actually, even if you found a workaround for it, there is no reason why it should crash when there are strange NAT rules around. It suggests that there is some problem elsewhere waiting to show its symptoms again.

Thank you for all your efforts and sorry for the trouble. But as for me, I shouldn't drop this until the root of the problem is known.

I will reopen this and re-prioritize.

ydahhrk commented 4 years ago

Confirmed; this and #289 are the same bug. This has been fixed since 4.0.5.
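
To check whether a given installation already contains the fix, its version string can be compared against 4.0.5. This is a minimal sketch: the function name `jool_has_fix` is hypothetical, and it relies on the version sort (`sort -V`) provided by GNU coreutils.

```shell
# Hypothetical helper: succeed (exit status 0) if the version string
# given as $1 is 4.0.5 or newer, fail otherwise.
jool_has_fix() {
    fixed="4.0.5"
    # After a version sort, the first line is the lower of the two versions.
    lowest=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

For example, `jool_has_fix 4.0.0` fails and `jool_has_fix 4.0.5` succeeds. Assuming the kernel module exposes a version field, the installed version could be read with something like `modinfo -F version jool`.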