NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0
320 stars 66 forks source link

Wrong MTU in Fragmentation Needed/Packet too Big messages #170

Closed ydahhrk closed 9 years ago

ydahhrk commented 9 years ago

Jool's 3.3 series isn't compensating for the difference between the IPv4/v6 headers when generating the MTU field of 'Fragmentation Needed' and 'Packet too Big' ICMP errors.

This only affects ICMP errors generated at Jool. Translating ICMP errors get their MTU adjusted just fine.

This is supposed to be messing up Path MTU discovery... but if this is the case, it's weird that nobody seems to have tumbled into this problem.

Or maybe it's the actual cause of this comment? (I guess it's a stretch.)

I think I should tag this as critical, so we'll probably release 3.3.4 tomorrow.

toreanderson commented 9 years ago

I'm guessing the prevalence of 1500-byte MTUs and TCP MSS prevents the need for PMTUD. The IPv6 speaker will advertise an TCP MSS of 1440, causing the IPv4 speaker to never send larger frames than 1480 bytes - which fits snugly into an 1500 bytes large IPv6 frame.

I doubt this issue has anything to do with issue #136.

ydahhrk commented 9 years ago

Found the other problem.

SIIT Jool cannot create ICMPv4 errors if IPv4 forwarding is disabled. (the routing table has nothing to do with it.)

Therefore, removing this step from the documentation was a mistake:

sudo sysctl -w net.ipv4.conf.all.forwarding=1

I repeat: IPv4 FORWARDING MUST BE ENABLED.

Ok, looks good. Will hopefully release 3.3.4 tomorrow.

toreanderson commented 9 years ago

Hm. I cannot reproduce that. My SIIT-DC BRs are all running Jool e7cdebc with no forwarding sysctl set for any interface, neither for IPv4 nor for IPv6. They are returning ICMP errors just fine. I quickly verified the following cases:

ydahhrk commented 9 years ago

A little background:

Before calling icmp_send() (the Linux function that handles creating ICMPv4 errors), Jool has to in-route (route4_input()) the offending (original) packet.

Why? I wish I knew. It's odd that by design the kernel prohibits ICMPv4 errors during the whole prerouting chain unless the original packet is explicitly in-routed (example).

route4_input() is a Jool wrapper for the Linux function ip_route_input(). ip_route_input() is a wrapper for ip_route_input_common() (ip_route_input_noref() in newer kernels), which (upon caching not yielding results) calls ip_route_input_slow(). This function is the one that cares about IPv4 forwarding.

Hm. I cannot reproduce that. My SIIT-DC BRs are all running Jool e7cdebc with no forwarding sysctl set for any interface, neither for IPv4 nor for IPv6. They are returning ICMP errors just fine.

Now that you mention it, the forwarding validation yields flat failure in lower kernels, but falls back to assume the packet is to be delivered locally in higher kernels. This assumption code ends up setting a (perhaps) misled, but valid, blackhole-ish destination on the packet which nevertheless prevents icmp_send() from failing.

(I'm assuming you're using one of the "higher" kernels.)

Honestly, I don't know if this is good or bad. In an ideal world, icmp_send() would not even require the packet to have a known destination. (why would it? icmpv6_send() doesn't.) Failing that, Jool would ideally define its own "dummy" destination (thereby removing the need to call route4_input() entirely), but the kernel doesn't seem to export a way to create one the destination constructor requires the official IPv4-destination-options, which are private.

Therefore, relying on the "fake local" destination seems to be the way to go. Until we get out of prerouting, that is. Man, I really need to get that migration started already.

Summary (also tl;dr)

So, from a quick overview of Linux's code, IPv4 forwarding is mandatory in kernels 3.5 and lower. IPv4 forwarding is not needed in kernels 3.6 and higher. I can reliably toggle ICMPv4 errors in Ubuntu 12.04 (kernel 3.2) by tweaking IPv4 forwarding, but nothing happens if I do the same in Ubuntu 14.04 (kernel 3.13).

(After the migration, both IP forwardings will be required anyway due to the cross-interface routing.)

ydahhrk commented 9 years ago

This assumption code ends up setting a (perhaps) misled, but valid, blackhole-ish destination on the packet which nevertheless prevents icmp_send() from failing.

It seems this isn't a coincidence.

The commit where this happens is torvalds/linux@251da4130115b29403a57096fa0988249f31fc55.

The author wanted the destination to be cached, which does happen, but then success is returned and the destiny of the packet is deferred until the ip_error() function as a result or side effect. ip_error() normally happens during the actual legitimate in-routing. (ie. after the prerouting chain.)

So the "fake local" destination is intended as a placeholder so an ICMP error can be built out of the packet later. In other words, it's exactly what Jool wants it for.

It seems this solution is more elegant that I originally thought. (lower kernel crybabying notwithstanding.)