NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Fragmentation is still unsupported #59

Closed ydahhrk closed 10 years ago

ydahhrk commented 11 years ago

There's no fragmentation-handling code in the module; the NAT64 currently expects the kernel to reassemble fragments. If fragments arrive from the IPv4 side, the kernel will reassemble them into a potentially large packet, which will most likely not fit in the outgoing interface's MTU. And because IPv6 routers don't fragment, the packet will be dropped unnecessarily.

magg commented 11 years ago

Maybe you guys should do some testing to see if the kernel handles the fragments for you...

The scenario described in http://packetlife.net/blog/2008/aug/18/path-mtu-discovery/ seems simple enough to test both fragment handling and PMTUD; lowering the MTU on one host does the trick.

techmotive commented 11 years ago

I'm confused about fragmentation. Why can't the big packet be re-fragmented when it is transmitted to the IPv6 side? What's the problem with doing that?

magg commented 11 years ago

In IPv6, Path MTU Discovery is mandatory; therefore, intermediate routers cannot perform IP fragmentation. They will just send an ICMPv6 error message. The only exception is when a router receives an IPv6 packet with a Fragment header: the router can re-fragment this packet if needed.

techmotive commented 11 years ago

So you mean that we don't need to do the fragmentation in the 4-to-6 direction, and the kernel will handle it?

magg commented 11 years ago

The outgoing IPv6 packets are EXPECTED (right now, I guess) to be fragmented by the kernel functions, with a few adjustments:

If the translated IPv4 packet had the DF bit set and the resulting IPv6 packet does not fit in the path to the host, an ICMPv4 error (Fragmentation Needed) is sent back to the IPv4 host, carrying the MTU of the IPv6 link minus 20 (the difference between the sizes of the standard IPv4 and IPv6 headers).

If the translated IPv4 packet did not have the DF bit set (meaning the sender is not performing PMTUD), the gateway will fragment the packet.

If the DF bit is not set, a Fragment header will be added even if the packet does not need to be fragmented. The existence of this Fragment header tells the routers along the way that they can perform IP fragmentation on this packet.

techmotive commented 11 years ago

thanks a lot! This explanation is quite clear!

ipclouds commented 11 years ago

Hi,

Since you were talking about this in the other thread, would you care to elaborate on how you are implementing this?

There are two ways to support fragmentation:

Reassemble and translate vs. translate the fragments individually

I think the latter is more efficient, because there's no packet buffering in the (common) in-order case.

In the IPv6-to-IPv4 direction, fragments can be translated without matching session state if ID values are kept consistent (i.e., per-packet translation state but no buffering).


ydahhrk commented 11 years ago

Thing is, our current predominant belief is that the former ("Reassemble and translate") is actually not an option because fragmentation works too differently in IPv6. If you want to prove us wrong, you're invited to do so. Here's the full story: https://github.com/NICMx/NAT64/wiki/Quirk:-The-iptables-Conundrum

So yes, we're trying to pull off individual fragment translation. The basic idea, as you've figured out, is to queue fragments until the layer-4 header (fragment offset zero) arrives. When that happens, the NAT64 has enough information to forward all related fragments and store the relevant layer-4 information in a small structure. The small structure will be used to translate future incoming fragments belonging to the same packet. Once the last fragment has been dispatched or the timer runs out, the small structure will be deleted.

I don't really understand your last paragraph, though. What do you mean by "ID values"? I think that we absolutely need to know each fragment's intended transport IDs before we translate, because the resulting (translated) packet's source IP address is inferred from the combination of addresses and ports from the original packet (i.e., we use them to find a session, and we copy the session's IPv4 addresses to the outgoing packet).

Feel free to get back to me if you want more info.

ipclouds commented 11 years ago

By ID values I mean the value of the Identification field in the Fragment extension header set by the IPv6 hosts. With this approach, multiple IPv6 hosts may use the same identification values, so maybe another option is to have the NAT64 generate IPv4 identification values locally for ALL IPv4 packets.

ydahhrk commented 10 years ago

The IETF working group seems to have concluded that there is little point in having the NAT64 generate identification values, because of the corner-case scenarios:

"1. There doesn't seem to be a way to solve this well" "2. Solving it "better" incurs significant complexity" "With a service such as google, we may see many orders of magnitude more IPv6 hosts trying to talk to google through the translator such that the incidence of collision goes way up even if the translator (and not the source hosts) is the one who chooses the IDs." (http://ietf.10.n7.nabble.com/NAT64-fragmentation-td220636.html#a220651)

Also, in an effort to remain as standard as possible, we have decided to use an adapted version of the RFC 815 algorithm to correlate fragments. I think this helps with the problems you're discussing, because the key it uses to find a group of fragments is the set [identification, source address, destination address, transport protocol], not just the identification value.

Again, sorry for taking so long to answer.

ydahhrk commented 10 years ago

The branch has been merged. Closing the issues involved.

ydahhrk commented 10 years ago

If you want to prove us wrong, you're invited to do so. Here's the full story: https://github.com/NICMx/NAT64/wiki/Quirk:-The-iptables-Conundrum

BTW: We've been proved wrong. #104 overrides this.

danehans commented 7 years ago

After compiling Jool with debugging enabled, I see a ton of the following messages during my problematic curl -6:

[ 4355.373284] SIIT Jool: ===============================================
[ 4355.373299] SIIT Jool: Catching IPv4 packet: 128.107.241.179->10.138.0.2
[ 4355.373301] SIIT Jool: Translating the Packet.
[ 4355.373309] SIIT Jool: Result: 64:ff9b::806b:f1b3->64:ff9b::a8a:2
[ 4355.373317] SIIT Jool: ip6_route_output() returned error -101. Cannot route packet.
[ 4355.373319] SIIT Jool: Returning the packet to the kernel.
[ 4355.373325] NAT64 Jool: ===============================================

For troubleshooting purposes, I tried adding a static route for the synthetic prefix through the Docker bridge interface. It caused me to lose connectivity with my GCE VM:

$ sudo route -A inet6 add 64:ff9b::/96 dev br-45009954d823

ydahhrk commented 7 years ago

Weren't you experimenting with NAT64 Jool? It seems to me that you modprobed SIIT Jool (jool_siit) by accident.

danehans commented 7 years ago

@ydahhrk I was using NAT64 Jool. I followed the provided document and it states the following:

$ cd Jool/mod
$ make JOOL_FLAGS=-DDEBUG # -- This is the key --
$ sudo make modules_install
$ sudo depmod
$
$ sudo modprobe -r jool_siit
$ sudo modprobe jool_siit pool6=...
$
$ dmesg | tail -5

I will rebuild my test environment, recompile, and use sudo modprobe -r jool. Please let me know if any other document discrepancies exist. Thank you for the help.

ydahhrk commented 7 years ago

The code shown is an example. You should replace sudo modprobe -r jool_siit and sudo modprobe jool_siit pool6=... depending on what you're doing. If I'm not mistaken, your version of the commands should be sudo modprobe -r jool and sudo modprobe jool pool6=64:ff9b::/96.

By the way: we should probably communicate through mail directly. GitHub broadcasts these comments, so we're spamming people's inboxes.

danehans commented 7 years ago

@ydahhrk should I use your personal email or jool-list@nic.mx?

ydahhrk commented 7 years ago

User support is best suited for jool at nic.mx in my opinion.