NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Wrong IPv6 source address when translating ICMPv4 errors in stateful mode #132

Closed. toreanderson closed this issue 9 years ago.

toreanderson commented 9 years ago

It appears that when tracerouting from an IPv6 node out through a Jool instance in Stateful NAT64 mode, the source address of ICMPv4 errors is translated to the original IPv6 destination address, rather than to the IPv4-converted IPv6 address of the IPv4 router that sent the error. This fools certain traceroute programs into believing that the traceroute has succeeded, since the replies they see appear to come from the target of the traceroute. For example, below you can see a traceroute towards the IPv4 address 254.254.254.254 (which obviously doesn't really respond to anything, since it's a martian).

$ mtr -n -c 1 -r --report-wide 2a02:c0::46:42:254.254.254.254
Start: Mon Mar  9 10:45:28 2015
HOST: echo.ms.redpill-linpro.com        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2a02:c0:2:4:6666:17:0:1            0.0%     1    1.3   1.3   1.3   1.3   0.0
  2.|-- 2a02:c0:2:4::                      0.0%     1    1.9   1.9   1.9   1.9   0.0
  3.|-- 2a02:c0:1:401::13                  0.0%     1    1.5   1.5   1.5   1.5   0.0
  4.|-- 2a02:c0:400:104:218:59ff:fe19:403  0.0%     1    1.4   1.4   1.4   1.4   0.0
  5.|-- 2a02:c0::46:42:fefe:fefe           0.0%     1    1.4   1.4   1.4   1.4   0.0
$ 

A tcpdump running on the Jool node shows what's going on:

10:45:20.814229 IP (tos 0xc0, ttl 64, id 8441, offset 0, flags [none], proto ICMP (1), length 112)
    185.47.41.1 > 185.47.42.1: ICMP time exceeded in-transit, length 92
    IP (tos 0x0, ttl 1, id 18041, offset 0, flags [none], proto ICMP (1), length 84)
    185.47.42.1 > 254.254.254.254: ICMP echo request, id 36874, seq 60544, length 64
10:45:20.814253 IP6 (class 0xc0, hlim 63, next-header ICMPv6 (58) payload length: 112) 2a02:c0::46:42:fefe:fefe > 2a02:c0:2:4:6666:17:0:1001: [icmp6 sum ok] ICMP6, time exceeded in-transit for 2a02:c0::46:42:fefe:fefe

So the source address of the original ICMPv4 error, 185.47.41.1, ended up being translated to 2a02:c0::46:42:254.254.254.254 in the resulting ICMPv6 packet. I believe it ought to have been translated to 2a02:c0::46:42:185.47.41.1 instead.
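For reference, the expected address is simply the RFC 6052 mapping of the router's IPv4 address under the /96 prefix used in this setup (2a02:c0::46:42:0:0/96, inferred from the addresses in the report). A minimal Python sketch of the /96 case, using only the standard `ipaddress` module; the function name is hypothetical, and the other RFC 6052 prefix lengths (some of which must skip the reserved "u" octet) are deliberately not handled:

```python
import ipaddress


def embed_ipv4_96(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Synthesize an IPv4-embedded IPv6 address (RFC 6052) for a /96 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        # Other RFC 6052 lengths place the IPv4 bits elsewhere; not handled here.
        raise ValueError("only /96 prefixes are supported in this sketch")
    # With a /96 prefix, the IPv4 address simply occupies the last 32 bits.
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(net.network_address) | int(v4))


pool6 = "2a02:c0::46:42:0:0/96"
print(embed_ipv4_96(pool6, "185.47.41.1"))      # router: 2a02:c0::46:42:b92f:2901
print(embed_ipv4_96(pool6, "254.254.254.254"))  # target: 2a02:c0::46:42:fefe:fefe
```

The tcpdump line above shows the second address (the target's) where the first (the router's) was expected.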

This breaks traceroute through the Stateful NAT64 for all valid destinations, too. Some traceroute programs, like MTR, don't show any hops beyond the Jool node, because they get a response that appears to come from the original target of the traceroute, as shown here towards Google's public DNS service (the Jool node is kvmtest.i.bitbit.net):

$ mtr -c 1 -r --report-wide 2a02:c0::46:42:8.8.8.8
Start: Mon Mar  9 10:57:12 2015
HOST: echo.ms.redpill-linpro.com      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2a02:c0:2:4:6666:17:0:1          0.0%     1    1.2   1.2   1.2   1.2   0.0
  2.|-- ge-1-0-39.cs1-osl3.n.bitbit.net  0.0%     1    2.5   2.5   2.5   2.5   0.0
  3.|-- bond0.fw2-osl3.n.bitbit.net      0.0%     1    1.1   1.1   1.1   1.1   0.0
  4.|-- kvmtest.i.bitbit.net             0.0%     1    1.4   1.4   1.4   1.4   0.0
  5.|-- 2a02:c0::46:42:808:808           0.0%     1    1.5   1.5   1.5   1.5   0.0
$ 

Regular traceroute fares better, as it appears to keep probing with larger hop limits when it receives ICMPv6 Time Exceeded. However, all the IPv4 hops in the path are represented by the target's IPv6 address, rather than by their own IPv4-converted IPv6 addresses per RFC6052:

$ sudo traceroute6 -I 2a02:c0::46:42:8.8.8.8
traceroute to 2a02:c0::46:42:8.8.8.8 (2a02:c0::46:42:808:808), 30 hops max, 80 byte packets
 1  2a02:c0:2:4:6666:17:0:1 (2a02:c0:2:4:6666:17:0:1)  2.392 ms  2.399 ms  2.413 ms
 2  ge-1-0-39.cs1-osl3.n.bitbit.net (2a02:c0:2:4::)  2.740 ms  2.822 ms  2.856 ms
 3  bond0.fw2-osl3.n.bitbit.net (2a02:c0:1:401::13)  2.503 ms  2.529 ms  2.564 ms
 4  kvmtest.i.bitbit.net (2a02:c0:400:104:218:59ff:fe19:403)  2.580 ms  2.613 ms  2.758 ms
 5  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  2.986 ms  3.011 ms  3.026 ms
 6  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  3.458 ms  2.088 ms  2.104 ms
 7  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  1.360 ms  1.388 ms  1.527 ms
 8  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  1.546 ms  1.716 ms  1.898 ms
 9  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  8.424 ms  8.432 ms  8.429 ms
10  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  8.488 ms  8.284 ms  8.313 ms
11  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  8.803 ms  9.084 ms  9.414 ms
12  2a02:c0::46:42:808:808 (2a02:c0::46:42:808:808)  8.937 ms  8.993 ms  9.047 ms

For comparison, here's another MTR to Google's public DNS through a different Stateful NAT64 implementation. It contains plenty of IPv4-converted IPv6 addresses, as I would expect.

$ mtr -c 1 -r --report-wide 2a02:c0::64:0:8.8.8.8
Start: Mon Mar  9 11:01:12 2015
HOST: echo.ms.redpill-linpro.com       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2a02:c0:2:4:6666:17:0:1           0.0%     1    1.2   1.2   1.2   1.2   0.0
  2.|-- ge-1-0-39.cs1-osl3.n.bitbit.net   0.0%     1    1.7   1.7   1.7   1.7   0.0
  3.|-- xe-1-3-0-0.cr1-osl2.n.bitbit.net  0.0%     1    1.3   1.3   1.3   1.3   0.0
  4.|-- vlan-9.cs1-osl2.n.bitbit.net      0.0%     1    2.4   2.4   2.4   2.4   0.0
  5.|-- nat64gw1-osl2.n.bitbit.net        0.0%     1    1.4   1.4   1.4   1.4   0.0
  6.|-- 2a02:c0::64:0:57ee:210f           0.0%     1    1.8   1.8   1.8   1.8   0.0
  7.|-- 2a02:c0::64:0:c000:200            0.0%     1    1.8   1.8   1.8   1.8   0.0
  8.|-- 2a02:c0::64:0:57ee:3e58           0.0%     1    3.2   3.2   3.2   3.2   0.0
  9.|-- 2a02:c0::64:0:57ee:3e54           0.0%     1    3.9   3.9   3.9   3.9   0.0
 10.|-- 2a02:c0::64:0:40d2:4555           0.0%     1    1.9   1.9   1.9   1.9   0.0
 11.|-- 2a02:c0::64:0:4310:8806           0.0%     1   15.0  15.0  15.0  15.0   0.0
 12.|-- 2a02:c0::64:0:480e:f336           0.0%     1    9.3   9.3   9.3   9.3   0.0
 13.|-- 2a02:c0::64:0:480e:efef           0.0%     1   10.7  10.7  10.7  10.7   0.0
 14.|-- 2a02:c0::64:0:808:808             0.0%     1    9.4   9.4   9.4   9.4   0.0
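Going the other way, the IPv4 router behind each hop in this output can be recovered from the last 32 bits of its address. A small Python sketch, assuming the other NAT64 uses the /96 prefix 2a02:c0::64:0:0:0/96 (inferred from the traceroute target, not stated anywhere); the function name is hypothetical:

```python
import ipaddress


def extract_ipv4_96(prefix: str, ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address embedded in the last 32 bits of a /96-mapped address."""
    net = ipaddress.IPv6Network(prefix)
    addr = ipaddress.IPv6Address(ipv6)
    if net.prefixlen != 96 or addr not in net:
        raise ValueError(f"{addr} is not under the /96 prefix {net}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


# Hops 6 and 14 of the MTR output above.
for hop in ("2a02:c0::64:0:57ee:210f", "2a02:c0::64:0:808:808"):
    print(hop, "->", extract_ipv4_96("2a02:c0::64:0:0:0/96", hop))
# 2a02:c0::64:0:57ee:210f -> 87.238.33.15
# 2a02:c0::64:0:808:808 -> 8.8.8.8
```

Every intermediate hop decodes to a distinct IPv4 router, whereas in the Jool traces every hop beyond the NAT64 decodes to 8.8.8.8.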

ydahhrk commented 9 years ago

I guess "non-critical" is debatable.

This looks important and easy, so I'll probably push a commit tomorrow and advance master right away.

toreanderson commented 9 years ago

I don't think it's critical, because it only breaks (some) traceroute programs. ICMPv4 errors from any of the IPv4 routers (such as Frag Needed) would still make it back to the IPv6 client. They would have an odd IPv6 source address, but the source address of an ICMP error isn't important, as long as it isn't a bogon/martian and thus at risk of being dropped.

toreanderson commented 9 years ago

I don't think b00dcf4 fixes this completely. Here's tshark output (edited for brevity) of an ICMP traceroute to Google Public DNS through Jool:

Frame 1 - original ICMPv6 ping packet sent from the IPv6 source host:

Internet Protocol Version 6, Src: 2a02:c0:2:4:6666:17:0:1001 (2a02:c0:2:4:6666:17:0:1001), Dst: 2a02:c0::46:43:808:808 (2a02:c0::46:43:808:808)
    Next header: ICMPv6 (58)
    Hop limit: 2
    Source: 2a02:c0:2:4:6666:17:0:1001 (2a02:c0:2:4:6666:17:0:1001)
    Destination: 2a02:c0::46:43:808:808 (2a02:c0::46:43:808:808)
Internet Control Message Protocol v6
    Type: Echo (ping) request (128)
    Code: 0

Frame 2 - Jool's IPv6->IPv4 translation of frame 1:

Internet Protocol Version 4, Src: 185.47.43.123 (185.47.43.123), Dst: 8.8.8.8 (8.8.8.8)
    Time to live: 1
    Protocol: ICMP (1)
    Source: 185.47.43.123 (185.47.43.123)
    Destination: 8.8.8.8 (8.8.8.8)
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0

Frame 3 - a TTL exceeded error originated by an IPv4 router in response to frame 2:

Internet Protocol Version 4, Src: 185.47.41.1 (185.47.41.1), Dst: 185.47.43.123 (185.47.43.123)
    Time to live: 64
    Protocol: ICMP (1)
    Source: 185.47.41.1 (185.47.41.1)
    Destination: 185.47.43.123 (185.47.43.123)
Internet Control Message Protocol
    Type: 11 (Time-to-live exceeded)
    Code: 0 (Time to live exceeded in transit)
    Internet Protocol Version 4, Src: 185.47.43.123 (185.47.43.123), Dst: 8.8.8.8 (8.8.8.8)
        Time to live: 1
        Protocol: ICMP (1)
        Source: 185.47.43.123 (185.47.43.123)
        Destination: 8.8.8.8 (8.8.8.8)
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0

Frame 4 - Jool's IPv4->IPv6 translation of frame 3:

Internet Protocol Version 6, Src: 2a02:c0::46:43:b92f:2901 (2a02:c0::46:43:b92f:2901), Dst: 2a02:c0:2:4:6666:17:0:1001 (2a02:c0:2:4:6666:17:0:1001)
    Next header: ICMPv6 (58)
    Hop limit: 63
    Source: 2a02:c0::46:43:b92f:2901 (2a02:c0::46:43:b92f:2901)
    Destination: 2a02:c0:2:4:6666:17:0:1001 (2a02:c0:2:4:6666:17:0:1001)
Internet Control Message Protocol v6
    Type: Time Exceeded (3)
    Code: 0 (hop limit exceeded in transit)
    Internet Protocol Version 6, Src: 2a02:c0:2:4:6666:17:0:1001 (2a02:c0:2:4:6666:17:0:1001), Dst: 2a02:c0::46:43:b92f:2901 (2a02:c0::46:43:b92f:2901)
        Next header: ICMPv6 (58)
        Hop limit: 1
        Source: 2a02:c0:2:4:6666:17:0:1001 (2a02:c0:2:4:6666:17:0:1001)
        Destination: 2a02:c0::46:43:b92f:2901 (2a02:c0::46:43:b92f:2901)
    Internet Control Message Protocol v6
        Type: Echo (ping) request (128)
        Code: 0

What seems wrong here is that the destination IPv4 address 8.8.8.8 embedded in frame 3's ICMPv4 payload is translated to a destination address of 2a02:c0::46:43:b92f:2901 in frame 4's ICMPv6 payload. I believe it should have been translated to 2a02:c0::46:43:808:808 instead.
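The mapping argued for here treats the quoted packet as an ordinary IPv4-to-IPv6 translation of the original probe: its pool4 source (185.47.43.123) goes back to the client through the session, and its remote IPv4 destination (8.8.8.8) is mapped with RFC 6052. A minimal Python sketch of that reasoning, not Jool's code; the session table, the helper names and the decision to ignore ICMP identifiers are all assumptions for illustration, and the outer source follows the RFC 6052 choice already visible in frame 4:

```python
import ipaddress
from typing import NamedTuple, Tuple

# /96 prefix inferred from frames 1 and 4 above.
POOL6 = ipaddress.IPv6Network("2a02:c0::46:43:0:0/96")

# Hypothetical, simplified session table: pool4 address -> IPv6 client.
# Real stateful NAT64 sessions also key on ports / ICMP identifiers.
SESSIONS = {
    ipaddress.IPv4Address("185.47.43.123"): ipaddress.IPv6Address("2a02:c0:2:4:6666:17:0:1001"),
}


class Addrs(NamedTuple):
    src: str
    dst: str


def embed(v4: str) -> str:
    """RFC 6052 mapping of a remote IPv4 host into POOL6."""
    return str(ipaddress.IPv6Address(int(POOL6.network_address) | int(ipaddress.IPv4Address(v4))))


def to_client(v4: str) -> str:
    """Session lookup: the IPv6 client behind a pool4 address."""
    return str(SESSIONS[ipaddress.IPv4Address(v4)])


def translate_icmp_error(outer: Addrs, inner: Addrs) -> Tuple[Addrs, Addrs]:
    # Outer header: router -> pool4 address becomes router' -> client.
    new_outer = Addrs(src=embed(outer.src), dst=to_client(outer.dst))
    # Inner (quoted) header travels the opposite way: pool4 -> remote host
    # becomes client -> remote host', so its *destination* gets RFC 6052.
    new_inner = Addrs(src=to_client(inner.src), dst=embed(inner.dst))
    return new_outer, new_inner


outer, inner = translate_icmp_error(
    Addrs("185.47.41.1", "185.47.43.123"),  # frame 3, outer IPv4 header
    Addrs("185.47.43.123", "8.8.8.8"),      # frame 3, quoted IPv4 header
)
print(outer)  # Addrs(src='2a02:c0::46:43:b92f:2901', dst='2a02:c0:2:4:6666:17:0:1001')
print(inner)  # Addrs(src='2a02:c0:2:4:6666:17:0:1001', dst='2a02:c0::46:43:808:808')
```

Frame 4 instead carries the outer source (2a02:c0::46:43:b92f:2901) as the inner destination, which is the mismatch described above.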

For what it's worth, because of this mismatch, frame 4 is dropped by a stateful ip6tables firewall before it reaches the IPv6 client, so the traceroute output shows no responses for any hops beyond the Jool NAT64 (except for the final "hop", i.e., the destination).
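Why the firewall drops it can be illustrated with a deliberately crude model: a stateful filter accepts an incoming ICMPv6 error only if the packet quoted inside it belongs to a flow the client actually opened. The Python sketch below is hypothetical and heavily simplified (real connection tracking also matches the protocol and ICMP identifier), but it captures the address check that fails here:

```python
# Flows the client has opened: (source, destination) of the outgoing probe.
tracked_flows = {("2a02:c0:2:4:6666:17:0:1001", "2a02:c0::46:43:808:808")}


def error_is_related(inner_src: str, inner_dst: str) -> bool:
    """Accept an ICMPv6 error only if the packet it quotes matches a known flow."""
    return (inner_src, inner_dst) in tracked_flows


client = "2a02:c0:2:4:6666:17:0:1001"
print(error_is_related(client, "2a02:c0::46:43:b92f:2901"))  # frame 4 as sent: False
print(error_is_related(client, "2a02:c0::46:43:808:808"))    # expected inner dst: True
```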

ydahhrk commented 9 years ago

Update:

From my last batch of e-mails with Tore, we generally agree that the behaviour being complained about here (which is the Jool 3.2 and 3.3 behaviour) is in line with RFC 6146. That would make this a "won't fix" kind of bug, but truth be told, the choice of source address could be improved without much trouble, and we would like to hear the IETF's opinion on this.

Therefore, this bug will remain open for the moment. As Tore's last comment argues, commit b00dcf4 was actually a mistake, and therefore it was left out of Jool 3.3.1 on purpose.

toreanderson commented 9 years ago

I just wrote a lengthy e-mail to the RFC6146 authors and the IETF v6ops working group about the issue.

Whether this will be regarded as a bug in RFC6146 itself remains to be seen, but in any case it would be nice to have a user-configurable "not-rfc6146-compliant-but-sane" mode of operation that ensures the source address of a translated ICMPv6 error is generated by applying the RFC6052 algorithm to the source address of the original ICMPv4 error.
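The proposed mode boils down to a single switch on how the outer source of a translated ICMPv6 error is chosen. A hedged Python sketch; the `better` parameter is a hypothetical stand-in for such a mode, not Jool's configuration syntax, and the prefix is the one from the captures above:

```python
import ipaddress

POOL6 = ipaddress.IPv6Network("2a02:c0::46:43:0:0/96")  # prefix from the captures above


def icmpv6_error_source(v4_error_src: str, session_remote6: str, better: bool) -> str:
    """Pick the source address of a translated ICMPv6 error.

    better=False: reuse the session's remote IPv6 address (the original
                  destination), i.e. the Jool 3.2/3.3 behaviour discussed here.
    better=True:  apply RFC 6052 to the ICMPv4 error's own source address.
    """
    if not better:
        return session_remote6
    v4 = ipaddress.IPv4Address(v4_error_src)
    return str(ipaddress.IPv6Address(int(POOL6.network_address) | int(v4)))


for better in (False, True):
    print(better, icmpv6_error_source("185.47.41.1", "2a02:c0::46:43:808:808", better))
# False 2a02:c0::46:43:808:808
# True 2a02:c0::46:43:b92f:2901
```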

ydahhrk commented 9 years ago

It seems we still don't have much of an answer, so this ended up as an alternate configuration mode, rather than the de facto implementation. Here's the doc: https://jool.mx/en/usr-flags-global.html#source-icmpv6-errors-better