NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

stateful NAT64 on single machine doesn't work #297

Closed Dieken closed 4 years ago

Dieken commented 4 years ago

I want to use a dual stack machine on Internet to NAT64 traffic to IPv4 services:

[IPv6 internet] ----> [IPv4/IPv6 box] ---NAT64---> [IPv4 services]

Suppose the box has the IP address J:K:M:N::a.b.c.d/96. I want public traffic to this address to be translated and sent to the IPv4 address a.b.c.d. Below is my configuration:

# Although the box has only single interface eth0, I set these explicitly.
sysctl -w net.ipv4.conf.all.forwarding=1
sysctl -w net.ipv6.conf.all.forwarding=1

modprobe jool

jool instance add example --iptables --pool6 J:K:M:N::/96

ip6tables -t mangle -A PREROUTING -d J:K:M:N::/96 -j JOOL --instance example
iptables -t mangle -A PREROUTING -d eth0-IPv4-addr -p tcp --dport 61001:65535 -j JOOL --instance example
iptables -t mangle -A PREROUTING -d eth0-IPv4-addr -p udp --dport 61001:65535 -j JOOL --instance example
iptables -t mangle -A PREROUTING -d eth0-IPv4-addr -p icmp -j JOOL --instance example

Unfortunately, ping6 J:K:M:N::a.b.c.d just hangs with no output at all.

I also followed https://www.jool.mx/en/single-interface.html, but still had no luck:

modprobe jool
jool instance add --netfilter --pool6 J:K:M:N::/96
ping6 J:K:M:N::a.b.c.d
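
For reference, the J:K:M:N::a.b.c.d notation used in this post is the RFC 6052 embedding for a /96 prefix: the IPv4 address occupies the last 32 bits of the IPv6 address. The sketch below shows the arithmetic. The function name embed96 is made up for illustration, and 2001:db8:64:64 is a documentation prefix standing in for J:K:M:N.

```shell
# Embed an IPv4 address in the low 32 bits of a /96 prefix (RFC 6052 style).
#   $1: the first groups of the prefix, without the trailing "::" (assumed input form)
#   $2: dotted-quad IPv4 address
embed96() {
    prefix=$1
    ipv4=$2
    old_ifs=$IFS
    IFS=.
    set -- $ipv4          # split a.b.c.d into $1 $2 $3 $4
    IFS=$old_ifs
    # Two octets form one 16-bit group of the IPv6 address.
    printf '%s::%x:%x\n' "$prefix" $(( $1 * 256 + $2 )) $(( $3 * 256 + $4 ))
}

embed96 2001:db8:64:64 192.0.2.10   # prints: 2001:db8:64:64::c000:20a
```

Both spellings name the same address; the mixed notation is simply easier to read.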

ydahhrk commented 4 years ago

I think your problem is simply that you're trying to execute your ping in the [IPv4/IPv6 box] machine, even though your Jool iptables rules are only hooked to PREROUTING.

PREROUTING only happens when [IPv4/IPv6 box] receives a packet from a network interface. The ping you're issuing is not seen as a packet received from a network interface; its source is the same node. Therefore, Jool never touches it.

See the Basic Netfilter/iptables chains diagram. Jool is hooked to PREROUTING, but your ping request starts from "Network applications," travels to OUTPUT, then to POSTROUTING, and then to the network. And that's it.

To test Jool properly, you have to issue the ping from any node in [IPv6 internet] instead.
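
The traversal described above can be summarized as a small lookup. This is only an illustrative sketch: the function name hooks_for and its three path labels are invented here, and it models the standard Netfilter chain order rather than anything specific to Jool.

```shell
# Print the Netfilter chains a packet traverses, depending on where it
# originates and where it is headed. Hypothetical helper, illustration only.
hooks_for() {
    case "$1" in
        local-out) echo "OUTPUT POSTROUTING" ;;             # e.g. ping6 run on the translator
        forwarded) echo "PREROUTING FORWARD POSTROUTING" ;; # received, then routed onward
        local-in)  echo "PREROUTING INPUT" ;;               # received, addressed to this node
        *)         echo "unknown path: $1" >&2; return 1 ;;
    esac
}

hooks_for local-out   # prints: OUTPUT POSTROUTING
hooks_for forwarded   # prints: PREROUTING FORWARD POSTROUTING
```

A locally issued ping never passes through PREROUTING, which is why a rule hooked only there does not see it.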


For the record: I don't know if Jool does the right thing if you add rules to anything other than PREROUTING.

Several years ago, I once tried to hook Jool to OUTPUT to make this work, but ran into some trouble for reasons I don't remember. Maybe it's worth a fresh analysis, but it would need to be prioritized. (ie. new Survey.)

Edit:

If you really want your translator machine to translate its own traffic, you can use network namespaces.
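
A rough sketch of the namespace idea follows. It is not taken from this thread and has not been verified against a particular Jool version. The names joolns, to_jool and to_world, the 192.168.255.0/30 and 2001:db8:0:1::/64 link addresses, the 64:ff9b::/96 prefix and the eth0 uplink are all assumptions chosen for illustration. Run as root.

```shell
# Create a namespace for the translator and a veth pair to reach it.
ip netns add joolns
ip link add name to_jool type veth peer name to_world
ip link set dev to_world netns joolns

# Host end of the link.
ip link set dev to_jool up
ip addr add 192.168.255.1/30 dev to_jool
ip -6 addr add 2001:db8:0:1::1/64 dev to_jool

# Namespace end of the link, with default routes back through the host.
ip netns exec joolns ip link set dev to_world up
ip netns exec joolns ip addr add 192.168.255.2/30 dev to_world
ip netns exec joolns ip -6 addr add 2001:db8:0:1::2/64 dev to_world
ip netns exec joolns ip route add default via 192.168.255.1
ip netns exec joolns ip -6 route add default via 2001:db8:0:1::1
ip netns exec joolns sysctl -w net.ipv4.conf.all.forwarding=1
ip netns exec joolns sysctl -w net.ipv6.conf.all.forwarding=1

# Send the translation prefix into the namespace, and let the translated
# IPv4 traffic leave through the host (assumed uplink: eth0).
ip -6 route add 64:ff9b::/96 via 2001:db8:0:1::2
sysctl -w net.ipv4.conf.all.forwarding=1
iptables -t nat -A POSTROUTING -s 192.168.255.2 -o eth0 -j MASQUERADE

# Start the NAT64 instance inside the namespace.
modprobe jool
ip netns exec joolns jool instance add --netfilter --pool6 64:ff9b::/96
```

With this layout, a ping6 issued on the host enters the namespace through the veth pair, so from the namespace's point of view it is a received packet and goes through PREROUTING there.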

Dieken commented 4 years ago

@ydahhrk I tried pinging from another machine, and it worked! Thank you very much!

PREROUTING is enough; I don't mind that local traffic can't be translated :-)

One annoying thing is that ping SOME-IPv4-ADDRESS on the [IPv4/IPv6 box] is broken by the rule iptables -t mangle -A PREROUTING -d eth0-IPv4-addr -p icmp -j JOOL --instance example: the ICMPv4 replies are dropped by Jool. There seems to be no way to distinguish ICMP replies meant for the local box from ICMP replies meant for NAT-ed client boxes :-(

ydahhrk commented 4 years ago

Indeed.

I think there are two possible solutions for this:

Option A

If this value does what I think it does, it should be possible to abort the Jool iptables rule even when the packet matched. (This would return the packet to the kernel, which would handle the ping normally.)

Option B

Create Jool iptables matches. Something like

ip6tables -t mangle -A PREROUTING -m jool -j JOOL --instance "example"
iptables  -t mangle -A PREROUTING -m jool -j JOOL --instance "example"

-m jool in IPv6 would match packets whose destination address matches the --instance instance's pool6. -m jool in IPv4 would match packets whose addresses match some existing BIB entry in the --instance instance. This way, a ping which did not start in the IPv6 side would not match the rule. (Because it would most likely lack a BIB entry.)

Option B would have the added benefit of reducing redundant configuration (as you would no longer need to specify multiple ip(v4)tables rules, and you would no longer need to repeat addresses and ports (-s, -d, --dport) already defined in Jool's pool4 and pool6).

Edit: On the other hand, Option A might perform somewhat faster due to locking constraints.

Do you prefer one of these options?

Dieken commented 4 years ago

I prefer option A:

  1. I don't mind slightly verbose iptables rules. I won't change them often once they work, and explicit rules are easy to read and maintain.
  2. -m jool in the IPv6 world looks fine, but I worry that -m jool in the IPv4 world may not be reliable. For example, someone sends a fake raw packet router-IPv6 -> some-IPv6 to the router, or somebody just runs ping6 64:ff9b::a.b.c.d on the router, and then ping a.b.c.d doesn't work because of the existing BIB entry.
  3. I prefer the lower lock contention of option A: enter the jool kernel module as rarely as possible.

About option A, I have two questions:

  1. Will Jool always pass all untranslatable packets to the kernel and trust the kernel to drop them properly? I hope this won't introduce bugs. If Jool passes all untranslatable packets to the kernel, then the pool4 port range could overlap with net.ipv4.ip_local_port_range=32768 60999, so Jool could handle many more concurrent connections.
  2. How is the new passthrough option configured: at the instance level or the global level? It doesn't seem to make sense to have an instance-level option for this behaviour.

ydahhrk commented 4 years ago

Not that I'm trying to defend option B, but I suspect you're misunderstanding the BIB. Here's some food for thought:

somebody just runs ping6 64:ff9b::a.b.c.d on the router, and then ping a.b.c.d doesn't work because of the existing BIB entry.

But a.b.c.d would not be added to the BIB because it's a destination address. The BIB only stores addresses that belong to the translator (ie. pool4).

may not be reliable. For example, someone sends a fake raw packet router-IPv6 -> some-IPv6 to the router

(BTW: I'm assuming that by "router" you mean "translator." Please correct me if I'm wrong.)

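I don't understand this either. The way I see it, the packet flow would be

  • router-IPv6 -> some-IPv6 (where some-IPv6 has the pool6 prefix)
  • Jool translates that into some-pool4 -> some-IPv4 (where some-IPv4 is some-IPv6 minus pool6). Jool creates BIB entry router-IPv6 | some-pool4
  • Assuming some-IPv4 exists, it responds some-IPv4 -> some-pool4
  • Jool receives, translates that into some-IPv6 -> router-IPv6. This packet is dropped.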

The way I see it, this behaves the same whether it's option A or B.

enter the jool kernel module as rarely as possible.

Why? Is this known to be harmful?


Will Jool always pass all untranslatable packets to the kernel and trust the kernel to drop them properly? I hope this won't introduce bugs. If Jool passes all untranslatable packets to the kernel, then the pool4 port range could overlap with net.ipv4.ip_local_port_range=32768 60999, so Jool could handle many more concurrent connections.

Even if iptables Jool returns everything to the kernel, the admin will still be expected to separate pool4 and the ephemeral range. If the two ranges overlap, then connection collision will happen, regardless of whether we choose option A or B.
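
Whether two port ranges collide is easy to check mechanically. The sketch below is a hypothetical helper (check_port_ranges is not part of Jool or iptables), fed with the ephemeral range quoted above and the 61001-65535 range from the iptables rules earlier in this thread.

```shell
# Print "overlap" if the port ranges [$1,$2] and [$3,$4] share at least
# one port, otherwise print "disjoint". Illustrative helper only.
check_port_ranges() {
    if [ "$1" -le "$4" ] && [ "$3" -le "$2" ]; then
        echo overlap
    else
        echo disjoint
    fi
}

# Ephemeral range vs. the range matched by the JOOL rules.
check_port_ranges 32768 60999 61001 65535   # prints: disjoint
# A pool4 range that reaches into the ephemeral range.
check_port_ranges 32768 60999 40000 65535   # prints: overlap
```

On a live system, the first pair can be read with sysctl -n net.ipv4.ip_local_port_range.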

My initial tendency would be to return IPv6 packets whose destination addresses do not match pool6, and IPv4 packets whose destination addresses do not match pool4.

And this made me realize that Option B is wrong:

-m jool in IPv4 would match packets whose addresses match some existing BIB entry in the --instance instance.

It would have to be "-m jool in IPv4 would match packets whose addresses match the pool4 of the --instance instance."

How is the new passthrough option configured: at the instance level or the global level? It doesn't seem to make sense to have an instance-level option for this behaviour.

What is a "passthrough option"?

iptables rules must be associated with an instance (--instance) because there can be any number of them in the namespace. (This in turn allows for translating different traffic in different ways.)

Dieken commented 4 years ago

Not that I'm trying to defend option B, but I suspect you're misunderstanding the BIB. Here's some food for thought:

somebody just runs ping6 64:ff9b::a.b.c.d on the router, and then ping a.b.c.d doesn't work because of the existing BIB entry.

But a.b.c.d would not be added to the BIB because it's a destination address. The BIB only stores addresses that belong to the translator (ie. pool4).

The BIB is a source NAT table that records the IPv6 source address (the client that initiated the IPv6->IPv4 connection) and the dynamic IPv4 source address (the translator that initiated the IPv4->IPv4 connection). Correct me if I'm wrong :-)

Here I omit the port in the BIB table because we are mainly discussing ICMP.

may not be reliable. For example, someone sends a fake raw packet router-IPv6 -> some-IPv6 to the router

(BTW: I'm assuming that by "router" you mean "translator." Please correct me if I'm wrong.)

Yes, you are right. Sorry for switching words: router and [IPv4/IPv6 box] both mean the translator.

I don't understand this either. The way I see it, the packet flow would be

  • router-IPv6 -> some-IPv6 (where some-IPv6 has the pool6 prefix)
  • Jool translates that into some-pool4 -> some-IPv4 (where some-IPv4 is some-IPv6 minus pool6). Jool creates BIB entry router-IPv6 | some-pool4
  • Assuming some-IPv4 exists, it responds some-IPv4 -> some-pool4
  • Jool receives, translates that into some-IPv6 -> router-IPv6. This packet is dropped.

The way I see it, this behaves the same whether it's option A or B.

The packet flow matches what I understood; thanks for the detailed explanation. I realized I had forgotten that option A also fails in some scenarios. Let me summarize:

1.  If no `ping6` happened before, there is no `router-IPv6 | some-pool4`
     entry in the BIB. Now issue a `ping` command on the router, and the
     ICMPv4 reply reaches the router:

  1.c:  current implementation: because no BIB entry for some-pool4 is found
       (suppose a single router-IPv4 address is available for pool4), the
       ICMPv4 reply is dropped and `ping` hangs, bad.
  1.a:  option a: even though no BIB entry is found, the ICMPv4 reply
       continues through the iptables chains and finally reaches `ping`, good.
  1.b:  option b: no BIB entry is found, so the reply is not matched by
       `-m jool` in the IPv4 world and it reaches `ping`, good.

2. If `ping6` happened before, there is a `router-IPv6 | some-pool4` entry in
    the BIB. Now issue a `ping` command on the router, and the ICMPv4 reply
    reaches the router:

  2.c:  current implementation: the ICMPv4 reply is translated into an ICMPv6
       reply, which `ping` doesn't understand, bad.
  2.a:  option a: the BIB entry is found and the reply is translated to ICMPv6,
       with no chance to bypass Jool and continue in the iptables chain, so
       this is the same as 2.c, bad.
       // I didn't realize this also doesn't work.
  2.b:  option b: same as 2.c, bad.

Actually, in this analysis the original client-IPv6 doesn't have to be router-IPv6; a ping6 from another box to the router/translator box has the same issue.
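
The six cases in the summary above can be restated as a lookup table. This is only a compact restatement of the analysis, not Jool code; the function name local_ping_outcome and its argument spelling are invented for illustration.

```shell
# Outcome for the ICMPv4 reply to a ping issued on the translator itself.
#   $1: implementation (current | a | b)
#   $2: whether a matching BIB entry exists (yes | no)
local_ping_outcome() {
    case "$1:$2" in
        current:no) echo "dropped by Jool, ping hangs" ;;
        a:no|b:no)  echo "left to the kernel, ping works" ;;
        current:yes|a:yes|b:yes) echo "translated to ICMPv6, ping fails" ;;
        *) echo "unknown input: $1 $2" >&2; return 1 ;;
    esac
}

# Print the whole table.
for impl in current a b; do
    for bib in no yes; do
        printf '%-8s bib=%-3s %s\n' "$impl" "$bib" "$(local_ping_outcome "$impl" "$bib")"
    done
done
```

Options A and B only differ from the current behaviour in the rows without a BIB entry.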

enter the jool kernel module as rarely as possible.

Why? Is this known to be harmful?

Don't misunderstand me: Jool is the most feature-rich and best-documented open source SIIT/NAT64 implementation I have investigated, and I really appreciate this wonderful work!

I prefer an explicit '-d IP' over '-m jool' to avoid entering Jool, purely for potentially better performance and fewer bugs. Frankly speaking, the Jool iptables extension isn't used as widely as the standard iptables extensions. I fully respect your code quality, but there are so many details in the Linux network stack that fewer code paths usually mean fewer bugs.

Dieken commented 4 years ago

continue...

Will Jool always pass all untranslatable packets to the kernel and trust the kernel to drop them properly? I hope this won't introduce bugs. If Jool passes all untranslatable packets to the kernel, then the pool4 port range could overlap with net.ipv4.ip_local_port_range=32768 60999, so Jool could handle many more concurrent connections.

Even if iptables Jool returns everything to the kernel, the admin will still be expected to separate pool4 and the ephemeral range. If the two ranges overlap, then connection collision will happen, regardless of whether we choose option A or B.

Agreed, overlapping is too risky. I was just hoping that collisions wouldn't happen often.

My initial tendency would be to return IPv6 packets whose destination addresses do not match pool6, and IPv4 packets whose destination addresses do not match pool4.

And this made me realize that Option B is wrong:

-m jool in IPv4 would match packets whose addresses match some existing BIB entry in the --instance instance.

It would have to be "-m jool in IPv4 would match packets whose addresses match the pool4 of the --instance instance."

I feel that matching some existing BIB entry is better than matching pool4, because then scenarios 1.a and 1.b in the comment above work.

How is the new passthrough option configured: at the instance level or the global level? It doesn't seem to make sense to have an instance-level option for this behaviour.

What is a "passthrough option"?

iptables rules must be associated with an instance (--instance) because there can be any number of them in the namespace. (This in turn allows for translating different traffic in different ways.)

I mean a Jool option that controls whether to drop the packet or return it to the kernel when no BIB entry is found for a reply packet. This option could be global or per instance.

But as I analyzed in the comment above, it's hard to make local ICMPv4 work, so maybe it's not worth handling. It's enough to have some way to work around it, for example assigning multiple IPv4 addresses to the translator box and not including the default IPv4 address in pool4.

ydahhrk commented 4 years ago

Ok, I agree with everything.

  2.c:  current implementation: the ICMPv4 reply is translated into an ICMPv6
       reply, which `ping` doesn't understand, bad.
  2.a:  option a: the BIB entry is found and the reply is translated to ICMPv6,
       with no chance to bypass Jool and continue in the iptables chain, so
       this is the same as 2.c, bad.
       // I didn't realize this also doesn't work.
  2.b:  option b: same as 2.c, bad.

Indeed; since TCP and UDP identify connections via ports, it is possible to separate the ephemeral range from pool4. ICMP identifies connections via ICMP identifiers, and it's not possible to separate the router's identifier range from the NAT64's identifier range. Most ping clients do not allow the user to set the ICMP identifier either. So ICMP is a lost cause.

However, ICMP collision is far less likely than TCP/UDP collision, because pings are not as ubiquitous as TCP/UDP sockets.

I'll try to implement Option A. BRB.

ydahhrk commented 4 years ago

Ok, so I implemented a prototype of this in the issue297 branch. It appears to be working properly. (ie. ping from IPv4 node to translator works, as long as there is no BIB entry collision.)

Because Netfilter Jool already returned packets to the kernel in certain circumstances, all I did was unify the Netfilter and iptables "return packet to the kernel" code blocks. Hopefully, they will not need to behave differently.

The documentation doesn't specify exactly when a "packet return" is performed (as opposed to a "packet drop"), so I will detail the current behavior now.

As of issue297, both Netfilter Jool and iptables Jool will return the packet to the kernel if any of these conditions are met:

SIIT Jool also returns the packet to the kernel when at least one of these conditions are met:

Stateful NAT64 Jool also returns the packet to the kernel when at least one of these conditions are met:

*: This code is commentless, so I don't remember what led me to implement it this way, unfortunately.

**: This is because Netfilter Jool sometimes receives Neighbor Discovery packets, which should not be translated nor dropped. (I do not know if iptables Jool also receives these packets.) ~It is interesting to note that, from my reading of the code, SIIT Jool drops untranslatable/unknown ICMPv4 and ICMPv6 types, which is probably incorrect. I need to look into this.~ Neighbor Discovery packets are returned to the kernel on account of having link-local addresses.

Any concerns with these changes?

Dieken commented 4 years ago

I glanced at the documentation above and the patch; they look fine to me. It makes sense to keep the behavior consistent between Netfilter and iptables.

I also built the prototype branch and verified the behaviour, and it worked like a charm: ping6 POOL6-prefix::a.b.c.d from another box worked, and at the same time ping a.b.c.d from the translator box also worked. I ran the test for several minutes and found no problems. I did a similar test with curl -6 and curl, and that worked too :-)

Great work! Thank you very much for pointing out my flawed test and for going further and making local traffic work. Amazing!

ghost commented 4 years ago

So how does one get stateful NAT64 to work on a single machine?

ydahhrk commented 4 years ago

So how does one get stateful NAT64 to work on a single machine?

https://jool.mx/en/run-nat64.html
https://jool.mx/en/node-based-translation.html