NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

Add Device Driver mode #140

Open ydahhrk opened 9 years ago

ydahhrk commented 9 years ago

2018-11-25 Update

Hello. If you came here from the survey, you'll notice that this thread is rather large, has evolved and often wildly branches off-topic. So here's a quick summary for what Device Driver mode is:

Basically, Device Driver Jool will be an alternative to Netfilter Jool and iptables Jool. Your translator will look like a network interface (jool0 in the snippet below):

user@T:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 1c:1b:0d:62:7a:42 brd ff:ff:ff:ff:ff:ff
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 98:de:d0:80:b8:4d brd ff:ff:ff:ff:ff:ff
4: jool0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 64:64:64:64:64:64 brd ff:ff:ff:ff:ff:ff

It will behave similarly to loopback; it will look like an interface, but will in fact be a virtual one. An IPv6 packet routed towards it will be bounced back as an IPv4 packet, and vice-versa. You will send traffic to it by means of Linux's routing table rather than iptables rules.

The setup will probably be more intuitive for some people. The only drawback I can think of is that, if you set it up on a translator meant to forward traffic, the machine will end up subtracting 3 (instead of 1) from the packet's TTL/Hop Limit field: one by Linux (when the packet is forwarded from eth0 to jool0), another by Jool itself, and a last one by Linux again (when the packet is forwarded from jool0 to eth1).
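To make the TTL arithmetic concrete, here is a minimal Python sketch. It assumes a simplified model in which every forwarding step subtracts exactly one from the field and the packet is dropped once the value reaches zero; `ttl_after` and the two constants are illustrative names, not anything from Jool.

```python
# Simplified model (an assumption): one decrement per forwarding step,
# and the packet is dropped as soon as the field reaches zero.
NETFILTER_DECREMENTS = 1      # Netfilter Jool: the box counts as a single hop
DEVICE_DRIVER_DECREMENTS = 3  # Linux (eth0 -> jool0), Jool itself, Linux (jool0 -> eth1)


def ttl_after(ttl, decrements):
    """Return the TTL/Hop Limit left after the translator, or None if the packet expires."""
    remaining = ttl - decrements
    return remaining if remaining > 0 else None


print(ttl_after(64, NETFILTER_DECREMENTS))      # 63
print(ttl_after(64, DEVICE_DRIVER_DECREMENTS))  # 61
print(ttl_after(3, DEVICE_DRIVER_DECREMENTS))   # None
```

In this model the only packets that behave differently are those arriving with a TTL of 3 or less, which is why the drawback is mostly cosmetic (for example, the translator shows up as three hops in a traceroute).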

And that's all, really. If that didn't already trigger chemistry in your brain, you probably don't need it.

Progress: I've tried to start this feature twice already, but each time the work was quickly obsoleted by a fast-moving main branch. It's not practical to merge; I would have to start over from the beginning.


Original post

(As you will see, I still haven't finished writing this. I would, however, like this in the public domain in case someone has something interesting to say. I will come back and analyse this further once I've finished a lot of post-release and planning paperwork I need to flush from my desk.)

Being in the middle of Netfilter, we break Netfilter's assumptions.

As far as I can tell, the people who preceded me decided it would make sense for Jool to be a Netfilter/iptables module, because it's similar to NAT, and NAT is an iptables module.

Personally, I feel like we've hit a wall when it comes to pushing Netfilter's versatility, and we should find a way to more elegantly merge Jool with the kernel.

We seem to have the following options:

  1. Become a network (pseudo-)device driver (ie. look like an interface).
  2. Move over to userspace (follow Tayga's steps).
  3. Become an iptables module.
  4. Remain a Netfilter module and find workarounds for our compliance issues.

Both 1) and 2) appear to solve all of the following current annoyances:

  1. Filtering. Because the iptables documentation discourages filtering on mangle, I'm reluctant to ask users to do so (even though I don't know what the problem with mangle filtering is, other than it looking somewhat counter-intuitive).
    Because Jool would look like an interface (1) or some userspace daemon (2), packets would not skip either the INPUT or the FORWARD chain, and therefore they would be filtered normally.
    This was already fixed using namespaces.
  2. Host-Based Edge Translation. 1) and 2) will naturally let the kernel know a route towards the RFC6052 prefix/EAM records/etc, so packets will survive ingress filtering.
    Currently, Jool cannot post a packet for local reception because it switches the layer-3 protocol of the packet. Linux goes "This is an IPv6 packet, but it came from an IPv4-only interface. Dropping."
    This can maybe currently be forced to work, but I don't think it's going to be pretty.
    This was already implemented using namespaces.
  3. --minMTU6. We can't ask the kernel to fragment to a particular size; ip_fragment() infers the MTU from the cached route, which is not --minMTU6-sensitive (though whether that's not better than --minMTU6 is still to be looked into - another TODO).
    I decided to start deferring fragmentation to the kernel because the code is tricky to get right by ourselves and atrocious to learn and maintain.
    If we left Netfilter we would be free from the kernel's fragment representation and would be able to do it a lot more easily.
    (though it would be best if the kernel exported a fragmentation function which received MTU as an argument, but that's not going to happen, particularly for old kernels.)
  4. Perhaps we would get rid of the need for two separate IPv4 addresses in stateful NAT64 mode. Not sure on this one; I need to think this through more thoroughly - TODO pool4 port ranges fix this.

Less important but still worth mentioning:

  1. blacklist would be able to stop returning loopback and other evil addresses since, being far from pre-routing, Jool would naturally stop seeing these packets.

In my opinion, 1) is the most elegant option. This is because Host-Based Edge Translation forces the other options to include a dummy interface (so processes have an IPv4 address to snap themselves to). If an interface is necessary no matter the configuration, it would be cleanest if Jool itself "were" the interface.

Perhaps by adopting 2) we would attract new users who would not trust their kernels to us. On the other hand, it looks like a lot more work (I do not know to what extent Jool is married to kernel-only routines). It's also bound to make Jool somewhat slower, since packets need to be copied whenever they get in or out of kernelspace.

Other than perhaps getting rid of the pools, I think there's not much to be gained from 3). Though we will look more like NAT, we will probably face roughly the same limitations as a Netfilter module (or perhaps more, since I'm not sure how NF_HOOK_THRESH() would behave when called from an iptables module).

3) and 4) sound like the most performance-friendly options (since there's less routing and no copying), and I feel like their symmetry with the kernel's NATting would make them the most elegant solution in the eyes of the kernel devs (which is important if we ever want to push Jool into Linux). I'm just guessing wildly, though. Perhaps they want to keep Netfilter free of any more hacks and they'd prefer some of the other options - TODO ask them.

Due to lack of experience, we're currently not aware of any roadblocks we might run into. More planning is necessary - TODO.

Criticism (on this post) and more ideas welcome.

ydahhrk commented 9 years ago

Fifth option:

5) All (or several) of the above. Interface to any of the other frameworks via wrappers. Let the user decide which should be compiled.

Most work, more complicated for the user to install, maximum versatility.

toreanderson commented 9 years ago

Performance is an important concern. Make sure to go for an approach that lets you make use of all the CPU cores in the machine. I'm wondering if today's framework might be the best performing since a packet only has to make one pass through the routing system. Going in and out of a virtual interface (either a device driver or connected to a user-space process) would probably mean the packet would be routed twice.

On the other hand, using DPDK in user-space is supposedly how you really push the envelope of how fast you can make a machine push packets. Maybe that would be something worth looking into, too.

When it comes to operational convenience (installation, setup, etc): Having it in the upstream kernel (i.e., the distro packages) is preferable to having it in user-space, which in turn is preferable to having it a stand-alone kernel module.

Finally I'd like to point out that if you solve the Host-Based Edge Translation use case, you've certainly solved the 464XLAT CLAT use case, too.

mcr commented 9 years ago

My concern is that it go upstream, that it be integrated with ip/netfilter, and that the problem of sharing an IP address with the host will go away. (I tried to use 192.168.2.1, and then use iptables to MASQUERADE that, but that doesn't work.) I will blog my solution for getting a second IP using macvlan, but there are a number of situations where a second IP won't be available.

While someone might want to put this into DPDK, the more interesting situations will be getting it into NAT hardware.

mcr commented 9 years ago

Having a virtual interface as the way to route traffic into jool would be clearer conceptually. I think that it more clearly deals with MTU issues. I don't know what Host Based Edge Translation means. I don't think that anyone cares if it's in-kernel or not. One would have to have root, and be able to hook stuff up anyway to get it to work...

I think that having an iptables module which is attached (-i jool0) to a dummy interface which handles the MTU and routing would be the best. Perhaps one could overload the ipv4 address list of the dummy interface to provide the pool of v4. That might screw up the IPv4 routing table, so maybe it's a bad idea.

ydahhrk commented 9 years ago

Going in and out of a virtual interface (either a device driver or connected to a user-space process) would probably mean the packet would be routed twice.

Correct.

On the other hand, using DPDK in user-space is supposedly how you really push the envelope of how fast you can make a machine push packets. Maybe that would be something worth looking into, too.

Thank you :)

My concern is that it go upstream, that it be integrated with ip/netfilter, and that the problem of sharing an IP address with the host will go away.

AFAIK there is very little difference between being a Netfilter/iptables module (ie. Jool now) and being integrated into Netfilter/iptables. It seems like the second address is a result of us doing something wrong, but I can't put my finger on what it is ATM.

It's something I've wanted to truly sit down and think about for a long time, but I've always had more pressing matters to attend to.

I will blog my solution for getting a second IP using macvlan, but there are a number of situations where a second IP won't be available.

Thank you :)

Having a virtual interface as the way to route traffic into jool would be clearer conceptually.

Thank you :). I guess it'd be better to explain to operators if it feels more natural.

I don't know what Host Based Edge Translation means.

It's an SIIT within an end node, and it's similar to 464XLAT's "Wireless 3GPP" Network setup. Jool's 464XLAT tutorial complains about Jool not supporting it:

There are rather several ways to do this. Unfortunately, one of them (making n6 the CLAT) is rather embarrassingly not yet implemented by Jool.

The point, I gather, is to not depend on an SIIT service elsewhere when you need translation.

I don't think that anyone cares if it's in-kernel or not. One would have to have root, and be able to hook stuff up anyway to get it to work...

I think it's mostly a problem with stability. If a userspace service crashes, it dies alone. If a kernel module crashes, it compromises the entire system.

Of course, we aim to never crash, but we're humans.

toreanderson commented 9 years ago

I'm toying with the idea of integrating SIIT-DC into OpenStack. In case you're familiar with OpenStack, what I'm thinking of doing is to integrate stateless translator support (SIIT/SIIT-EAM) in the virtual routers created by the Neutron L3 Agent. However, since these virtual routers live inside their own dedicated Linux network namespace, I can't do it with Jool as far as I can tell. I can with TAYGA, but Jool would of course be preferred... :-)

I don't know if you've decided yet on how the new framework will work, but I'm hoping you'll take this use case into consideration. The requirement would simply be to be able to start a distinct instance of Jool inside each network namespace (i.e., one per virtual router). It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

ydahhrk commented 9 years ago

Hmmm, no. I'm not familiar with OpenStack. Need me to read on the subject?

I don't know if you've decided yet on how the new framework will work

I'm waiting for the 3.4 code to be ready to start making decisions on this.

That said, as far as SIIT goes, my current thinking is that options 1 and 2 (network (pseudo-)device driver and userspace) are dominant strategies hands down, performance notwithstanding. These solutions would also solve your first requirement (what with being able to have any number of Jools per namespace).

NAT64 is more fuzzy. There's actually a sixth option:

6) Drop the NAT64 code and make a really good tutorial on how to mix SIIT and NAT to pull NAT64 off.

This is probably best in the long run, and I'm thinking it would also address your problem. RFC6146 compliance would have to be tested all over again, though.

The requirement would simply be to be able to start a distinct instance of Jool inside each network namespace (i.e., one per virtual router).

Yes, this might prove important whether Jool switches frameworks or not.

Recognizing a packet's namespace shouldn't be too hard, so if you're in a hurry, I could assign this to my new coworker as his first assignment, and release this in Jool 3.4. It would most likely work completely differently from how it will in Jool 4.0, though.

It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

Hmmm. The inability to have a SIIT and a NAT64 simultaneously is the Netlink socket's fault. This should probably be considered a bug.

ydahhrk commented 9 years ago

It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

Which instance should intercept packets earlier?

toreanderson commented 9 years ago

I don't think you need to read up on OpenStack unless you feel like it. As long as it can work with network namespaces it should work with OpenStack. If I can spin up multiple instances that are each connected to their own virtual network device (much like a TAYGA process is connected to its own TUN interface), that ought to do the trick. Then I could do something like this:

jool --create-instance jool123
ip netns create virtualrouter42
ip link set jool123 netns virtualrouter42

Or by creating the instance directly in the namespace:

ip netns create virtualrouter42
ip netns exec virtualrouter42 jool --create-instance jool123

With regards to dropping NAT64, don't do that - you can't simply mix SIIT + iptables NAPT44 to create a fully featured NAT64. For starters, you have 2^128 potential IPv6 clients accessing the NAT64, so you simply cannot map them into an IPv4 source address in a stateless manner.
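The counting argument behind that objection can be checked in a few lines of Python. This is only a back-of-the-envelope sketch; the constants are upper bounds on the two address spaces and `stateless_mapping_possible` is an illustrative helper, not anything Jool provides.

```python
# Upper bounds on the two spaces a NAT64 has to relate.
IPV6_SOURCE_ADDRESSES = 2 ** 128              # every possible IPv6 client address
IPV4_TRANSPORT_ADDRESSES = 2 ** 32 * 2 ** 16  # every IPv4 address times every port


def stateless_mapping_possible(sources, targets):
    """A fixed, collision-free mapping needs at least one target per source."""
    return sources <= targets


# IPv6 clients into IPv4 address:port pairs: impossible without per-flow state.
print(stateless_mapping_possible(IPV6_SOURCE_ADDRESSES, IPV4_TRANSPORT_ADDRESSES))  # False

# The reverse direction (all of IPv4 embedded in a /96, as SIIT does) fits easily.
print(stateless_mapping_possible(2 ** 32, 2 ** (128 - 96)))  # True
```

Even if every IPv4 address and port were available, the IPv6 side is larger by a factor of 2^80, so the IPv6-to-IPv4 direction has to allocate bindings dynamically.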

If you're going down the virtual network device path the answer to your question on which instance should go first is easy - the routing table will decide what goes where. For example:

jool --create-instance jnat64 --mode nat64
jool --create-instance jsiit --mode siit
ip route add 64:ff9b::/96 dev jnat64
ip route add 2001:db8::/96 dev jsiit
jool --instance jnat64 --pool6 64:ff9b::/96
jool --instance jsiit --pool6 2001:db8::/96
[....]
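For readers who want to see how "the routing table will decide" plays out, below is a small Python sketch of longest-prefix matching over the two routes from the example. The `ROUTES` table and `pick_device` are illustrative only; in practice the kernel's routing code makes this decision, and the instance names are the hypothetical ones used above.

```python
import ipaddress

# Same two routes as in the commands above (device names are hypothetical).
ROUTES = {
    "64:ff9b::/96": "jnat64",
    "2001:db8::/96": "jsiit",
}


def pick_device(destination):
    """Return the device of the longest matching prefix, or None if no route matches."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, device in ROUTES.items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, device))
    if not matches:
        return None
    # Longest prefix wins, as in a normal routing table lookup.
    return max(matches)[1]


print(pick_device("64:ff9b::192.0.2.1"))   # jnat64
print(pick_device("2001:db8::192.0.2.1"))  # jsiit
print(pick_device("2001:db8:1::1"))        # None
```

Since the two prefixes do not overlap, no packet can match both instances, which is what makes the "which one goes first" question disappear.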

I'm not in a hurry. :-) BTW: I'm at the IETF93 meeting at the moment and I saw that there are two people from NIC Mexico attending too: Julio Cossio and Jorge Cano. Are they involved in Jool development? If so I'd like to locate them and say hi...

JAORMX commented 9 years ago

The reason this was initially implemented as a kernel-space tool was mostly because of performance. We knew there existed a userland tool, but at the time it didn't meet the performance requirements; Dr. Nolazco might recall something of that. Anyway, seems to me like those performance issues would now be solved using DPDK, though, that would tie the project to x86. Your call though.

ydahhrk commented 9 years ago

With regards to dropping NAT64, don't do that - you can't simply mix SIIT + iptables NAPT44 to create a fully featured NAT64. For starters, you have 2^128 potential IPv6 clients accessing the NAT64, so you simply cannot map them into an IPv4 source address in a stateless manner.

Oh yeah, I had a NAT66 in mind without realizing it. How silly. Scratch that, then :)

I'm not in a hurry. :-) BTW: I'm at the IETF93 meeting at the moment and I saw that there are two people from NIC Mexico attending too: Julio Cossio and Jorge Cano. Are they involved in Jool development? If so I'd like to locate them and say hi...

Wanna jabber this?

The reason this was initially implemented as a kernel-space tool was mostly because of performance.

Thank you. Standards compliance takes precedence, though.

Not that I'd get angry if a way to fix the issues without having to switch frameworks appeared.

Anyway, seems to me like those performance issues would now be solved using DPDK, though, that would tie the project to x86. Your call though.

Well, they seem to be wanting to increase their supported architectures, so this annoyance might hopefully be temporary.

(On the other hand, DPDK's installation procedure looks bananas. Sounds like efforts towards #163 will be in vain.)

Hmmm.

toreanderson commented 8 years ago

I just wanted to add here a discussion I recently had with @fingon and @sbyx from the OpenWrt project about the possibility of adding support for Stateful NAT64. It would appear that they have some problems with the current framework that prevent them from implementing that using Jool in a sensible manner. I was thinking that when deciding on an approach for the new framework, you might want to reach out to them to ensure the chosen new approach resolves their issues.

At least I think it would have been really nice to have Jool in OpenWrt, which could then be used for 464XLAT (both PLAT/NAT64 and CLAT functions) as well as for MAP-T (probably).

< tore> (that would actually have been a cool feature for folks like me, the ability to do nat64/dns64 on the internet-connected router instead of nat44 and keep the LAN v6only)
< tore> oh well
< tore> (is it possible to force v4 off even though isp gives dhcpv4 /32?)
< cyrusff> no, each router decides on its own if it likes to introduce a v4 prefix
< cyrusff> but you could tell indidivudal routers to not assign v4 prefixes on certain interfaces via config
< cyrusff> nat64 is interesting
< cyrusff> though i'm still in need of a useful kernel implementation
< cyrusff> tayga is meh since its userspace and thus slowish
< tore> cyrusff: I'm very happy with jool for my nat64 needs
< tore> just replaced a few tayga+iptables-based boxes
< cyrusff> tore: problem with jool for me is that its "all or nothing"
< cyrusff> i can only have one instance and it catches all traffic
< cyrusff> since it hooks into netfilter
< cyrusff> ideally i need an interface which i can "route" to or a netfilter action which does the magic which i can apply selectively
< tore> v3.4.0 will allow you to specify port ranges of pool4
< tore> but yeah, they're thinking about changing the framework
< idli> oddly enough just yesterday someone requested dns64 + nat64 feature for homenet stuff from me :)
< idli> he considered ipv4 legacy kept outside home

ydahhrk commented 8 years ago

Question

I can easily see SIIT moving over to the interface model, but NAT64 is weird (from IPv4 it looks more like NAT than SIIT).

Since each interface is normally connected to a different network, won't it mean the user will have to define a separate address block for pool4? I sort of see the user thinking about using private addresses [I don't anymore, unless they're NAT'd again], but it sounds like awkward/more configuration. I guess it won't be strange if users are used to this kind of thing, but are they?

sbyx commented 8 years ago

Well, my point is that ideally I would be able to have one NAT64 instance per outgoing (IPv4) interface that I want to NAT to, and be able by some means to decide which incoming interfaces are NAT64'ed and to which outgoing interface.

fingon commented 8 years ago

As discussed on IRC, ideally NAT64 = NAT66 + SIIT + NAT44. BTW: 'move to userspace' option noted in original post kills performance, so I do not consider it an option.

mcr commented 8 years ago

As discussed on IRC, ideally NAT64 = NAT66 + SIIT + NAT44. BTW: 'move to

Can you explain each step? I don't see what the NAT66 step does.

] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | network architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [

sbyx commented 8 years ago

  1. NAT66 public IPv6 source address to some private IPv4-mapped IPv6 address (e.g. ::ffff:192.168.x.y)
  2. SIIT from IPv4-mapped IPv6 address to actual private IPv4 address
  3. Route to your v4-uplink (where it might get NAT44ed like regular outgoing IPv4 traffic)

Especially step 3 is important since it lets you use a shared NAT-state / port-space for the IPv4 NAT, you don't have to worry about distinct port-spaces for regular NAT44 and NAT64 and you don't have to worry about what happens if you don't have a "full" IPv4-address (i.e. MAP-E / MAP-T / LW4over6) or if the ISP does the NAT for you (DS-Lite).

ydahhrk commented 8 years ago

ideally NAT64 = NAT66 + SIIT + NAT44. BTW: 'move to userspace' option noted in original post kills performance, so I do not consider it an option.

So I guess it's not strange.

Good, I guess. :-)

This is the current direction of this development, then.

Doesn't all that routing also hamper performance, though?

  1. Packet appears. Route to NAT66 interface.
  2. Mask (Binding Information Base lookup included).
  3. Route from NAT66 interface to SIIT interface.
  4. Translate.
  5. Route from SIIT to NAT44.
  6. Translate (Binding Information Base lookup included).
  7. Route outside.

fingon commented 8 years ago

'route' is probably not the correct word here. Or well, it could be, but I would not design it that way.

You could even chain these 3 steps as a single netfilter chain ('NAT64' = NAT66 + SIIT + MASQUERADE(ish) steps; in the other direction, there would probably be de-NAT, and then the SIIT+NAT66 steps), so there would be just one netfilter match (dst=/96 given to NAT64 for IPv4 mapped to IPv6) and then just a bunch of matching rules without their own matching.

The correct design would probably be something slightly less efficient and more generic; I haven't really thought it through, but in general, even if you do a lookup or two more in the kernel, it is much cheaper than going to userland and back. Separating the steps would probably result in better modularity/configurability.

toreanderson commented 8 years ago

  1. NAT66 public IPv6 source address to some private IPv4-mapped IPv6 address (e.g. ::ffff:192.168.x.y)
  2. SIIT from IPv4-mapped IPv6 address to actual private IPv4 address
  3. Route to your v4-uplink (where it might get NAT44ed like regular outgoing IPv4 traffic)

I think you'll end up with a kind of mongrel NAT64 this way. RFC6146 compliance will most likely go out the window.

One obvious example: A NAT64 is supposed to have a Binding Information Base for each protocol it supports. Each entry contains the address (X') and source port (x) of the IPv6 client, and the IPv4 transport address (T; «SNAT address») and transport port (t) it is mapped to. Thus: (X',x) <--> (T,t). However, in this stacked approach only step 1 is aware of the value of (X',x) and only step 3 is aware of the value of (T,t). So given the above approach, how and where can you query the BIB contents à la jool --bib?

tore@nat64gw1-osl2:~$ jool --bib -n | head -5
TCP:
[Dynamic] 192.0.2.240#1024 - 2001:db8:402:2:216:3eff:feba:3cd#48832
[Dynamic] 192.0.2.240#1029 - 2001:db8:202:2:216:3eff:febb:bd63#37221
[Dynamic] 192.0.2.240#1032 - 2001:db8:402:2:216:3eff:fe36:c893#50971
[Dynamic] 192.0.2.240#1034 - 2001:db8:202:a:18:59ff:fe3a:3953#52116
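As a rough illustration of why a single table matters, here is a toy Python model of a per-protocol BIB that keeps both halves of each binding, (X',x) and (T,t), in one place. It is a sketch only: the `Bib` class and its methods are made up for this example and say nothing about how Jool stores its BIB internally.

```python
from dataclasses import dataclass, field


@dataclass
class Bib:
    """Toy BIB for one protocol: (X', x) <--> (T, t), indexed in both directions."""

    by6: dict = field(default_factory=dict)  # (IPv6 address, port) -> (IPv4 address, port)
    by4: dict = field(default_factory=dict)  # (IPv4 address, port) -> (IPv6 address, port)

    def add(self, addr6, port6, addr4, port4):
        key6, key4 = (addr6, port6), (addr4, port4)
        if key6 in self.by6 or key4 in self.by4:
            raise ValueError("transport address is already bound")
        self.by6[key6] = key4
        self.by4[key4] = key6

    def dump(self):
        """List the bindings in a layout similar to the output quoted above."""
        return [
            f"[Dynamic] {a4}#{p4} - {a6}#{p6}"
            for (a4, p4), (a6, p6) in sorted(self.by4.items())
        ]


tcp = Bib()
tcp.add("2001:db8:402:2:216:3eff:feba:3cd", 48832, "192.0.2.240", 1024)
tcp.add("2001:db8:202:2:216:3eff:febb:bd63", 37221, "192.0.2.240", 1029)
print("\n".join(tcp.dump()))
```

In the stacked NAT66 + SIIT + NAT44 design, the `by6` side would live in the NAT66 state and the `by4` side in the NAT44 state, so a listing like this would have to be stitched together from two separate tables.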

fingon commented 8 years ago

I do not like RFC6146 anyway - e.g. SIIT defines better fragment handling semantics. You could synthesize BIB-like information out of NAT66 + SIIT + NAT44 state if it was helpful (for user experience), but obviously implementation would not follow RFC6146 processing rules etc as they are defined in terms of BIB and not in terms of what actually needs to be done.

For the end user, though, the result would not be different; packets would come in via IPv6 and wind up IPv4 :-) (And fragmentation would actually work better, or at least, in case of NAT64, it is underspecified but SIIT defines relatively sane handling for it, including ICMP blackhole logic.)

toreanderson commented 8 years ago

Might be that a home user won't care, but the situation might be different for people who operate NAT64s that serve other environments like ISPs, data centres, or enterprise networks. I do care that my NAT64 gateways operate in a compliant manner that's easy to understand if I need to debug anything.

I think you'd be hard pressed to correctly implement Address-Independent Filtering with the stacked approach too, and it wouldn't surprise me if it significantly complicated the implementation of ALGs (#114).

Also static BIB entries (i.e., port forwards for IPv4-to-IPv6 traffic) would be more complicated as you'd need to install them both in the NAPT44 and NAPT66 parts. Which reminds me, the NAPT44 and NAPT66 component would need to maintain their own separate session tables, so you'll end up keeping much more dynamic state than you really need to. In my experience keeping too much state is something that often becomes a bottleneck (I assume we've all seen the dreaded nf_conntrack: table full, dropping packet Linux kernel error on several occasions).

So I think you'd gain a lot of complexity while losing features and compliance by such an approach. While I don't know the OpenWrt internals, keeping NAT64 and NAPT44 as separate functions and assigning them non-overlapping IPv4:port-range pools to work with (or simply do not allow them to co-exist), does seem to me like the cleanest approach.
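The "non-overlapping IPv4:port-range pools" idea boils down to a simple interval check. The Python sketch below shows it under made-up assumptions: the address, the port split, and the helper names are all hypothetical and only illustrate the arithmetic.

```python
def ranges_overlap(a, b):
    """True if the inclusive port ranges a=(low, high) and b=(low, high) share a port."""
    return a[0] <= b[1] and b[0] <= a[1]


def pools_conflict(pool_a, pool_b):
    """Pools are (IPv4 address, (low port, high port)); only the same address can clash."""
    return pool_a[0] == pool_b[0] and ranges_overlap(pool_a[1], pool_b[1])


# Hypothetical split of one public address between the two translators.
nat64_pool = ("192.0.2.1", (1024, 32767))
napt44_pool = ("192.0.2.1", (32768, 65535))

print(pools_conflict(nat64_pool, napt44_pool))                  # False
print(pools_conflict(nat64_pool, ("192.0.2.1", (30000, 40000))))  # True
```

As long as the check returns False for every pair of pools, a returning IPv4 packet can be handed to exactly one of the two translators based on its destination address and port alone.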

sbyx commented 8 years ago

Cleaner? maybe. Practical? definitely not. As noted before we have to deal with a variety of possible IPv4 uplink scenarios, including ones where we don't have a full IPv4 portspace (map, lw4over6) or where the ISP does CGN (dslite). To support these correctly we do need to separate the v4 NAT and the v6 translation to some degree, unless you come up with a clean and easy solution to handle all the special cases.

Also OpenWrt is an underfunded open-source effort and not comparable to a data center or enterprise network.

toreanderson commented 8 years ago

Fully aware that a CPE is not the same as a data centre router, I'm just interested in not throwing the baby out with the bathwater, i.e., avoiding breaking compliance and existing use-cases in order to support a new one.

So as far as solutions go, here are some I can think of:

I'll be happy to try and take a look at implementing any of these if you think they would be acceptable. However note that I haven't hacked on OpenWrt before so don't expect a pull request this week. :-)

In the case of DS-Lite, I'm not sure I fully understand the problem. With DS-Lite you shouldn't be doing NAPT44 in the home gateway at all, so there shouldn't be any problems with overlapping port-spaces. NAT64 could just use the default outgoing IPv4 address of the router, just as it can with a native IPv4 uplink. (The fact that this address is an RFC1918 one that would be routed into the B4 and then undergo NAPT44 at the ISP's AFTR doesn't really seem relevant here.)

ydahhrk commented 8 years ago

I do not like RFC6146 anyway - e.g. SIIT defines better fragment handling semantics. (...) And fragmentation would actually work better, or at least, in case of NAT64, it is underspecified but SIIT defines relatively sane handling for it, including ICMP blackhole logic.

SIIT fragment handling cannot be correctly applied to NAT64, though. Doesn't NAT66 + SIIT + NAT44 inherit RFC6146 fragment semantics?

SIIT doesn't mangle ports, so fragments aren't an issue - each fragment can be easily translated separately.

A packet needs ports to be NAT64'd, otherwise the translator can't find the relevant binding and session. Since only the first fragment carries ports, 6146 needs an unspecified level of defragmentation. In practice, it's the same NAT44 uses, really.
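A toy Python model may help show why stateful translation needs some defragmentation (or at least fragment tracking) while SIIT does not. It assumes fragments are identified by source, destination and fragment ID, and that only the fragment at offset 0 carries the transport header; `FragmentPortTracker` is a made-up name and is not how Jool or the kernel actually handles fragments.

```python
class FragmentPortTracker:
    """Toy model: remember the ports seen in the first fragment of each datagram."""

    def __init__(self):
        self._ports = {}  # (src, dst, fragment id) -> (source port, destination port)

    def ports_for(self, src, dst, frag_id, offset, ports=None):
        """Return the ports a fragment belongs to, or None if they are not known yet."""
        key = (src, dst, frag_id)
        if offset == 0:
            # Only the first fragment carries the transport header.
            self._ports[key] = ports
            return ports
        return self._ports.get(key)


tracker = FragmentPortTracker()
first = tracker.ports_for("2001:db8::1", "64:ff9b::c000:201", 7, 0, ports=(40000, 80))
later = tracker.ports_for("2001:db8::1", "64:ff9b::c000:201", 7, 1480)
early = tracker.ports_for("2001:db8::1", "64:ff9b::c000:201", 8, 1480)
print(first, later, early)  # (40000, 80) (40000, 80) None
```

The `None` case is the awkward one: a non-first fragment that arrives before its first fragment cannot be matched to a binding, so the translator has to queue it or reassemble. A stateless SIIT never needs the ports and can translate each fragment on its own.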

Cleaner? maybe. Practical? definitely not. As noted before we have to deal with a variety of possible IPv4 uplink scenarios, including ones where we don't have a full IPv4 portspace (map, lw4over6) or where the ISP does CGN (dslite). To support these correctly we do need to separate the v4 NAT and the v6 translation to some degree, unless you come up with a clean and easy solution to handle all the special cases.

Pardon my ignorance, but what's the problem with NAT64 then NAT44? As in you NAT64, ISP NAT44. It'd be like [NAT66 + SIIT + NAT44] + NAT44, no?

Perhaps wrongly, Jool currently allows translation into IPv4 private space, and starting from version 3.4, it'll also be able to limit the port ranges it can use.

RFC6146 compliance will most likely go out the window.

I haven't tested Simultaneous Open of TCP Connections in NAT44, but from the fact it has no qualms with using the ephemeral port range by default, it seems it'll break too.


For the record, my boss prefers RFC compliance, though I'm fine with a compromising consensus.

toreanderson commented 8 years ago

Pardon my ignorance, but what's the problem with NAT64 then NAT44? As in you NAT64, ISP NAT44. It'd be like [NAT66 + SIIT + NAT44] + NAT44, no?

That's a very valid point, actually. You could daisy-chain an (RFC compliant) NAT64 and a standard NAPT44 - in the same (OpenWrt) device. Jool's NAT64 would simply use one or more private IPv4 addresses for its IPv4 transport address pool. After IPv6->IPv4 translation, the packet would be sent through iptables' MASQUERADE or SNAT targets for NAPT44 towards the public IPv4 source address.

That would prevent NAPT44 having to share the pool of public IPv4 addresses and ports with NAT64.

It would cause the same level of degradation in functionality as the NAPT66->SIIT->NAPT44 suggestion (i.e., causing double translation, redundant state, probably complicating the insertion of static or UPnP-provisioned port forwards for IPv4-initiated traffic destined for an IPv6 host, etc.), but I think it would work just as well. The overall solution wouldn't be RFC compliant, but Jool itself could continue to be.

Also, if NAPT44 isn't used, Jool could use the public addresses/ports as its IPv4 transport pool. I'm guessing that for most people, NAPT44 and NAT64 would be an either/or really. I can't think of a normal use case for running both simultaneously.

Perhaps wrongly, Jool currently allows translation into IPv4 private space

It's not wrong to use private IPv4 space as transport addresses. What RFC6052 section 3.1 forbids is the use of IPv4-converted addresses that embed RFC1918 space, and only for the WKP 64:ff9b::/96. So an IPv6 packet destined for 64:ff9b::192.168.1.1 is supposed to be dropped. A packet destined for 2001:db8:64::192.168.1.2 is, on the other hand, completely legitimate. It is also legitimate for an IPv6 packet sourced from 2001:db8::123#12345 to be assigned a BIB entry mapping it to 192.168.1.1#23456.
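The rule can be sketched with Python's standard `ipaddress` module. This is a simplified illustration that only handles /96 prefixes (where the IPv4 address sits in the last 32 bits) and only checks the three RFC1918 blocks; `embedded_ipv4` and `destination_allowed` are names invented for the example.

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.ip_network("64:ff9b::/96")
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def embedded_ipv4(addr6, prefix):
    """Extract the IPv4 address from the last 32 bits (valid for /96 prefixes only)."""
    address = ipaddress.ip_address(addr6)
    network = ipaddress.ip_network(prefix)
    if network.prefixlen != 96 or address not in network:
        raise ValueError("address does not belong to the given /96 prefix")
    return ipaddress.IPv4Address(int(address) & 0xFFFFFFFF)


def destination_allowed(addr6, prefix):
    """Reject RFC1918 addresses embedded in the Well-Known Prefix; accept everything else."""
    ipv4 = embedded_ipv4(addr6, prefix)
    is_rfc1918 = any(ipv4 in block for block in RFC1918)
    return not (ipaddress.ip_network(prefix) == WELL_KNOWN_PREFIX and is_rfc1918)


print(destination_allowed("64:ff9b::192.168.1.1", "64:ff9b::/96"))          # False
print(destination_allowed("2001:db8:64::192.168.1.2", "2001:db8:64::/96"))  # True
```

Note that the check only looks at the IPv6 destination. The IPv4 transport address chosen for the BIB entry (192.168.1.1#23456 in the example above) is not subject to it.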

mcr commented 8 years ago

My preference is that NAT64 becomes integrated into the current Linux NAT44 code, such that all of the NAT?4 code and data structures are common, and only the way the pre-mangled packets are classified into conntracks differs.

I think that this is the cleanest way. From an operational point of view, I'm happy if the IPv6 traffic appears to disappear into a magic virtual interface, and appear from it. I'm actually happiest if we do it that way (supporting something like: "ip6tables -t nat -o siit0 -s abcd::/xx"), so that we can more clearly use routing daemons etc. to decide which traffic gets into the NAT64 wormhole and which doesn't.


ydahhrk commented 8 years ago

Now that Node-Based Translation and Filtering don't depend on this (see scratched text above), the urgency of supporting other or more paradigms/frameworks seems less overwhelming. Also, @rolivasnic has made significant progress on NAT64 database redundancy (#113) and atomic configuration (#164).

Is it reasonable to sandwich another new-features release (3.5) between 3.4 and 4.0 (this)? I would be tempted to add --minimum-ipv6-mtu (#136) and namespaces (#187) to 3.5 too.

toreanderson commented 8 years ago

Are you asking me? I'm going to assume you are. :smile:

I'm very happy with the way Jool currently works so as far as I'm concerned it could stay like it is in the future. Especially considering that if you need a «Jool net-device» for whatever reason you can accomplish that easily using a network namespace.

Therefore I'd be much more interested in features such as #114, #164, and #187. For integration in OpenStack I'd also need the possibility of making SIIT and NAT64 Jool be able to co-exist (within the same network namespace). Is that already considered part of #187?

ydahhrk commented 8 years ago

Are you asking me? I'm going to assume you are. :D

Sure, but again, some other folks see more future in this fix.

I'm concerned people might think the current framework does not scale well and this might be a blocker for using Jool.

This seems to be the case for the OpenWrt people. I'm hoping for an answer on whether the [NAT64] + NAT44 idea and the #187 improvements are comfortable workarounds for Jool not being a net device. Or an iptables module.

For integration in OpenStack I'd also need the possibility of making SIIT and NAT64 Jool be able to co-exist (within the same network namespace). Is that already considered part of #187?

Yes, but are you sure this will work as advertised? SIIT Jool is particularly infamous in that it tries to swallow all received traffic (especially in IPv4 and if the RFC 6052 prefix is present), which makes me think anything else you might want to do in the same namespace will just be getting leftovers.

(Unless it's chained before... but it's still weird. Still, I don't actually know what you're doing.)

It's what upsets me the most about SIIT Jool right now (and is a direct consequence of Netfilter).

toreanderson commented 8 years ago

Yes, but are you sure this will work as advertised? SIIT Jool is particularly infamous in that it tries to swallow all received traffic (especially in IPv4 and if the RFC 6052 prefix is present), which makes me think anything else you might want to do in the same namespace will just be getting leftovers.

Maybe. My initial idea would simply be to avoid overlapping addresses for --pool4, --pool6, --eamt, and «anything else». Perhaps I'd need a dash of --blacklist too...

ydahhrk commented 8 years ago

Maybe. My initial idea would simply be to avoid overlapping addresses for --pool4, --pool6, --eamt, and «anything else». Perhaps I'd need a dash of --blacklist too...

Ok. In any case, we should probably queue SIIT Jool after NAT64 Jool so SIIT gets the NAT64 leftovers and not the other way around.

toreanderson commented 8 years ago

Just thought I'd mention that user-space packet processing frameworks such as VPP and Snabb Switch seem to be getting more and more fashionable. First and foremost, they tend to be really, really fast. I'm also guessing that developing features and applications is going to be easier in user space than in kernel space.

Cisco published a very interesting blog post about VPP a few days ago. It's well worth the read, and it got me thinking that it's probably worth considering if Jool fits as an application/feature living within one of these user-space packet processing frameworks.

ydahhrk commented 8 years ago

Ok

Sorry about the silence lately; I'm still roaming around in South America (on vacation now) and my service provider doesn't reach this area, so I've been having trouble getting online.

I'll try to craft a less rushed response on Monday.

danrl commented 7 years ago

@JAORMX

Anyway, seems to me like those performance issues would now be solved using DPDK, though, that would tie the project to x86

I'd like to mention the home user and small office use case here. I'm currently experimenting with Jool on LEDE/OpenWRT, and x86 is probably one of the smaller target architectures there. I liked the fact that Jool offered a solution for big boxes in data centers as well as the small plastic routers we unfortunately often have to run at home. I would like everyone to keep this (currently small but growing) target group in mind: IPv6 only SOHO networks.

My two cents on @ydahhrk's initial statement:

Become a network (pseudo-)device driver (ie. look like an interface).

WireGuard, an in-kernel VPN (see link below), uses an interface that initially comes without addressing. This works great and is easy to integrate in all kinds of use cases. IP addresses can be assigned with ip, and all sorts of custom setups would be possible. Also, instead of being part of netfilter, one can use netfilter to create rules for the interface. An interface is IMHO the abstraction that fits Jool perfectly and allows for the greatest flexibility. I do not know about the performance impact, though. Using an interface as the framework may also reduce the amount of required code, depending on how many of the kernel's functions can be reused. However, this goes beyond my area of expertise and is speculative.

Moreover, this looks like the most promising path to upstreaming into the kernel to me.

Move over to userspace (follow Tayga's steps).

Please don't. We have tayga already. However, it is easy to distribute userspace applications compared to kernel modules.

Become an iptables module.

Hmm... Sounds quite interesting to me. What if iptables gets deprecated some day? All gone?

Remain a Netfilter module and find workarounds for our compliance issues.

There must be a better way.

In WireGuard (http://www.wireguard.io) we have had tremendous performance improvements by leveraging the kernel's PADATA functions: http://lxr.free-electrons.com/source/kernel/padata.c

ydahhrk commented 7 years ago

@toreanderson:

I'll try to craft a less rushed response on Monday.

OOOOOPS; I left this hanging. My bad.

OK so most of the concerns that inspired this thread have met workarounds so I'm not sure switching frameworks is worthwhile anymore. Not that I don't want to do it; out of the three paths of least resistance (device driver, iptables and Netfilter), Netfilter is my least favorite because of the greedy packet stealing.

You have so far proposed...

I shouldn't pretend like I did my homework getting a proper mouthful of these products, but I took a look at VPP's packet representation and it looks like it's not the same as the kernel's. This is perfectly reasonable, but kind of bad. Now that I've had to manhandle the RFC 7915 code I no longer think that Jool's relationship with struct sk_buff is very platonic. And it is somewhat complicated as it is. This doesn't mean that a Jool+VPP combo is infeasible, just that I don't think that the cost-benefit ratio is right. The other userspace frameworks will likely present the same obstacle.

BTW, it looks like VPP is eventually going to support NAT64 natively.

It would also be useful to be able to run a Jool instance in Stateful NAT64 mode and another Jool instance in stateless mode inside a single network namespace at the same time.

I'm the biggest idiot, I apologize.

@danrl:

I would like everyone to keep this (currently small but growing) target group in mind: IPv6 only SOHO networks.

Moreover, this looks like the most promising path to upstreaming into the kernel to me.

Please don't. We have tayga already. However, it is easy to distribute userspace applications compared to kernel modules.

Agree, agree, agree. Agree.

Hmm... Sounds quite interesting to me. What if iptables gets deprecated some day? All gone?

My current trend is to implement the three in-kernel options (device driver/iptables/netfilter). This is because it looks like they are all the same; we would just wrap them differently. I don't think that we need to drop one in favor of the others.

So it doesn't hurt if one of them is deprecated.

There must be a better way.

Well, we already found most workarounds and I think they're elegant, so... :)

In WireGuard (http://www.wireguard.io) we have had tremendous performance improvements by leveraging the kernel's PADATA functions: http://lxr.free-electrons.com/source/kernel/padata.c

Thank you :)

jordipalet commented 6 years ago

Is Jool working in LEDE ?

I’m trying to set up Jool for NAT64 functionality in a small CPE.

I’m familiar with Jool, as I use it in Ubuntu.

So, I installed both kmod-jool and jool-tools, and using my previous script which works in Ubuntu, tried to make it work:

#!/bin/sh

sysctl -w net.ipv4.conf.all.forwarding=1
sysctl -w net.ipv6.conf.all.forwarding=1
ethtool --offload br-lan gro off lro off
ethtool --offload eth0.6 gro off lro off
ip addr add 10.10.10.19/24 dev eth0.6
ip -6 route add 2001:470:68ee:30::/64 via 2001:470:68ee:20::21
ip -6 route add 2001:470:68ee:40::/64 via 2001:470:68ee:20::21
modprobe jool pool6=64:ff9b::/96 pool4=10.10.10.19

However, I can’t get it working in LEDE.

If I traceroute any address in 64:ff9b::/96, it is routed to my default IPv6 gateway instead of going through Jool …

Am I missing anything?

Thanks in advance!

ydahhrk commented 6 years ago

I haven't tested LEDE, but it should work unless there is a bug.

If I traceroute any address in 64:ff9b::/96, it is routed to my default IPv6 gateway instead of going through Jool

You mean the packets are not reaching Jool in the first place? How does the routing table of your IPv6 client look?

Otherwise, did you try to find the problem via logging?

jordipalet commented 6 years ago

I believe the problem is that I'm missing some instructions to configure Jool in LEDE (which, by the way, is the same as OpenWRT).

My script in Ubuntu just works. I don't need to tell Ubuntu to forward 64:ff9b::/96 to Jool ...

In LEDE, using the LEDE CPE itself as the "client" of Jool (I also tried from outside), it has a default route to the IPv6 gateway (the ISP link), and I don't see how to tell LEDE that anything for the Jool pool (64:ff9b::/96 in my case) needs to go to Jool instead of being sent directly to the default GW ...

For other protocols in LEDE, you need to configure an interface. For example, I tried Tayga some time ago, and it was working fine.

ydahhrk commented 6 years ago

I believe the problem is that I'm missing some instructions to configure Jool in LEDE (which, by the way, is the same as OpenWRT).

But other users have confirmed that Jool is working fine in OpenWRT, without special configuration.

My script in Ubuntu just works. I don't need to tell Ubuntu to forward 64:ff9b::/96 to Jool ...

I think this is what's strange, not the LEDE stuff.

Jool is a Netfilter module that only hooks itself to the prerouting chain. By definition, it never translates traffic generated by its own node.

You can emulate the interface thing by enclosing Jool in a namespace, and sending packets to that namespace by means of a virtual interface.
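A rough sketch of that namespace workaround follows. Everything in it is a placeholder rather than a tested recipe: the namespace name `joolns`, the veth names `to_jool`/`to_world`, and the documentation-range addresses are all assumptions. The script only prints the commands unless `DRY_RUN=0` is set, and it deliberately stops short of starting Jool inside the namespace, since that step depends on the Jool version.

```shell
#!/bin/sh
# Sketch: box Netfilter Jool in a network namespace and reach it through a
# veth pair, so the routing table decides what gets translated.
# All names and addresses are hypothetical placeholders.
NS="${NS:-joolns}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0 (requires root).
    echo "+ $*"
    [ "$DRY_RUN" = "0" ] || return 0
    "$@"
}

setup_jool_namespace() {
    run ip netns add "$NS"
    # to_jool stays in the host namespace; to_world moves into $NS.
    run ip link add name to_jool type veth peer name to_world
    run ip link set dev to_world netns "$NS"
    run ip addr add 2001:db8:100::1/64 dev to_jool
    run ip addr add 192.0.2.1/24 dev to_jool
    run ip link set dev to_jool up
    run ip netns exec "$NS" ip addr add 2001:db8:100::2/64 dev to_world
    run ip netns exec "$NS" ip addr add 192.0.2.2/24 dev to_world
    run ip netns exec "$NS" ip link set dev to_world up
    # Translated packets leave the namespace through the host again.
    run ip netns exec "$NS" ip route add default via 192.0.2.1
    run ip netns exec "$NS" ip -6 route add default via 2001:db8:100::1
    run ip netns exec "$NS" sysctl -w net.ipv4.conf.all.forwarding=1
    run ip netns exec "$NS" sysctl -w net.ipv6.conf.all.forwarding=1
    # Steer the translation prefix into the namespace via the routing table.
    run ip -6 route add 64:ff9b::/96 via 2001:db8:100::2
    # Starting a Jool instance inside $NS is version-specific and omitted here.
}

setup_jool_namespace
```

Because packets now enter the namespace through `to_world`, they reach Jool's prerouting hook there even when they were generated by the host itself.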

(also tried from outside)

Is this working or not? If it worked, it is normal. If it didn't, I think there is something preventing the packets from reaching Jool.

For other protocols in LEDE, you need to configure an interface. For example, I tried Tayga some time ago, and it was working fine.

Yeah, that's what I want to improve by turning Jool into a device driver. Jool 4.0.0 will function exactly like this, but for now we have to work around Netfilter's limitations.

petrosagg commented 6 years ago

@jordipalet @ydahhrk I faced the same problem, and I think what's happening is that OpenWrt's modprobe is ignoring the module arguments passed on the command line. I switched to insmod, like so: insmod jool pool6=64:ff9b::/96, and it worked out of the box.
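One way to keep a single script working on both Ubuntu and OpenWrt is to pick the loader at run time. The sketch below is an assumption-laden illustration: the helper names are invented, and detecting OpenWrt/LEDE by the presence of `/etc/openwrt_release` is a heuristic chosen here, not something from Jool's documentation. It prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: choose insmod on OpenWrt/LEDE (where modprobe reportedly drops
# module arguments) and modprobe elsewhere. Helper names are hypothetical.

pick_loader() {
    # Optional argument: path of the release file to test (for experimenting).
    release_file="${1:-/etc/openwrt_release}"
    if [ -f "$release_file" ]; then
        echo insmod
    else
        echo modprobe
    fi
}

# Print (not run) the load command with the arguments used in this thread.
jool_load_command() {
    echo "$(pick_loader "$1") jool pool6=64:ff9b::/96"
}

jool_load_command
```

The printed line can be reviewed first and then run as root once it looks right for the target system.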

ydahhrk commented 6 years ago

@petrosagg You're right, thank you.

It hadn't dawned upon me that OpenWRT is such a different world. And, as a stumbling newcomer myself, I can see that Jool's documentation wouldn't be very useful to get it running there. I think it's worth some notes.

BRB.

petrosagg commented 6 years ago

I'm a newcomer to OpenWrt too, it took me a lot of frustration to figure it out...

ydahhrk commented 6 years ago

Sorry for the troubles.

I just added this to the documentation. I also added OpenWRT code tabs to the tutorials. (All of this might need a browser F5 refresh.) Hopefully, this won't happen again.

petrosagg commented 6 years ago

That's awesome! Thanks a lot for making this quality module :)

ydahhrk commented 6 years ago

Thanks for the patience :)

jordipalet commented 6 years ago

Fantastic Alberto, thanks a lot!

Regards,

Jordi


CodeFetch commented 5 years ago

What you are currently doing is to reinvent the network stack (connection tracking for FTP etc.). @ydahhrk is right. Jool should aim to become a mainstream kernel module, and that is only possible if it either gets tightly integrated into netfilter, which will likely not be easy: if I'm not mistaken, it would need at least another two tables (one before NAT and one after NAT), it would be ugly, and it would need heavy modification of several userspace tools and APIs.

A userspace Jool version is what you see if you look at Tayga (Tayga calls itself NAT64, but it's actually a NAT46). There are no efforts to get TUN devices to perform better by e.g. providing a TUN socket, and you really have to build your own netfilter in userspace to provide a powerful NAT64 on your own. Even OpenVPN is slowly dying with the advent of WireGuard, because of the performance impact of a TUN device's context switches. Having a well-performing userspace TUN device requires you to use Linux AIO for reads and writes (if that works at all). Thus, if you want to take this path, the first step should be to implement AIO support in Tayga. You need NAT44 and NAT66 to allow the features Jool offers, but it's actually cleaner than hooking into netfilter. Of course, a userspace NAT46 can be seen as a feature due to portability, but on devices running OSes like Android or iOS you have other restrictions and won't be able to configure a NAT44/NAT66 easily to turn it into a NAT64.

Another thing, and the most convincing argument for me that Jool should become a virtual network device, is that a kernel-land NAT46 device would likely be accepted upstream, and it is not as hard to implement safely as one might think. Please have a look at https://github.com/ayourtch/nat46/tree/master/nat46/modules and the modules mentioned at the bottom of the page. There were so many efforts to build a good NAT46 translator as a device, and at some point the projects died because they were not upstreamed. Linux has its janitors, and they will keep such a module alive if you manage to get it upstreamed.

ydahhrk commented 5 years ago

Hmm. The idea of "becoming a mainstream kernel module" has popped up often and isn't really the same as the device driver support feature. Maybe it's time to open a new bug.

What you are currently doing is to reinvent the network stack (connection tracking for FTP etc.)

Just to clarify: Do you mean this as a bad thing or as a neutral thing?

You seem to be voting for both device driver and mainstream module, but neither of these will prevent Jool from having to do FTP connection tracking once #114 is implemented. (Unless I'm missing something.)

ydahhrk commented 3 years ago

~At the moment, the tendency is to merge Jool with nftables (#273). The prospect of adding device driver mode seems farfetched at this point, because it doesn't seem like an improvement, and also because resources are running thin.~

~I will chop this off the TODO list for now.~