NICMx / Jool

SIIT and NAT64 for Linux
GNU General Public License v2.0

SIIT-DC with EAM not working on Ubuntu 20.04 #338

Closed starcraft66 closed 4 years ago

starcraft66 commented 4 years ago

Hi, I'm trying to set up SIIT-DC so that I can offer IPv4 access to services in my IPv6-only network for legacy clients. I was able to get everything working by compiling Jool from source on Ubuntu 18.04 and using iptables. Once my test setup worked, I decided to upgrade to Ubuntu 20.04 and use netfilter, and everything broke. NAT64 is working great but SIIT is completely broken. I am running Jool 4.0.7 as packaged in the Ubuntu focal repository.

Here are my configuration files:

tristan@k8s-natdns64:~$ cat /etc/jool/jool.conf
{
        "comment": "Configuration for the systemd NAT64 Jool service.",

        "instance": "default",
        "framework": "netfilter",

        "global": {
                "comment": "NAT64 prefix",
                "pool6": "64:ff9b::/96"
        }
}
tristan@k8s-natdns64:~$ cat /etc/jool/jool_siit.conf
{
        "comment": "Sample full SIIT configuration.",

        "instance": "default",
        "framework": "netfilter",

        "global": {
                "comment": "pool6 and the RFC6791v4 pool belong here, ever since Jool 4.",
                "pool6": "64:ff9b::/96",
        },

        "eamt": [
                {
                        "ipv6 prefix": "2607:fa48:6ed8:8a54:3::",
                        "ipv4 prefix": "172.16.30.2"
                }
                ]
}

tristan@k8s-natdns64:~$ sudo jool_siit instance display
+--------------------+-----------------+-----------+
|          Namespace |            Name | Framework |
+--------------------+-----------------+-----------+
|           b95e2100 |         default | netfilter |
+--------------------+-----------------+-----------+
tristan@k8s-natdns64:~$ sudo jool_siit eamt display
+---------------------------------------------+--------------------+
|                                 IPv6 Prefix |        IPv4 Prefix |
+---------------------------------------------+--------------------+
|                 2607:fa48:6ed8:8a54:3::/128 |     172.16.30.2/32 |
+---------------------------------------------+--------------------+
tristan@k8s-natdns64:~$ sudo jool_siit global display
  manually-enabled: true
  pool6: 64:ff9b::/96
  lowest-ipv6-mtu: 1280
  logging-debug: false
  zeroize-traffic-class: false
  override-tos: false
  tos: 0
  mtu-plateaus: 65535,32000,17914,8166,4352,2002,1492,1006,508,296,68
  amend-udp-checksum-zero: false
  eam-hairpin-mode: intrinsic
  randomize-rfc6791-addresses: true
  rfc6791v6-prefix: (unset)
  rfc6791v4-prefix: (unset)

My router has a static route directing 64:ff9b::/96 to 2607:fa48:6ed8:8a51::64 which is my machine running jool. It also has a route sending traffic destined to 172.16.30.2 through 172.16.29.6.

Here are my ip addresses:

tristan@k8s-natdns64:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c2:6a:23:da:d4:c9 brd ff:ff:ff:ff:ff:ff
    inet 172.16.29.6/24 brd 172.16.29.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2607:fa48:6ed8:8a51:c06a:23ff:feda:d4c9/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 431991sec preferred_lft 3591sec
    inet6 2607:fa48:6ed8:8a51::64/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::c06a:23ff:feda:d4c9/64 scope link
       valid_lft forever preferred_lft forever

When jool_siit is running (jool_siit global update manually-enabled true) ipv4 access seems to break and translations according to the EAMT don't happen at all. If I ping my router 172.16.29.1 from that machine, it times out. If I ping my machine 172.16.29.6 from the router, it times out. However, as soon as I stop jool_siit (jool_siit global update manually-enabled false) all of those ipv4 pings start working again.

If I run curl http://172.16.30.2 on my router, it just times out instead of loading the page. Running curl http://[2607:fa48:6ed8:8a54:3::] works fine.

EDIT: Just to clarify, going back to iptables and adding the right iptables rules from the docs makes everything work again but I would rather use netfilter for simplicity. Also, when in iptables mode I also can't interact with my jool box over ipv4, all traffic is dropped (but nat64/dns64/siit all work fine), not sure if that's a bug or intentional.

ydahhrk commented 4 years ago

Well, why did you switch from iptables to netfilter?

Netfilter Jool is very greedy, and my first hypothesis would be that pool6 is probably eating up all your IPv4 traffic.

For example, from your "ping my router" test:

  1. Jool machine writes echo request 172.16.29.6 -> 172.16.29.1
  2. Router responds echo reply 172.16.29.1 -> 172.16.29.6
  3. Jool translates that into echo reply 64:ff9b::172.16.29.1 -> 64:ff9b::172.16.29.6
  4. Packet gets lost because nobody's listening at 64:ff9b::172.16.29.6 or something like that
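The address arithmetic behind step 3 can be sketched as follows. This is a minimal illustration of the RFC 6052 /96 embedding using Python's standard `ipaddress` module, not Jool's own code; the helper name `embed_ipv4` is made up for this example.

```python
import ipaddress

def embed_ipv4(pool6: str, ipv4: str) -> ipaddress.IPv6Address:
    """Place an IPv4 address in the low 32 bits of a /96 prefix (RFC 6052)."""
    net = ipaddress.IPv6Network(pool6)
    if net.prefixlen != 96:
        # Other prefix lengths use a different bit layout; not covered here.
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(net.network_address) | int(v4))

# The echo reply from the ping test, as pool6 would rewrite it.
reply_src = embed_ipv4("64:ff9b::/96", "172.16.29.1")
reply_dst = embed_ipv4("64:ff9b::/96", "172.16.29.6")
print(reply_src, "->", reply_dst)
```

Python prints the embedded addresses in hexadecimal form, so `64:ff9b::172.16.29.6` shows up as `64:ff9b::ac10:1d06`. Nothing on the translator owns that IPv6 address, which is consistent with the reply going missing.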

Possible solutions:

  1. Use blacklist4 [0] to prevent 172.16.29.1 and/or 172.16.29.6 from being translated
  2. Stop using pool6, use instead the EAMT to specify exactly which addresses should be translated
  3. Move setup to iptables so you can filter normally

Also, you might want to install Jool 4.1.2 so you can enable debug easily and see what's happening: [1] (You will probably want to uninstall 4.0.7 first.)

[0] https://jool.mx/en/usr-flags-blacklist4.html
[1] https://jool.mx/en/usr-flags-global.html#logging-debug


starcraft66 commented 4 years ago

Well, why did you switch from iptables to netfilter?

I thought it was better because distros are moving away from iptables to nftables. I have reverted my setup back to iptables and added the correct iptables and ip6tables prerouting rules with the jool chain. I also removed the global pool6 section from my config. This is all that is left:

root@k8s-natdns64:~# cat /etc/jool/jool_siit.conf
{
        "comment": "Sample full SIIT configuration.",

        "instance": "default",
        "framework": "iptables",

        "eamt": [
                {
                        "ipv6 prefix": "2607:fa48:6ed8:8a54:3::",
                        "ipv4 prefix": "172.16.30.2"
                }
                ]
}

My ipv4 access to that vm is no longer broken, however, attempting to run curl http://172.16.30.2 still times out. If I try to ping that address, I get: From 172.16.29.6: icmp_seq=1 Redirect Host(New nexthop: 172.16.29.1)

ydahhrk commented 4 years ago

You did the first part of the second option, but ignored the second part:

  1. Stop using pool6, use instead the EAMT to specify exactly which addresses should be translated

Since you removed pool6, you now have to provide a replacement.

Now the packet flow is likely

  1. Router writes packet 172.16.29.1 -> 172.16.30.2
  2. Jool can translate 172.16.30.2 with the EAMT, but cannot translate 172.16.29.1. Packet is returned to Linux, which presumably also doesn't know what to do with it.

Just to clarify: You don't have to do both options simultaneously. Replacing pool6 and moving to iptables are both self-sufficient solutions meant to solve the same problem.

My router (...) has a route sending traffic destined to 172.16.30.2 through 172.16.29.6.

I should have realized this sooner:

Which router are you talking about? You said this was SIIT-DC, so I thought yours was an IPv6-only network. Is this router outside of your domain? Or does your IPv6 router have an IPv4 route? Does it also have an IPv4 address? What is the purpose of the IPv4 address?

starcraft66 commented 4 years ago

You did the first part of the second option, but ignored the second part:

The EAMT is clearly defined in the config snippet I posted above...

tristan@k8s-natdns64:~$ sudo jool_siit eamt display
+---------------------------------------------+--------------------+
|                                 IPv6 Prefix |        IPv4 Prefix |
+---------------------------------------------+--------------------+
|                 2607:fa48:6ed8:8a54:3::/128 |     172.16.30.2/32 |
+---------------------------------------------+--------------------+

Which router are you talking about?

[Image: Untitled Diagram]

You said this was SIIT-DC, so I thought yours was an IPv6-only network.

Yes, my kubernetes network is completely ipv6-only. My Jool box has ipv4 access because it needs to also run nat64 for my kubernetes services to be able to access the ipv4 internet. My goal here is to be able to get end-users who don't have ipv6 access to connect to my ipv6-only services on my kubernetes load balancer (2607:fa48:6ed8:8a54:3::).

Is this router outside of your domain? Or does your IPv6 router have an IPv4 route?

The router has ipv4 and ipv6 access and is controlled by me.

Does it also have an IPv4 address?

Yes

What is the purpose of the IPv4 address?

The purpose is to allow ipv4 clients to connect to my services and also to allow my services to access ipv4 resources while most of my network is ipv6-only.

EDIT: I am testing with the source address 172.16.29.1 just because it's convenient, but I want any address on the public internet to be able to be a source address.

ydahhrk commented 4 years ago

The EAMT is clearly defined in the config snippet I posted above...

Yes, you have an EAM entry that can be used to translate the packet's destination address, but you don't have one for the source address. That's what pool6 was meant for.

Suppose random Internet node 1.2.3.4 makes a request to your server 172.16.30.2. You want the packet flow to look like this:

  1. Packet 1.2.3.4 -> 172.16.30.2 arrives from the WAN.
  2. VyOS forwards it to Jool
  3. Jool translates 1.2.3.4 into 64:ff9b::1.2.3.4 (per pool6) and 172.16.30.2 into 2607:fa48:6ed8:8a54:3:: (per your EAMT)
  4. Packet 64:ff9b::1.2.3.4 -> 2607:fa48:6ed8:8a54:3:: goes to VyOS, which then forwards it to the Kubernetes service, bunch of hashing happens, Kubernetes returns response 2607:fa48:6ed8:8a54:3:: -> 64:ff9b::1.2.3.4, VyOS sends it to Jool.
  5. Jool translates 2607:fa48:6ed8:8a54:3:: into 172.16.30.2 (per your EAMT) and 64:ff9b::1.2.3.4 to 1.2.3.4 (per pool6).
  6. Packet is forwarded to VyOS, then to the WAN.

You currently do not have a means to translate 64:ff9b::1.2.3.4 into 1.2.3.4 and vice-versa.

Either revert pool6 (64:ff9b::/96) or add an EAM entry that does the same thing (eg. 0.0.0.0/0 | 64:ff9b::/96)
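To make the source-versus-destination point concrete, here is a rough Python model of the IPv4-to-IPv6 address lookup described above. It is a sketch under stated assumptions, not Jool's implementation: the names `eam` and `xlat_4to6` are invented for this example, the table is scanned in list order rather than by longest prefix match, and only entries whose IPv4 and IPv6 suffixes have the same length are handled.

```python
import ipaddress
from typing import List, Optional, Tuple

Eam = Tuple[ipaddress.IPv6Network, ipaddress.IPv4Network]

def eam(v6: str, v4: str) -> Eam:
    """Build one (IPv6 prefix, IPv4 prefix) table entry."""
    return ipaddress.IPv6Network(v6), ipaddress.IPv4Network(v4)

def xlat_4to6(addr: str, eamt: List[Eam],
              pool6: Optional[str] = None) -> Optional[ipaddress.IPv6Address]:
    """Translate one IPv4 address: try the EAMT first, then fall back to pool6."""
    v4 = ipaddress.IPv4Address(addr)
    for p6, p4 in eamt:
        if v4 in p4:
            if 32 - p4.prefixlen != 128 - p6.prefixlen:
                raise ValueError("sketch only handles equal suffix lengths")
            # Carry the host bits over unchanged under the IPv6 prefix.
            return p6.network_address + (int(v4) - int(p4.network_address))
    if pool6 is not None:
        net = ipaddress.IPv6Network(pool6)
        if net.prefixlen != 96:
            raise ValueError("sketch only handles a /96 pool6")
        return net.network_address + int(v4)
    return None  # untranslatable: the packet is handed back to the kernel

thread_eamt = [eam("2607:fa48:6ed8:8a54:3::/128", "172.16.30.2/32")]

dst = xlat_4to6("172.16.30.2", thread_eamt)                    # EAMT hit
src_plain = xlat_4to6("1.2.3.4", thread_eamt)                  # no pool6
src_pool6 = xlat_4to6("1.2.3.4", thread_eamt, "64:ff9b::/96")  # pool6 restored
src_eam = xlat_4to6("1.2.3.4",
                    thread_eamt + [eam("64:ff9b::/96", "0.0.0.0/0")])
print(dst, src_plain, src_pool6, src_eam)
```

In this model, the destination translates with the EAMT alone, while the source stays untranslatable until either pool6 or the catch-all entry is present. Both fixes yield the same source address.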

ydahhrk commented 4 years ago

For what it's worth: #339 and your original problem seem to be the same bug. I'm currently investigating further.

ydahhrk commented 4 years ago

Can you still debug this?

When jool_siit is running (jool_siit global update manually-enabled true) ipv4 access seems to break and translations according to the EAMT don't happen at all. If I ping my router 172.16.29.1 from that machine, it times out. If I ping my machine 172.16.29.6 from the router, it times out. However, as soon as I stop jool_siit (jool_siit global update manually-enabled false) all of those ipv4 pings start working again.

I now agree that this is a bug.

  1. Jool machine writes echo request 172.16.29.6 -> 172.16.29.1
  2. Router responds echo reply 172.16.29.1 -> 172.16.29.6
  3. Jool translates that into echo reply 64:ff9b::172.16.29.1 -> 64:ff9b::172.16.29.6
  4. Packet gets lost because nobody's listening at 64:ff9b::172.16.29.6 or something like that

When I wrote this, I had forgotten that Jool has an inbuilt "generic blacklist" that is supposed to prevent this from happening. The logic is "if the IPv4 packet's destination address belongs to the translator's interface, cancel translation." For some reason, this appears to not be working on your end. More unfortunately still, I cannot reproduce the problem.
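The rule quoted above can be modeled in a few lines. This only restates the described logic in Python under the assumption that "belongs to the translator's interface" means an exact match against a locally configured IPv4 address; it is not the kernel module's code, and `cancels_translation` is a hypothetical name.

```python
import ipaddress
from typing import Iterable

def cancels_translation(dst: str, interface_addrs: Iterable[str]) -> bool:
    """Return True when the IPv4 destination is one of the translator's own addresses."""
    local = {ipaddress.IPv4Address(a) for a in interface_addrs}
    return ipaddress.IPv4Address(dst) in local

# IPv4 addresses taken from the `ip a` output earlier in the thread.
local_v4 = ["127.0.0.1", "172.16.29.6"]

echo_reply_skipped = cancels_translation("172.16.29.6", local_v4)
service_skipped = cancels_translation("172.16.30.2", local_v4)
print(echo_reply_skipped, service_skipped)
```

Under this model the echo reply addressed to 172.16.29.6 should be left alone, while traffic for the EAMT address 172.16.30.2 should still be translated.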

To understand what Jool is thinking, we have two options:

  1. Print stats (jool_siit stats display) before the ping, then print stats again after the ping. See if an error counter increased.
  2. Enable debug logging. (Jool 4.0.7 requires a somewhat clumsy procedure to do so, but I can explain it to you if you're willing.)
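For the first option, comparing the two snapshots by hand gets tedious, so a small diff helper may help. The sketch below assumes each line of the stats output has the shape `NAME: integer`; that format and the counter names in the sample are assumptions made for illustration, not copied from real Jool output.

```python
from typing import Dict

def parse_stats(text: str) -> Dict[str, int]:
    """Parse lines of the assumed form 'NAME: 123'; skip anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats

def changed_counters(before: str, after: str) -> Dict[str, int]:
    """Return only the counters whose value differs, with the delta."""
    b, a = parse_stats(before), parse_stats(after)
    return {k: a[k] - b.get(k, 0) for k in a if a[k] != b.get(k, 0)}

# Made-up counter names, for illustration only.
before = "EXAMPLE_RECEIVED: 10\nEXAMPLE_UNTRANSLATABLE: 2\n"
after = "EXAMPLE_RECEIVED: 15\nEXAMPLE_UNTRANSLATABLE: 2\nEXAMPLE_DROPPED: 5\n"
delta = changed_counters(before, after)
print(delta)
```

Save the command output to a string or file before and after the ping, then pass both to `changed_counters`.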

Another idea pops to mind: Please post the output of ip address, or at least the section that concerns the interface that lists address 172.16.29.6. Maybe your address has some property that makes it unqualified to be described as "belonging to the translator's interface."

starcraft66 commented 4 years ago

Hey, sorry for the silence. I'm super busy with school right now because I have a final tomorrow. I will respond properly this weekend.

starcraft66 commented 4 years ago

Hey, apologies for the long delay, I am back with a lot of free time. I can certainly keep debugging this as I'd love for this setup to eventually work!

To understand what Jool is thinking, we have two options:

1. Print stats (`jool_siit stats display`) before the ping, then print stats again after the ping. See if an error counter increased.

I tried that and the stats command shows no output at all.

root@k8s-natdns64:/home/tristan# jool_siit -i siit eamt display
+---------------------------------------------+--------------------+
|                                 IPv6 Prefix |        IPv4 Prefix |
+---------------------------------------------+--------------------+
|                 2607:fa48:6ed8:8a54:3::/128 |     172.16.30.2/32 |
+---------------------------------------------+--------------------+
root@k8s-natdns64:/home/tristan# jool_siit -i siit stats display
root@k8s-natdns64:/home/tristan# # ping from the router for a few seconds (100% packet loss)
root@k8s-natdns64:/home/tristan# jool_siit -i siit stats display
root@k8s-natdns64:/home/tristan#

2. Enable debug logging. (Jool 4.0.7 requires a somewhat clumsy procedure to do so, but I can explain it to you if you're willing.)

I installed Jool 4.1.2-1 and enabled debugging using the userspace tools, but I wasn't able to get any meaningful info because a ton of traffic is being sent to Jool and the kernel log gets totally spammed by the SSH traffic. Perhaps I need to edit my iptables rules to only send traffic destined to the EAMT addresses into the jool chain? What kind of debug info are we looking for?

Another idea pops to mind: Please post the output of ip address, or at least the section that concerns the interface that lists address 172.16.29.6. Maybe your address has some property that makes it unqualified to be described as "belonging to the translator's interface."

root@k8s-natdns64:/home/tristan# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c2:6a:23:da:d4:c9 brd ff:ff:ff:ff:ff:ff
    inet 172.16.29.6/24 brd 172.16.29.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2607:fa48:6ed8:8a51:c06a:23ff:feda:d4c9/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 2591935sec preferred_lft 604735sec
    inet6 2607:fa48:6ed8:8a51::64/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::c06a:23ff:feda:d4c9/64 scope link
       valid_lft forever preferred_lft forever

Also, here are my config files if you'd like to try to reproduce this on your end:

root@k8s-natdns64:/home/tristan# cat /etc/jool/jool.conf
{
        "comment": "Configuration for the systemd NAT64 Jool service.",

        "instance": "default",
        "framework": "iptables",

        "global": {
                "comment": "NAT64 prefix",
                "pool6": "64:ff9b::/96"
        }
}
root@k8s-natdns64:/home/tristan# cat /etc/jool/jool_siit.conf
{
        "comment": "Sample full SIIT configuration.",

        "instance": "siit",
        "framework": "iptables",

        "global": {
                "comment": "pool6 and the RFC6791v4 pool belong here, ever since Jool 4.",
                "pool6": "64:ff9b::/96"
        },

        "eamt": [
                {
                        "ipv6 prefix": "2607:fa48:6ed8:8a54:3::",
                        "ipv4 prefix": "172.16.30.2"
                }
                ]
}

My iptables rules:

root@k8s-natdns64:/home/tristan# iptables-save
# Generated by iptables-save v1.8.4 on Fri Aug 28 05:13:34 2020
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -j JOOL_SIIT --instance siit
-A PREROUTING -j JOOL --instance default
COMMIT
# Completed on Fri Aug 28 05:13:34 2020
root@k8s-natdns64:/home/tristan# ip6tables-save
# Generated by ip6tables-save v1.8.4 on Fri Aug 28 05:13:39 2020
*mangle
:PREROUTING ACCEPT [117:10783]
:INPUT ACCEPT [117:10783]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [69:9227]
:POSTROUTING ACCEPT [83:10193]
-A PREROUTING -j JOOL_SIIT --instance siit
-A PREROUTING -j JOOL --instance default
COMMIT
# Completed on Fri Aug 28 05:13:39 2020

ydahhrk commented 4 years ago

but I wasn't able to get any meaningful info because it seems a ton of traffic is being sent to jool and the kernel log get totally spammed from the SSH traffic.

You can always filter out the logging blocks that do not involve 172.16.29.6 in any way.
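One possible way to do that filtering is sketched below. It keeps individual lines rather than whole logging blocks, because the block layout of Jool's debug output is not shown in this thread; the sample log lines are invented for illustration.

```python
import re
from typing import List

def lines_mentioning(log_text: str, address: str) -> List[str]:
    """Keep log lines containing the address, without matching longer numbers."""
    # Digit lookarounds stop 172.16.29.6 from matching inside 172.16.29.60.
    pattern = re.compile(r"(?<!\d)" + re.escape(address) + r"(?!\d)")
    return [line for line in log_text.splitlines() if pattern.search(line)]

# Invented sample lines; real kernel log lines will look different.
sample = "\n".join([
    "pkt A 172.16.29.1 -> 172.16.29.6",
    "pkt B 10.0.0.5 -> 172.16.29.60",
    "pkt C 10.0.0.5 -> 10.0.0.9",
])
kept = lines_mentioning(sample, "172.16.29.6")
print(kept)
```

The text to filter could come from a saved copy of the kernel log read into a string.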

Perhaps I need to edit my iptables rules only send traffic destined to the EAMT addresses into the jool chain? What kind of debug info are we looking for?

Well, that would probably fix the ping too.

Which would be great for solving the problem, but not so much for debugging the bug.

Also, here are my config files if you'd like to try to reproduce this on your end:

WAIT. Wait. Waitwaitwaitwait. I just realized.

Why do you have a Stateful NAT64 Jool configuration? SIIT-DC officially relies solely on stateless translators.

-A PREROUTING -j JOOL_SIIT --instance siit
-A PREROUTING -j JOOL --instance default

Perhaps the NAT64 instance is swallowing all the traffic. This would also explain why you're getting no stats.

But it's strange. The stateless translator is listed before the stateful one, so I'd expect the former to have more priority.

~Also: From your original post I had originally understood that everything was working fine with iptables, then got broken when you switched to Netfilter. But now you're saying that the ping is also not working with iptables. Can you please clarify?~ This doesn't matter anymore.

Thank you. I'll try to reproduce this again with the new information, and will hopefully have more questions later.

ydahhrk commented 4 years ago

Found the cause of the loss of ping between the router and the translator. I just applied the patch to master.

Can you test it?

(I still think you should get rid of the Stateful NAT64 instance.)

ydahhrk commented 4 years ago

Just for the sake of completeness, here's a checklist of the routing I configured to make it work (in addition to patching the code):

The VyOS machine needs to route 64:ff9b::/96 and 172.16.30.2 through the translator:

me@vyos:~$ sudo ip route add 64:ff9b::/96 via 2607:fa48:6ed8:8a51::64
me@vyos:~$ # I'm assuming the entire 172.16.30 network is reserved for EAMT usage,
me@vyos:~$ # but I might be overdoing it. But whatever.
me@vyos:~$ sudo ip route add 172.16.30.0/24 via 172.16.29.6

The VyOS machine needs forwarding enabled:

me@vyos:~$ sudo sysctl -w net.ipv4.conf.all.forwarding=1
me@vyos:~$ sudo sysctl -w net.ipv6.conf.all.forwarding=1

And so does the translator (though it's not as crucial):

me@k8s-natdns64:~$ sudo sysctl -w net.ipv4.conf.all.forwarding=1
me@k8s-natdns64:~$ sudo sysctl -w net.ipv6.conf.all.forwarding=1

The Kubernetes machine needs a 64:ff9b::/96 route towards VyOS. In my case, I just defaulted it:

me@kubernetes:~$ sudo ip route add default via 2607:fa48:6ed8:8a54:1::

Did the same for the translator, for both protocols:

me@k8s-natdns64:~$ sudo ip route add default via 2607:fa48:6ed8:8a51::1
me@k8s-natdns64:~$ sudo ip route add default via 172.16.30.1

With this configuration, I was able to perform the following pings from vyos. Sniffing the traffic, I didn't notice anything out of place:

me@vyos:~$ ping 172.16.29.6 # Answered by the translator
me@vyos:~$ ping 172.16.30.2 # Answered by kubernetes

I think that's all.

starcraft66 commented 4 years ago

Found the cause of the loss of ping between the router and the translator. I just applied the patch to master.

Can you test it?

I just cloned Jool master, compiled and installed the userspace tools and kernel module via dkms and rebooted the machine.

Just for the sake of completeness, here's a checklist of the routing I configured to make it work (in addition to patching the code):

I quickly skimmed over this and everything is exactly like my setup. I also disabled the stateful NAT64 translator for the time being via systemctl stop jool.

Unfortunately, the connectivity issue is not resolved. However, I did a bunch of tcpdumping and can confirm that the issue seems to be exactly what you speculated above when you wrote:

  1. Jool machine writes echo request 172.16.29.6 -> 172.16.29.1
  2. Router responds echo reply 172.16.29.1 -> 172.16.29.6
  3. Jool translates that into echo reply 64:ff9b::172.16.29.1 -> 64:ff9b::172.16.29.6
  4. Packet gets lost because nobody's listening at 64:ff9b::172.16.29.6 or something like that

If I ping my router from the jool box, the ping is sent to vyos over ipv4, and vyos responds over ipv4 and sends the reply back to the jool box. Jool then translates 172.16.29.1 to 64:ff9b::172.16.29.1, an icmp6 reply is received at 64:ff9b::172.16.29.6, and nothing happens because ping is expecting a response at 172.16.29.6.

ydahhrk commented 4 years ago

Did you uninstall the previous v4.1.2 version?

If you installed one of them from the .deb package, and the other from the code, both will exist in your system and one of them will have precedence over the other. So it's possible you're still running old code.

I just uploaded a commit which bumps Jool's version number from 4.1.2.0 to 4.1.2.1.

Try uninstalling the old version, install this new one, and make sure that it prints the intended version number (both in jool_siit --version and in dmesg after the modprobe). Then test again.

starcraft66 commented 4 years ago

I think I had the wrong version installed. I uninstalled the jool-dkms package, then built and installed the kernel module from source, but now I get a version mismatch error.

root@k8s-natdns64:~/Jool# jool_siit -i siit eamt display
+---------------------------------------------+--------------------+
|                                 IPv6 Prefix |        IPv4 Prefix |
+---------------------------------------------+--------------------+
Error: The kernel module returned error 22: Version mismatch. The userspace client's version is 4.1.2.1,
but the kernel module is 4.1.2.0.
Please update the kernel module.

I rolled back to the commit before you bumped the version number and made sure the right DKMS module was installed and everything works now! I even turned back on the stateful NAT64 translator just to test and both can coexist just fine on the same box. Thanks for helping me out, I think we can close this issue once the kernel module's version number is bumped.

ydahhrk commented 4 years ago

Thanks for the feedback!

Currently releasing 4.1.3; closing.