kernel 6.x update breaks proxy NDP

nivekuil commented 1 year ago

Describe the bug

It's typical for VPS providers to assign an unrouted IPv6 prefix. We can still give each pod/container its own reachable IPv6 address by having the kernel proxy NDP requests for those IPs. This approach worked well but stopped working with this update.

For context, it's common to use something like ndppd or https://yoursunny.com/t/2021/ndpresponder to automate this, since we can't proxy an entire subnet (I think that's to prevent attackers from blowing up the router's neighbor cache). I have my own higher-level automation, so I just run the iproute2 commands manually.

Reproduction steps

This is tough to repro. I have this script run as an OCI hook before each podman container start.

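#!/bin/sh
# OCI hook: the container state JSON arrives on stdin.
state=$(cat)
# The pod IP comes from the "hatchery.pod.ip" annotation.
ip=$(printf '%s' "$state" | jq -r '.annotations["hatchery.pod.ip"]')
# Interface of the IPv6 default route (assumes it is the 5th field).
dev=$(/usr/sbin/ip -6 r | awk '/^default/ {print $5; exit}')
# Have the kernel answer NDP for the pod IP on that interface.
/usr/sbin/ip -6 neigh add proxy "$ip" dev "$dev"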

This tells the kernel to proxy ndp for this pod (IP learned from the annotation on the pod). It's worked reliably until this update. Services will work for a few hours after a restart and then become unreachable. I also tried ndpresponder mentioned above, which has worked in previous deployments, but that didn't seem to work now either.
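
For anyone trying to reproduce this, here is a rough sketch of how the state set up by the hook could be inspected on a node. The helper names default_if and check_proxy_ndp are made up for illustration, and the sketch assumes the relevant interface is the one on the IPv6 default route. The kernel only answers for proxy neighbor entries when proxy_ndp is enabled for that interface (or for "all"), so both sysctls are printed alongside the proxy entries.

```shell
#!/bin/sh
# Hypothetical helper: print the interface of the first IPv6 default route.
# Reads `ip -6 route` output on stdin and looks for the token after "dev".
default_if() {
    awk '/^default/ { for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: show the proxy_ndp sysctls and the proxied addresses.
check_proxy_ndp() {
    ifname=$(ip -6 route | default_if)
    # Either of these being 1 enables proxy NDP on the interface.
    sysctl net.ipv6.conf.all.proxy_ndp "net.ipv6.conf.${ifname}.proxy_ndp"
    # Lists the addresses added with `ip -6 neigh add proxy`.
    ip -6 neigh show proxy dev "$ifname"
}
```

Running check_proxy_ndp right after a container start and again once the pod has become unreachable would show whether the proxy entry itself disappeared or whether it is still present but no longer answered.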

Expected behavior

Pods reachable over ipv6, no regression.

Actual behavior

Pods become unreachable after a few hours (upstream ndp cache expiry?)

System details

State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Sat 2022-11-05 18:10:33 UTC)
BootedDeployment:
● fedora:fedora/x86_64/coreos/next
                  Version: 37.20221031.1.0 (2022-11-02T09:38:17Z)
                   Commit: a153c4842ff66725f3fee4a2c6d77461a804c9dd49adffdbef7160b111ef66c1
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A

Ignition config

No response

Additional information

No response

dustymabe commented 1 year ago

I can only suspect this is related to the kernel update:

kernel 5.19.16-301.fc37.x86_64 → 6.0.5-300.fc37.x86_64

If you have a node you can play around on, you can try staying on the latest update but overriding the kernel to the old version with something like:

sudo rpm-ostree override replace https://bodhi.fedoraproject.org/updates/FEDORA-2022-1c6a1ca835

and let us know if that works. Ultimately we'll probably have to file a kernel bug and try to notify the upstream kernel maintainers of the regression. It's also possible the issue has already been found and fixed, but it will take some digging.

nivekuil commented 1 year ago

I think it's the kernel update. I set up an experiment with 3 nodes. One is on stable, one on next (Linux dro-3 6.0.5-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Oct 26 17:24:18 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux) and one on next having run the command you provided (Linux dro-1 5.19.16-301.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 21 15:55:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux). The 6.0 node just had a service become unreachable (mtr stops two hops before a normal service, 1 for the host and 1 for the pod).

dustymabe commented 1 year ago

hey @nivekuil - just to exhaust the remaining options:

can you try with testing (36.20221030.2.3), which also has the 6.0.5 kernel, but is on a Fedora 36 package set?
can you try with the latest next (37.20221106.1.0), which has a slightly newer kernel (kernel-6.0.7-301.fc37) than the one you were trying with before?

tohojo commented 1 year ago

It looks like this is a kernel regression introduced in the 6.0 kernel; there's a fix posted on the upstream mailing list, here: https://lore.kernel.org/all/Y295+9+JDjqRWbwU@x1.ze-it.at/

The failure description in that patch message sounds like it's consistent with the symptoms described in this issue: namely, that NDP stops working after a while...

nivekuil commented 1 year ago

Found out the hard way this made it into stable, guess I'm overriding the kernel for now. Would setting rollout_wariness in zincati have saved me here?

lucab commented 1 year ago

@nivekuil not really, it would have just delayed it to the end of the rollout window (i.e. 2022-11-17 evening). In this case it is likely better to temporarily disable zincati.service or to configure it to not perform auto-updates.
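
A minimal sketch of the second option, assuming the /etc/zincati/config.d drop-in directory and the updates.enabled key described in the Zincati documentation. The helper function and the drop-in file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a Zincati drop-in that turns off auto-updates.
# $1 is the config directory; on a real node that would be /etc/zincati/config.d.
write_zincati_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/90-disable-auto-updates.toml" <<'EOF'
[updates]
enabled = false
EOF
}
```

On a node this would be run as root with /etc/zincati/config.d as the argument, followed by a restart of zincati.service so the new setting is read. The first option mentioned above would be `sudo systemctl disable --now zincati.service`.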

dustymabe commented 1 year ago

Hey @nivekuil - can you test with this scratch build that has the proposed upstream fix in it?

sudo rpm-ostree override replace \
https://kojipkgs.fedoraproject.org//work/tasks/2010/94312010/kernel-{,core-,modules-}6.0.9-300.fc37.x86_64.rpm

nivekuil commented 1 year ago

So far so good

dustymabe commented 1 year ago

The proposed upstream fix was pulled in to Fedora's source branches for kernel 6.0. The next Fedora kernel build will include the patch, but we don't know if that will make it into next week's FCOS testing release. Will keep this issue posted.

Note: This backport will only apply for 6.0 builds. The hope is that this patch makes it into 6.1 and doesn't need a backport. We'll re-evaluate at that point to see if it's required or not.

dustymabe commented 1 year ago

The commit made it into v6.1-rc7, so it should safely be in 6.1 when it is released.

The backport referenced in https://github.com/coreos/fedora-coreos-tracker/issues/1337#issuecomment-1324322562 done by the Fedora kernel team should be in kernel-6.0.10-300.fc37 and newer, which should be in the next testing release.
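
A small sketch for checking whether a node on the 6.0 series already runs a kernel build at or past the backport version named above. The function names are invented for illustration, and `sort -V` assumes GNU coreutils. It only compares version strings: it says nothing about 5.19 kernels, which predate the regression, and it cannot confirm that the fix is sufficient.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel version $1 is at least $2.
# Uses version sort: if $2 sorts first (or both are equal), $1 >= $2.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: strip the trailing architecture from `uname -r` output,
# e.g. 6.0.5-300.fc37.x86_64 -> 6.0.5-300.fc37
strip_arch() {
    sed 's/\.[^.]*$//'
}

# Example use on a node:
#   kernel_at_least "$(uname -r | strip_arch)" 6.0.10-300.fc37 && echo "backport version or newer"
```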

nivekuil commented 1 year ago

The test machine on 6.0.9-300.fc37 just had all its ipv6 containers become unreachable, same as the original issue but it lasted much longer. So I guess there's more going on. Reverting it to 5.19.

dustymabe commented 1 year ago

hey @nivekuil - any chance you could email the commit author for 8207f25 and see if maybe you can help figure out the remaining issue?

dustymabe commented 1 year ago

Based on the feedback from @nivekuil, I'm going to kick this out of "fixed" status so I won't report it as being fixed with the latest round of releases.
