aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Apache License 2.0

Pod routing policies deleted by systemd #1600

Closed hligit closed 2 years ago

hligit commented 3 years ago

What happened:

Pod outgoing packets are routed via interface eth0 because routing policies for the pod are deleted by systemd-networkd.

Detailed timeline of the relevant events:

Attach Logs:

Two secondary ENIs are attached. The pod with IP 10.7.4.69 is assigned to eth1, but the pod's routing policies are missing.

core@ip-10-1-58-66 ~ $ ip rule list
0:  from all lookup local
512:    from all to 10.7.4.160 lookup main
512:    from all to 10.7.4.83 lookup main
512:    from all to 10.7.4.89 lookup main
512:    from all to 10.7.4.209 lookup main
512:    from all to 10.7.4.213 lookup main
512:    from all to 10.7.4.163 lookup main
512:    from all to 10.7.4.11 lookup main
1536:   from 10.7.4.160 lookup 3
1536:   from 10.7.4.83 lookup 3
1536:   from 10.7.4.89 lookup 3
1536:   from 10.7.4.209 lookup 3
1536:   from 10.7.4.213 lookup 3
1536:   from 10.7.4.163 lookup 3
1536:   from 10.7.4.11 lookup 3
32766:  from all lookup main
32767:  from all lookup default

The AWS CNI plugins.log shows that the policies were added.

{"level":"info","ts":"2021-09-01T06:53:52.861Z","caller":"driver/driver.go:178","msg":"Added toContainer rule for 10.7.4.69/32"}
{"level":"info","ts":"2021-09-01T06:53:52.861Z","caller":"driver/driver.go:178","msg":"Added rule priority 1536 from 10.7.4.69/32 table 2"}

The systemd-networkd log shows that the routing policies were deleted after eth2 was added.

Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Link 15 added
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: udev initialized link
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: State changed: pending -> initialized
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_315 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=73 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Saved original MTU: 1500
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Link state is up-to-date
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: found matching network '/etc/systemd/network/01-eth.network'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/eth2/disable_ipv6' to '0'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv4/ip_forward' to '1'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/all/forwarding' to '1'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/eth2/use_tempaddr' to '0'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Setting '/proc/sys/net/ipv6/conf/eth2/accept_ra' to '0'
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting nomaster
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting address genmode for link
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Failed to read sysctl property stable_secret: Input/output error
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting nomaster done.
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: eth2: Setting address genmode done.
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.69/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.58/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 01 06:53:58 ip-10-1-58-66.ap-southeast-2.compute.internal systemd-networkd[1260]: Removing routing policy rule: priority: 1536, 10.7.4.166/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
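
To see at a glance which rules systemd-networkd dropped, the "Removing routing policy rule" lines can be pulled out of the journal text. The following is only a minimal Python sketch, assuming the debug log format shown above; the function name removed_rules is hypothetical and not part of any AWS CNI or systemd tooling.

```python
import re

# Assumed format, taken from the debug log lines quoted above:
# "Removing routing policy rule: priority: 1536, 10.7.4.69/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2"
_REMOVED = re.compile(
    r"Removing routing policy rule: priority: (?P<priority>\d+), "
    r"(?P<src>\S+) -> (?P<dst>\S+), iif: \S+, oif: \S+, table: (?P<table>\d+)"
)


def removed_rules(journal_text):
    """Return one dict per removed routing policy rule found in the journal text."""
    rules = []
    for line in journal_text.splitlines():
        match = _REMOVED.search(line)
        if match:
            rules.append(
                {
                    "priority": int(match["priority"]),
                    "src": match["src"],
                    "dst": match["dst"],
                    "table": int(match["table"]),
                }
            )
    return rules


if __name__ == "__main__":
    import sys

    # Example: journalctl -u systemd-networkd | python3 removed_rules.py
    for rule in removed_rules(sys.stdin.read()):
        print(rule)
```

Feeding it the output of journalctl -u systemd-networkd should list each removed rule with its priority, source, destination and table, provided debug logging is enabled for systemd-networkd.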

How to reproduce it (as minimally and precisely as possible):

Schedule pods on a worker node with systemd v247.4 or newer until two secondary ENIs are attached. Run ip rule list and check whether the routing policies for pods associated with the first secondary ENI have been deleted.
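
The check in the last step can be scripted. Below is a rough Python sketch, assuming the ip rule list output format shown earlier in this report; the helper names source_rules and missing_pod_rules are hypothetical, and the pod IPs have to be supplied by the caller (for example from kubectl get pods -o wide).

```python
import re

# Matches plain source rules such as "1536:   from 10.7.4.160 lookup 3".
# Lines like "512: from all to 10.7.4.160 lookup main" do not match, because
# "all" is followed by "to" rather than "lookup".
_FROM_RULE = re.compile(r"^\s*(\d+):\s+from\s+(\S+)\s+lookup\s+(\S+)\s*$")


def source_rules(ip_rule_output):
    """Map each source address to (priority, table) for `from <addr> lookup <table>` rules."""
    rules = {}
    for line in ip_rule_output.splitlines():
        match = _FROM_RULE.match(line)
        if match and match.group(2) != "all":
            rules[match.group(2)] = (int(match.group(1)), match.group(3))
    return rules


def missing_pod_rules(ip_rule_output, pod_ips):
    """Return the pod IPs that have no source-based rule in the given output."""
    present = source_rules(ip_rule_output)
    return [ip for ip in pod_ips if ip not in present]


if __name__ == "__main__":
    sample = """\
0:  from all lookup local
512:    from all to 10.7.4.160 lookup main
1536:   from 10.7.4.160 lookup 3
32766:  from all lookup main
"""
    # 10.7.4.69 has no "from" rule in the sample, so it is reported as missing.
    print(missing_pod_rules(sample, ["10.7.4.160", "10.7.4.69"]))
```

A pod IP that is reported as missing while the pod is running on a secondary ENI would be consistent with the behaviour described in this issue, although the sketch cannot tell why the rule is absent.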

Anything else we need to know?:

Mitigation:

A network configuration such as the one below instructs systemd-networkd not to delete the routing policies added by the AWS CNI.

[Match]
Name=eth*

[Network]
KeepConfiguration=yes

Environment:

jayanthvn commented 3 years ago

Hi @hligit

Thanks for the details. Regarding the mitigation I feel it would be better to update our documentation.

hligit commented 3 years ago

I found that systemd introduced a new setting, ManageForeignRoutingPolicyRules, which would be the proper fix. According to the maintainer, this new setting will be backported to v247 and v248. https://github.com/systemd/systemd/pull/19287#issuecomment-910955617

jayanthvn commented 3 years ago

Nice, thanks for letting us know, @hligit. I will update the docs accordingly.

smalltown commented 3 years ago

I also encountered the same issue after recently upgrading FlatCar CoreOS. Below are my two customized systemd-networkd configurations to work around it (only verified to work on FlatCar CoreOS, not Fedora CoreOS).

If FlatCar officially upgrades systemd to include the ManageForeignRoutingPolicyRules feature, I will post the new systemd-networkd configuration.

hligit commented 3 years ago

Thanks @smalltown! Your eni* link configuration is quite nice in that it works with the current version of systemd. We use the configuration below on Flatcar with a patched systemd.

[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no

pothos commented 3 years ago

I don't think that a global ManageForeignRout... entry is the right answer because systemd-networkd can still interfere with the manually configured network, depending on whether the default network configuration tries to use DHCP, DHCPv6 or configures another option that prevents proper connectivity. Speaking for the Flatcar Container Linux team I urge you to generate a networkd unit file under /run/systemd/network/ on the host where you set the network interface in question to Unmanaged=yes - only this is a safe and reliable solution. Currently on Flatcar we ship rules for Calico, Cilium and so on because things happened there, too. But with a generic name like eth1 we can't ship a rule on the image, so please try to generate the networkd unit file which you can do from a privileged Pod either through entering the host mount namespace with nsenter or by bind-mounting the folder into the container.
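
As a rough illustration of the suggestion above, the Python sketch below writes a networkd unit file that marks matching interfaces as unmanaged. It is only a sketch under assumptions: the function names are hypothetical, the eni* pattern and the 10-awscni.network file name are taken from elsewhere in this thread, and the directory is expected to be the host's /run/systemd/network/ bind-mounted into a privileged pod. A later comment in this thread reports that Unmanaged=yes alone did not stop the rules from being removed on Flatcar v2905.2.3, so treat it as a starting point rather than a verified fix.

```python
from pathlib import Path


def render_unmanaged_unit(name_pattern):
    """Build the text of a .network unit that tells networkd to leave matching links alone."""
    return f"[Match]\nName={name_pattern}\n\n[Link]\nUnmanaged=yes\n"


def write_unmanaged_unit(directory, name_pattern, filename="10-awscni.network"):
    """Write the unit into `directory` and return its path.

    The file is written to a temporary name first and then renamed, so that
    networkd never reads a half-written unit.
    """
    target = Path(directory) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(render_unmanaged_unit(name_pattern))
    tmp.replace(target)
    return target


if __name__ == "__main__":
    # Assumption: /run/systemd/network on the host is reachable at this path.
    print(write_unmanaged_unit("/run/systemd/network", "eni*"))
```

systemd-networkd would most likely need to reload its configuration before a newly written unit takes effect.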

hligit commented 3 years ago

Thanks @pothos for chiming in! I tested the network unit file below on Flatcar v2905.2.3, and it doesn't seem to work.

core@ip-10-1-58-125 ~ $ cat /etc/systemd/network/10-awscni.network
[Match]
Name=eni*

[Link]
Unmanaged=yes

The systemd-networkd debug log shows the policies are still removed.

core@ip-10-1-58-125 ~ $ journalctl -u systemd-networkd |  grep  -E '(10.7.4.124|enid57ead4665a)'
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: New device has no master, continuing without
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Flags change: +MULTICAST +BROADCAST
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Link 16 added
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: link pending udev initialization...
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Saved original MTU: 9001
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Flags change: +UP +LOWER_UP +RUNNING
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Link UP
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Gained carrier
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Remembering route: dst: 10.7.4.124/32, src: n/a, gw: n/a, prefsrc: n/a, scope: link, table: main, proto: boot, type: unicast
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Remembering foreign routing policy rule: priority: 512, 0.0.0.0/0 -> 10.7.4.124/32, iif: n/a, oif: n/a, table: 254
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Remembering foreign routing policy rule: priority: 1536, 10.7.4.124/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: udev initialized link
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: State changed: pending -> initialized
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: Link state is up-to-date
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: found matching network '/etc/systemd/network/10-awscni.network'
Sep 27 20:58:51 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: enid57ead4665a: State changed: initialized -> unmanaged
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Removing routing policy rule: priority: 512, 0.0.0.0/0 -> 10.7.4.124/32, iif: n/a, oif: n/a, table: 254
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Removing routing policy rule: priority: 1536, 10.7.4.124/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Forgetting routing policy rule: priority: 512, 0.0.0.0/0 -> 10.7.4.124/32, iif: n/a, oif: n/a, table: 254
Sep 27 20:58:59 ip-10-1-58-125.ap-southeast-2.compute.internal systemd-networkd[1292]: Forgetting routing policy rule: priority: 1536, 10.7.4.124/32 -> 0.0.0.0/0, iif: n/a, oif: n/a, table: 2

The route itself still exists, however:

core@ip-10-1-58-125 ~ $ ip route | grep enid57ead4665a
10.7.4.124 dev enid57ead4665a scope link
core@ip-10-1-58-125 ~ $ ip link show enid57ead4665a
16: enid57ead4665a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 3e:30:1b:16:25:74 brd ff:ff:ff:ff:ff:ff link-netns cni-23d2c957-1af1-3133-5aeb-6153e4b7093e

pothos commented 3 years ago

Ok, funny, so I guess we either need Unmanaged plus the additional global setting (not sure if it's a good idea to set it automatically or if the distro or the user would be in charge), or we could try to generate a valid network unit file that configures the routes and policies and turns off everything that is not needed (DHCP=no, LinkLocalAddressing=no, RequiredForOnline=no, Scope=link etc).

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

pothos commented 2 years ago

On Flatcar the global systemd settings are now set by default to work around the problem. The interface should also be set to Unmanaged by default now, but I didn't double-check it for this CNI.

FYI, here is almost the same case in another CNI, with some instructions on how your CNI could generate a networkd unit and maintain it at runtime so that the user does not have to set up the global systemd settings: https://github.com/cilium/cilium/issues/18706#issuecomment-1066986342

smalltown commented 2 years ago

After testing FlatCar CoreOS version 3033.2.3, I found that amazon vpc cni no longer hits this issue; the default configuration of ManageForeignRoutes and ManageForeignRoutingPolicyRules works now.

However, I found that the iptables command in version 3033.2.0 uses the nftables kernel backend instead of the iptables backend, which breaks amazon vpc cni again. For the workaround, refer to issue #1847.

github-actions[bot] commented 2 years ago

Issue closed due to inactivity.