bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

How to disable IPv6 DAD to reduce startup delay of pods on IPv6 cluster #3878

Open woehrl01 opened 5 months ago

woehrl01 commented 5 months ago

What I'd like:

We want to change the sysctl value net.ipv6.conf.all.optimistic_dad=1

We created a bootstrap container executing the following script:

#!/bin/bash

set -ex

nsenter -t 1 -m sysctl -w net.ipv6.conf.all.optimistic_dad=1

but it fails with:

+ nsenter -t 1 -m sysctl -w net.ipv6.conf.all.optimistic_dad=1
nsenter: cannot open /proc/1/ns/mnt: Permission denied

How can we change that?

Any alternatives you've considered:

None that I'm aware of. Executing the same command from an admin container via sheltie changes the value successfully.

Related to: https://github.com/aws/amazon-vpc-cni-k8s/pull/1631

woehrl01 commented 5 months ago

Apologies, the right way to set this is:

[settings.kernel.sysctl]
"net.ipv6.conf.all.optimistic_dad" = "1"
"net.ipv6.conf.default.optimistic_dad" = "1"

larvacea commented 5 months ago

Thank you for your update. I would love to know if (as I hope) you see measurably faster startup with optimistic duplicate address detection.

woehrl01 commented 5 months ago

@larvacea Unfortunately, setting optimistic_dad = 1 or accept_dad = 0, regardless of the interface, has no impact on the startup latency. There is still a 2-3 second delay on IPv6 pod startup (compared to IPv4). I can confirm that the value is picked up by the vethd* interfaces created for the sandboxes.
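One way to confirm that a setting is picked up by every interface, as described above, is to compare each per-interface value against the expected one. The following is a minimal sketch, not a Bottlerocket tool: the function name `check_ipv6_sysctl` and its optional third argument (an alternative directory, useful for testing) are hypothetical.

```shell
#!/bin/sh
# Sketch: report interfaces whose IPv6 sysctl differs from an expected value.
# Usage: check_ipv6_sysctl <key> <expected> [conf_dir]
# conf_dir defaults to the procfs path that holds the per-interface settings.
check_ipv6_sysctl() {
  key="$1"
  expected="$2"
  conf_dir="${3:-/proc/sys/net/ipv6/conf}"
  status=0
  for file in "$conf_dir"/*/"$key"; do
    # Skip the unexpanded glob pattern and unreadable entries.
    [ -r "$file" ] || continue
    iface="$(basename "$(dirname "$file")")"
    value="$(cat "$file")"
    if [ "$value" != "$expected" ]; then
      echo "$iface: $key=$value (expected $expected)"
      status=1
    fi
  done
  # Non-zero exit status when at least one interface deviates.
  return "$status"
}
```

For example, `check_ipv6_sysctl accept_dad 0` prints one line per interface whose `accept_dad` is not `0` and prints nothing when every interface matches. Note that this only inspects the network namespace it runs in.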

bcressey commented 4 months ago

@larvacea @woehrl01 a couple of other ideas:

Optimistic DAD might need to be combined with "use_optimistic", in order to actually make use of the tentative addresses. Also, given the evidence that DAD is being performed despite accept_dad = 0, we could try setting dad_transmits = 0 to override it:

[settings.kernel.sysctl]
# don't enable DAD 
"net.ipv6.conf.all.accept_dad" = "0"
"net.ipv6.conf.default.accept_dad" = "0"

# don't transmit any DAD probes
"net.ipv6.conf.all.dad_transmits" = "0"
"net.ipv6.conf.default.dad_transmits" = "0"

# if we end up using DAD, go ahead and use the tentative addresses
"net.ipv6.conf.all.optimistic_dad" = "1"
"net.ipv6.conf.all.use_optimistic" = "1"

woehrl01 commented 4 months ago

Thank you @bcressey. I just tried your configuration and additional permutations; the startup delay of around 2 seconds still persists:

IPv4 Cluster:

Screenshot 2024-04-26 at 20 07 54

IPv6 Cluster (with the settings):

Screenshot 2024-04-26 at 20 07 13

bash-5.1# tail -n +1 /proc/sys/net/ipv6/conf/*/*dad*
==> /proc/sys/net/ipv6/conf/all/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/all/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/all/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/all/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/default/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/default/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/default/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/default/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/eni0dfdceb3448/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/eni0dfdceb3448/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/eni0dfdceb3448/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/eni0dfdceb3448/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/eni19314c3cd96/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/eni19314c3cd96/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/eni19314c3cd96/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/eni19314c3cd96/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/eni79b4cbaf095/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/eni79b4cbaf095/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/eni79b4cbaf095/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/eni79b4cbaf095/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/eni8d1aa624f0c/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/eni8d1aa624f0c/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/eni8d1aa624f0c/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/eni8d1aa624f0c/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/eni8f2e97e2322/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/eni8f2e97e2322/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/eni8f2e97e2322/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/eni8f2e97e2322/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/enid559aefed0e/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/enid559aefed0e/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/enid559aefed0e/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/enid559aefed0e/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/enie114b69e62e/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/enie114b69e62e/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/enie114b69e62e/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/enie114b69e62e/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/eth0/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/eth0/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/eth0/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/eth0/optimistic_dad <==
0

==> /proc/sys/net/ipv6/conf/lo/accept_dad <==
-1

==> /proc/sys/net/ipv6/conf/lo/dad_transmits <==
1

==> /proc/sys/net/ipv6/conf/lo/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/lo/optimistic_dad <==
0

==> /proc/sys/net/ipv6/conf/veth1003d20a/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/veth1003d20a/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/veth1003d20a/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/veth1003d20a/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/veth1de24e44/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/veth1de24e44/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/veth1de24e44/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/veth1de24e44/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/veth2df40afd/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/veth2df40afd/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/veth2df40afd/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/veth2df40afd/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/veth4024dff7/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/veth4024dff7/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/veth4024dff7/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/veth4024dff7/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/veth524efe7e/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/veth524efe7e/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/veth524efe7e/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/veth524efe7e/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/vetha8d8bf98/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/vetha8d8bf98/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/vetha8d8bf98/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/vetha8d8bf98/optimistic_dad <==
1

==> /proc/sys/net/ipv6/conf/vethce2339e3/accept_dad <==
0

==> /proc/sys/net/ipv6/conf/vethce2339e3/dad_transmits <==
0

==> /proc/sys/net/ipv6/conf/vethce2339e3/enhanced_dad <==
1

==> /proc/sys/net/ipv6/conf/vethce2339e3/optimistic_dad <==
1

woehrl01 commented 4 months ago

Digging through some code around the web, I came across the following implementation in Android: https://android.googlesource.com/platform/frameworks/base/+/befe778%5E%21/#F0

It looks like, if optimistic_dad is enabled, IFA_F_TENTATIVE is set together with IFA_F_OPTIMISTIC, so the following check in the AWS VPC CNI still fails until DAD has succeeded: https://github.com/aws/amazon-vpc-cni-k8s/pull/1631/files#diff-afc7977e1f00abb3f66455a7d491ded671d38ffa43e0dc910606084ec4fd4841R250-R255

Still not sure why IFA_F_TENTATIVE is set when DAD is disabled. But I located the following (fixed) issue on Red Hat setting the address to tentative even if dad_transmits=0: https://bugzilla.redhat.com/show_bug.cgi?id=709271
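To make the flag interaction concrete: an address that carries both IFA_F_OPTIMISTIC and IFA_F_TENTATIVE still matches any check that only tests the tentative bit. The sketch below reads the flags column of /proc/net/if_inet6 and lists such addresses. The flag values come from `<linux/if_addr.h>`; the function name `list_tentative` is hypothetical, and this is an illustration rather than the CNI's actual code.

```shell
#!/bin/sh
# Flag values from <linux/if_addr.h>.
IFA_F_OPTIMISTIC=4    # 0x04
IFA_F_TENTATIVE=64    # 0x40

# Print "<device> <address> <state>" for every IPv6 address whose
# IFA_F_TENTATIVE bit is set. Each input line has the fields:
#   address ifindex prefix_len scope flags device   (numbers in hex)
# The input file defaults to /proc/net/if_inet6.
list_tentative() {
  src="${1:-/proc/net/if_inet6}"
  while read -r addr ifindex plen scope flags dev; do
    [ -n "$dev" ] || continue
    bits=$((0x$flags))
    if [ $((bits & $IFA_F_TENTATIVE)) -ne 0 ]; then
      state="tentative"
      # The optimistic bit does not clear the tentative bit.
      if [ $((bits & $IFA_F_OPTIMISTIC)) -ne 0 ]; then
        state="tentative+optimistic"
      fi
      echo "$dev $addr $state"
    fi
  done < "$src"
}
```

Running `list_tentative` inside the network namespace of interest shows which addresses a tentative-only check would still wait on.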

edit:

I have some additional findings. Running the following script on a node with the above settings, clearly shows that there are no interfaces created in the tentative state. With default settings, the interfaces are shown in that state.

for i in {1..1000}; do ip -6 addr show | grep "tentative"; sleep 0.1; done

KCSesh commented 3 months ago

@woehrl01 based on your last update, this seems to be expected behavior, right?

I have some additional findings. Running the following script on a node with the above settings, clearly shows that there are no interfaces created in the tentative state. With default settings, the interfaces are shown in that state.

Do you mind clarifying the open request if one still exists?

woehrl01 commented 3 months ago

@KCSesh there is a behaviour I don't understand: the CNI plugin clearly waits for 2 seconds in a tentative state even though DAD is disabled.

So the question is: are there additional settings that need to be applied to fully disable DAD, so that all interfaces are immediately stable?

vyaghras commented 2 months ago

@woehrl01 I created an IPv6 Bottlerocket cluster with the following configuration:

[settings.kernel.sysctl]
# don't enable DAD 
"net.ipv6.conf.all.accept_dad" = "0"
"net.ipv6.conf.default.accept_dad" = "0"

# use initial net namespace IPv6 settings for new namespaces
"net.core.devconf_inherit_init_net" = "1"

This reduces the time between the Pulled and Scheduled steps of pod creation from 2-3 seconds to 0-1 seconds, and it disables DAD.

woehrl01 commented 1 month ago

@vyaghras Thank you, I can confirm that adding net.core.devconf_inherit_init_net works to successfully disable DAD.
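A likely explanation for why this setting was the missing piece: according to the kernel's sysctl documentation, a new network namespace by default resets its IPv6 settings to the kernel defaults instead of copying them from the host, so accept_dad = 0 on the host never reaches the pod's namespace. The sketch below paraphrases the documented values of net.core.devconf_inherit_init_net and shows one way to look at a fresh namespace. Both function names are hypothetical, and the second one assumes root privileges and util-linux's `unshare`.

```shell
#!/bin/sh
# Describe a net.core.devconf_inherit_init_net value, paraphrasing the
# kernel's sysctl documentation.
describe_inherit_mode() {
  case "$1" in
    0) echo "IPv4 inherits from init_net, IPv6 resets to defaults" ;;
    1) echo "IPv4 and IPv6 inherit from init_net" ;;
    2) echo "IPv4 and IPv6 reset to defaults" ;;
    3) echo "IPv4 and IPv6 inherit from the creating namespace" ;;
    *) echo "unknown value: $1"; return 1 ;;
  esac
}

# Print the accept_dad value that a brand-new network namespace starts with.
# Assumption: run as root with util-linux's unshare available.
accept_dad_in_new_netns() {
  unshare --net cat /proc/sys/net/ipv6/conf/all/accept_dad
}
```

With the configuration above, `describe_inherit_mode "$(cat /proc/sys/net/core/devconf_inherit_init_net)"` should report that both address families inherit from init_net, and `accept_dad_in_new_netns` should then print the host's value rather than the kernel default.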