essh closed this issue 1 year ago
`v20230501` is available in all regions now! Update to the latest EKS Optimized AMIs and this issue should be resolved.
Is the kernel fix actually fixing the bug for good, or is it just bumping the default BPF JIT memory limit? Can you provide a link to the patch?
thanks @ljosyula @mmerkes @cartermckinnon @q2ven
https://github.com/awslabs/amazon-eks-ami/releases/tag/v20230501
We are still seeing this issue across our prod cluster after upgrading to 1.24.
Increasing the `bpf_jit_limit` does indeed fix the issue for pods stuck in `Pending` due to this seccomp issue. But it is, at best, only a temporary fix: after a few days the new limit again gets saturated by the underlying memory leak, and we are faced with this problem again.
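Concretely, the stopgap looks like the following. This is a sketch: the helper name `bpf_jit_usage` is ours, the optional file argument exists only to make it testable offline, and the limit value is the one suggested in #1179 (it is in bytes and does not persist across reboots).

```shell
# Sum the bytes currently allocated to BPF JIT images.
# Reads /proc/vmallocinfo by default (needs root); an explicit file
# argument lets you run it against a saved copy.
bpf_jit_usage() {
  awk '/bpf_jit/ {s += $2} END {print s + 0}' "${1:-/proc/vmallocinfo}"
}

# On an affected node, compare usage against the limit:
#   bpf_jit_usage                     # creeps upward due to the leak
#   sysctl net.core.bpf_jit_limit     # pods get stuck once usage hits this
#
# Raising the limit (value from issue #1179) buys a few days until the
# leak saturates it again:
#   sudo sysctl -w net.core.bpf_jit_limit=452534528
```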
Looks like we are going to have to downgrade to kernel 5.4 as the only option for now.
That said, can we re-open this issue?
I see from the commits referencing this issue that the AWS team is still trying to wrangle/fix this bug, so it is obviously still being actively worked on. But this issue being in the Closed state falsely gives the impression that it has been fixed.
(Also, I am surprised that more people are not reporting this? Do we just have an unusually large number of liveness/health check probes or something?)
@skupjoe We haven't had any reports of this since https://github.com/awslabs/amazon-eks-ami/issues/1219#issuecomment-1534536682, can you confirm the kernel version you're on?
I downgraded from 5.10 to `5.4.283-195.378.amzn2.x86_64`, and unfortunately it is still happening after about ~3.5 days of uptime. It typically happens to ~3 nodes at a time in a ~10-node cluster.
Increasing the `bpf_jit_limit` immediately fixed the `Pending` status and helps things for another ~3 days, but then the issue comes back. It seems to happen on instances of any size/type: I am currently looking at it happening on a `c6a.large`, an `m5a.8xlarge`, and an `m5a.2xlarge` node.
I am desperate to get this fixed, and it has been happening ever since our EKS 1.24 upgrade. I am now on 1.27 and will be upgrading to 1.28 tonight.
We are not using PSP and I don't see any seccomp config set anywhere at the pod level or on my k8s node config:
```
[root@ip-10-0-82-217 /]# sudo cat /etc/containerd/config.toml | grep -i seccomp
```
But the kernel supports it:
```
[root@ip-10-0-104-41 /]# grep SECCOMP /boot/config-$(uname -r)
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
```
Any other suggestions? Or should I raise a new issue? Thank you.
I am considering moving away from amazon-eks-ami as I am desperate to get this issue fixed. Can anybody suggest a good replacement? Maybe Bottlerocket?
@skupjoe If you're seeing this issue on the 5.4 kernel branch, it is definitely not the same issue described above. Please open a new issue or an AWS support case and we can take a look.
> I see from the commits referenced to this issue that AWS team is still trying to wrangle/fix this bug, and so obviously this is still actively being worked on. But this issue being in Closed state falsely gives the impression that it has been fixed.
The commit references are to a kernel change made by a community member to increase the default JIT space for BPF programs. It is not a fix for the issue described here, which was a memory leak. The memory leak was fixed by @q2ven (IIRC). We have no reason to think there's been a regression.
What happened:
After upgrading EKS nodes from `v20230203` to `v20230217` on our `1.24` EKS clusters, after a few days a number of the nodes have containers stuck in the `ContainerCreating` state, or liveness/readiness probes reporting the following error:

This issue is very similar to https://github.com/awslabs/amazon-eks-ami/issues/1179. However, we had not been seeing this issue on previous AMIs, and it only started to occur on `v20230217` (following the upgrade from kernel 5.4 to 5.10) with no other changes to the underlying cluster or workloads.

We tried the suggestions from that issue (`sysctl net.core.bpf_jit_limit=452534528`), which helped to immediately allow containers to be created and probes to execute, but after approximately a day the issue returned, and the value returned by `cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'` was steadily increasing.

What you expected to happen:

`Ready`
How to reproduce it (as minimally and precisely as possible):
I don't currently have a reproduction that I can share due to my current one using some internal code (I can hopefully produce a more generic one if required when I get a chance).
As a starting point, we only noticed this happening on nodes that had pods scheduled on them which had an `exec` liveness & readiness probe running every 10 seconds that performs a health check against a gRPC service using `grpcurl`. In addition to this, we also have a default Pod Security Policy (yes, we know they are deprecated 😄) that has the annotation `seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default`.

These two conditions seem to be enough to trigger this issue, and the values reported by `cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'` will steadily increase over time until containers can no longer be created on the node.

Anything else we need to know?:
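To confirm the steady growth, a small sampler along these lines can be left running on a node. This is a sketch: the function name is ours, and the optional file argument exists only so the logic can be tested offline.

```shell
# sample_bpf_jit N INTERVAL [FILE]:
# print N timestamped readings of the summed bpf_jit bytes, INTERVAL
# seconds apart. FILE defaults to /proc/vmallocinfo (needs root).
sample_bpf_jit() {
  n=$1; interval=$2; file=${3:-/proc/vmallocinfo}
  i=0
  while [ "$i" -lt "$n" ]; do
    printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
      "$(awk '/bpf_jit/ {s += $2} END {print s + 0}' "$file")"
    i=$((i + 1))
    if [ "$i" -lt "$n" ]; then sleep "$interval"; fi
  done
}

# e.g. one reading per minute for an hour:
#   sample_bpf_jit 60 60
```

On an affected node, the second column climbs steadily toward `net.core.bpf_jit_limit`; on a healthy one it stays roughly flat.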
Environment:

- EKS Platform version (use `aws eks describe-cluster --name <name> --query cluster.platformVersion`): `"eks.4"`
- Kubernetes version (use `aws eks describe-cluster --name <name> --query cluster.version`): `"1.24"`
- AMI Version: `v20230217`
- Kernel (e.g. `uname -a`): `5.10.165-143.735.amzn2.x86_64 #1 SMP Wed Jan 25 03:13:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
- Release information (run `cat /etc/eks/release` on a node):

Official Guidance
Kubernetes pods using SECCOMP filtering on EKS optimized AMIs based on Linux kernel version 5.10.x may get stuck in the `ContainerCreating` state, or their liveness/readiness probes may fail with the following error:

When a process with SECCOMP filters creates a child process, the same filters are inherited and applied to the new process. The Amazon Linux kernel versions 5.10.x are affected by a memory leak that occurs when a parent process is terminated while creating a child process. When the total amount of memory allocated for SECCOMP filters is over the limit, a process cannot create a new SECCOMP filter. As a result, the parent process fails to create a new child process, and the above error message will be logged.
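The inheritance part of this is easy to observe on any Linux host: a process's seccomp state is exposed in `/proc/[pid]/status`, and a child reports the same state as its parent. An unconfined shell shows mode 0; a pod running with a seccomp profile shows mode 2 and, on kernels >= 5.9, a nonzero `Seccomp_filters` count.

```shell
# Seccomp: 0 = disabled, 1 = strict, 2 = filter mode.
# A child inherits its parent's filters, so both outputs should match.
grep -E '^Seccomp' /proc/self/status                  # this shell
sh -c 'grep -E "^Seccomp" /proc/self/status'          # a child of it
```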
This issue is more likely to be encountered with kernel versions `kernel-5.10.176-157.645.amzn2` and `kernel-5.10.177-158.645.amzn2`, where the rate of the memory leak is higher.

Amazon Linux will be releasing the fixed kernel by May 1st, 2023. We will be releasing a new set of EKS AMIs with the updated kernel no later than May 3rd, 2023.