mmerkes closed this issue 10 months ago.
We have hit this issue too; we have roughly 1,700 pods crashlooping in each cluster. I wonder if the CI doesn't test with a large enough workload?
We have already reverted the change that caused this issue (#1535), we're rolling back the v20231220 release, and we're preparing to release new AMIs without the change ASAP. More guidance to come.

EDIT: We're not rolling back v20231220. We're focusing on rolling forward the next release with the change reverted.
This helped us restore our pods on new nodes; we're using Karpenter:
```yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
...
spec:
  ...
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    rm -rf /etc/systemd/system/containerd.service.d/20-limitnofile.conf

    --BOUNDARY--
```
and then drain all of the new (affected) nodes from the cluster so they get replaced.
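If it helps, draining can be scripted roughly like this (a sketch: the `karpenter.sh/provisioner-name` node label is an assumption based on the v1alpha5-era API shown above; adjust the selector to however your nodes are labeled):

```bash
#!/bin/bash
# Cordon and drain the Karpenter-provisioned nodes so replacements come up with
# the userData fix above. The label selector is an assumption; adjust as needed.
for node in $(kubectl get nodes -l karpenter.sh/provisioner-name \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```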
@mmerkes Can you please update us when the AMI is ready for usage?
☝️ Adding to that, an ETA would be much appreciated as well. Is it on the order of hours or days?
I'm using this setup for now in the Karpenter userData, bumping the soft limit from 1024 to 102400.

Adding this to our bootstrap for now to raise the soft limit to 102400:

```
- /usr/bin/sed -i 's/^LimitNOFILE.*$/LimitNOFILE=102400:524288/' /etc/systemd/system/containerd.service.d/20-limitnofile.conf || true
```
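In case it's useful, here's a rough sketch of confirming that the override actually took effect on a node (nothing here is EKS-specific; `/proc/<pid>/limits` shows the limits the running daemon ended up with):

```bash
#!/bin/bash
# After editing /etc/systemd/system/containerd.service.d/20-limitnofile.conf,
# reload systemd, restart containerd, and confirm the limits it is running with.
systemctl daemon-reload
systemctl restart containerd
grep 'Max open files' "/proc/$(pidof containerd)/limits"
# New containers should pick up the new limits; containers that were already
# running generally keep the old values until they are recreated.
```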
If anyone needs it, we fixed it in Karpenter by hardcoding the older AMI in the AWSNodeTemplate CRD:

```yaml
spec:
  amiSelector:
    aws::ids: <OLD_AMI_ID>
```
A Go runtime change in 1.19 automatically maxes out the process's NOFILE soft limit, so I would expect to see this problem with Go binaries built with earlier versions: https://github.com/golang/go/issues/46279
Has anyone run into this problem with a workload that isn’t a go program?
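One way to check whether a given workload raises its own soft limit (rather than guessing by language or runtime) is to inspect the live process; a sketch, assuming you can get a shell on the node, and using `envoy` purely as an example process name:

```bash
#!/bin/bash
# Compare the NOFILE limits a running process actually has with what systemd
# configured for containerd. A Go 1.19+ binary will typically show its soft
# limit already raised to the hard limit; other runtimes may not.
pid=$(pgrep -o envoy)   # "envoy" is just an example process name
grep 'Max open files' "/proc/${pid}/limits"
```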
an ETA would be much appreciated as well. Is it on the order of hours or days?
We are working on releasing a new set of AMIs ASAP. I will post another update in 3-5 hours on the status. We should have a better idea then.
Has anyone run into this problem with a workload that isn’t a go program?
People have mentioned running into this problem on envoy proxy, which is a C++ program.
People have mentioned running into this problem on envoy proxy
Yes, I've been looking into that. Envoy doesn't seem to bump its own soft limit, and it also seems to crash hard when the limit is hit (on purpose): https://github.com/aws/aws-app-mesh-roadmap/issues/181
Other things I've noticed: there's wide variety in how the `nofile` limit is handled across software; for example, Java's HotSpot VM adjusts it itself: https://github.com/openjdk/jdk/blob/93fedc12db95d1e61c17537652cac3d4e27ddf2c/src/hotspot/os/linux/os_linux.cpp#L4575-L4589

The EKS-provided SSM Parameter that references the current EKS AMI has been reverted to reference the last good AMI in all regions globally. This will automatically resolve the issue for Karpenter and Managed node group users, and any other systems that determine the latest EKS AMI from the SSM Parameter.
We will provide another update by December 29 at 5:00 PM with a deployment timeline for new AMIs.
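For anyone resolving the AMI themselves, this is roughly how to read it back from the public SSM parameter (a sketch; the path shown is the documented Amazon Linux 2 pattern, and 1.25 is just an example Kubernetes version):

```bash
# Look up the currently recommended EKS-optimized AL2 AMI for a given
# Kubernetes version from the public SSM parameter (the one that was reverted).
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.25/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' --output text
```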
We have already reverted the change that caused this issue
It'd be ideal to identify what software is not compatible and actually get that addressed, but I understand the need to revert for the time being.
So long as you avoid `infinity`, most software will have minimal regression:

- Going from 2^10 to 2^20 slows some affected tasks by roughly 1,000x, as opposed to 2^30 where the delta is substantial.
- Software relying on the `select(2)` syscall expects the soft limit to be 1024 to correctly function (additional `select()` concerns are documented here in a dedicated section).
- Envoy can need more than even a 2^20 hard limit; this has been reported on their GH issue tracker already. `infinity` would avoid that, but it would have been wiser for only Envoy to raise its own limit that high than to expect the environment to work around Envoy's needs, given the prior regression concerns.

If you need to set an explicit limit (presumably because the defaults are not sufficient) and the advised 1024:524288 isn't enough because software doesn't request to raise its own limits, you could try matching the suggested hard limit (LimitNOFILE=524288), or double that for the traditional hard limit (2^20).

That still won't be sufficient for some software as mentioned, but that is software that should know better and handle its resource needs properly. Exhausting the FD limit is per-process, so it's not necessarily an OOM event; the system-wide FD limit is much higher (based on memory, IIRC).
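As a concrete sketch of that kind of explicit override on the node (the drop-in path is the one the AMI already uses, and 1024:524288 is the advised pair discussed above; raise the soft value only if you know a workload needs it):

```bash
#!/bin/bash
# Replace the containerd NOFILE drop-in with an explicit soft:hard pair and
# reload so the new limits apply to containers started afterwards.
mkdir -p /etc/systemd/system/containerd.service.d
cat <<'EOF' > /etc/systemd/system/containerd.service.d/20-limitnofile.conf
[Service]
LimitNOFILE=1024:524288
EOF
systemctl daemon-reload
systemctl restart containerd
```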
People have mentioned running into this problem on envoy proxy, which is a C++ program.
Envoy requires a large number of FDs; they have expressed that they're not interested in raising the soft limit internally and that admins should instead set a high enough soft limit.
I've since opened a feature request to justify why Envoy should raise the soft limit rather than defer that to be externally set high where it can negatively impact other software.
References:
2. Java's hotspot VM has bumped the limit by default for ~20 years;
https://github.com/systemd/systemd/blob/1742aae2aa8cd33897250d6fcfbe10928e43eb2f/NEWS#L60..L94
Note that there are also reports that using very high hard limits (e.g. 1G) is problematic: some software allocates large arrays with one element for each potential file descriptor (Java, …) — a high hard limit thus triggers excessively large memory allocations in these applications.
For `infinity`, this could require 1,000 - 1,000,000 times as much memory (MySQL, not Java, but another example of excessive memory allocation impact, coupled with the usual increased CPU load), even though you may not need that many FDs; hence a poor default.

For Java, related to the systemd v240 release notes, there was this github comment at the time about Java's memory allocation. With the 524288 hard limit that was 4MB, but `infinity`, when it resolves to 2^30 (many modern distros), would equate to roughly 2,000x that (8GB), since 2^30 / 524288 = 2,048.
While you cite 20 years, note that the hard limit has incremented over time; the 2^30 hard limit is a comparatively recent default (IIRC the actual motivation for the 2^30 hard-limit increase was actually a patched PAM issue that wasn't being resolved properly).

point being there's wide variety in how the `nofile` limit is handled
This was all (excluding Envoy) part of my original research into moving the `LimitNOFILE=1024:524288` change forward. If you want a deep-dive resource on the topic for AWS, I have you covered! 😂

Systemd has it right AFAIK: sane soft and hard limits. For AWS deployments some may need a higher hard limit, but it's a worry when software like Envoy doesn't document anything about that requirement and instead advises raising the soft limit externally.
We were using Karpenter, which again is an AWS-backed tool, and it started picking up the new AMI dynamically as we started facing issues. As a hot fix we have hardcoded the previous AMI:

```yaml
amiSelector:
  aws::name: amazon-eks*node-1.25-v20231201
```

However, we're looking forward to the AMI fix so we can make it dynamic again.
The root cause: https://github.com/awslabs/amazon-eks-ami/pull/1535
As an update to the previous announcement, we are tracking for a new release by January 4th.
As an update to the previous announcement, we are tracking for a new release by January 4th.
@ndbaker1 is this file descriptor limit change expected to be reintroduced in that release, or will it still be excluded? Just wondering whether we need to pin our AMI version until we implement our own fix for istio/envoy workloads, or until something is implemented in envoy itself to handle that change better.
@Collin3 that change has been reverted and will not be in the next AMI release 👍
This is resolved in the latest release: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20231230
What happened: Customers are reporting hitting ulimits as a result of this PR #1535

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
- EKS Platform version (`aws eks describe-cluster --name <name> --query cluster.platformVersion`):
- Kubernetes version (`aws eks describe-cluster --name <name> --query cluster.version`):
- Kernel (e.g. `uname -a`):
- Release information (run `cat /etc/eks/release` on a node):