awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

File descriptor limit change in AMI release `v20231220` #1551

Closed: mmerkes closed 10 months ago

mmerkes commented 10 months ago

What happened: Customers are reporting hitting ulimits as a result of PR #1535.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

johnkeates commented 10 months ago

We have hit this issue too; we have ~1,700 pods crashlooping in each cluster. I wonder if the CI doesn't test with a large enough workload?

mmerkes commented 10 months ago

We have already reverted the change that caused this issue (#1535), we're rolling back the v20231220 release, and we're preparing to release new AMIs without the change ASAP. More guidance to come.

EDIT: We're not rolling back v20231220. We're focusing on rolling forward the next release with the change reverted.

maksim-paskal commented 10 months ago

This helped us restore our pods on new nodes; we're using Karpenter:

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
...
spec:
 ....
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash

    # Remove the containerd drop-in that lowers the NOFILE soft limit
    rm -f /etc/systemd/system/containerd.service.d/20-limitnofile.conf

    --BOUNDARY--

and then drain all the new nodes from the cluster.
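
For reference, a minimal sketch of that drain step (the node name is a placeholder; flags assume typical DaemonSet and emptyDir workloads):

# Drain an affected node so its pods reschedule onto nodes
# built with the fixed userData; <node-name> is a placeholder
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data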

jpedrobf commented 10 months ago

@mmerkes Can you please update us when the new AMI is ready for use?

adwittumuluri commented 10 months ago

☝️ Adding to that, an ETA would be much appreciated as well. Is it on the order of hours or days?

atishpatel commented 10 months ago

I'm using this setup in Karpenter userData for now, bumping the soft limit 100x, from 1024 to 102400, while keeping the hard limit at 524288. Adding this to our bootstrap for now:

- /usr/bin/sed -i 's/^LimitNOFILE.*$/LimitNOFILE=102400:524288/' /etc/systemd/system/containerd.service.d/20-limitnofile.conf || true
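
If applying this on an already-running node rather than at bootstrap, a daemon-reload and containerd restart would presumably also be needed for the new limit to take effect:

sudo systemctl daemon-reload && sudo systemctl restart containerd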

pkoraca commented 10 months ago

If anyone needs it, we fixed it in Karpenter by hardcoding the older AMI in the AWSNodeTemplate CRD:

spec:
  amiSelector:
    aws::ids: <OLD_AMI_ID>

cartermckinnon commented 10 months ago

A Go runtime change in 1.19 automatically maxes out the process's NOFILE soft limit, so I would expect to see this problem with Go binaries built with earlier versions: https://github.com/golang/go/issues/46279

Has anyone run into this problem with a workload that isn't a Go program?
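
One quick way to check a given workload (a sketch; "envoy" here is just an example process name, substitute your own):

# Show the effective NOFILE limits of the oldest matching process
PID=$(pgrep -o envoy)
grep 'Max open files' /proc/"$PID"/limits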

mmerkes commented 10 months ago

an ETA would be much appreciated as well. Is it on the order of hours or days?

We are working on releasing a new set of AMIs ASAP. I will post another update in 3-5 hours on the status. We should have a better idea then.

1lann commented 10 months ago

Has anyone run into this problem with a workload that isn't a Go program?

People have mentioned running into this problem on envoy proxy, which is a C++ program.

cartermckinnon commented 10 months ago

People have mentioned running into this problem on envoy proxy

Yes, I've been looking into that. Envoy doesn't seem to bump its own soft limit, and it also seems to crash hard when the limit is hit (on purpose): https://github.com/aws/aws-app-mesh-roadmap/issues/181

Other things I've noticed:

  1. The soft limit of 1024 is the default on ECS: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_Ulimit.html
  2. Java's hotspot VM has bumped the limit by default for ~20 years; point being there's wide variety in how the nofile limit is handled: https://github.com/openjdk/jdk/blob/93fedc12db95d1e61c17537652cac3d4e27ddf2c/src/hotspot/os/linux/os_linux.cpp#L4575-L4589

suket22 commented 10 months ago

The EKS-provided SSM parameter that references the current EKS AMI has been reverted to the last good AMI in all regions globally. This will automatically resolve the issue for Karpenter and managed node group users, and for any other systems that determine the latest EKS AMI from the SSM parameter.

We will provide another update by December 29 at 5:00 PM with a deployment timeline for new AMIs.
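
For anyone resolving the AMI from the parameter directly, a minimal sketch (path shown for Kubernetes 1.28 on AL2 x86_64; adjust the version for your cluster):

# Look up the currently recommended EKS-optimized AMI ID
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.28/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' --output text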

polarathene commented 10 months ago

We have already reverted the change that caused this issue

It'd be ideal to identify what software is not compatible and actually get that addressed, but I understand the need to revert for the time being.

So long as you avoid infinity, most software will have minimal regression:

If you need to set an explicit limit (presumably because the defaults are not sufficient), and the advised 1024:524288 isn't enough because the software doesn't raise its own limits... You could try matching the suggested hard limit (LimitNOFILE=524288), or double that for the traditional hard limit (2^20).

That still won't be sufficient for some software, as mentioned, but such software should know better and handle its resource needs properly. Exhausting the FD limit is per-process, so it's not necessarily an OOM event; the system-wide FD limit is much higher (based on memory, IIRC).
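
As an illustration of that option, a hypothetical containerd override (the file name and values follow the suggestion above; this is not an official recommendation):

# /etc/systemd/system/containerd.service.d/99-nofile.conf
# Soft and hard limit both at 524288; use 524288:1048576 for the 2^20 hard-limit variant
[Service]
LimitNOFILE=524288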


People have mentioned running into this problem on envoy proxy, which is a C++ program.

Envoy requires a large number of FDs, they have expressed that they're not interested in raising the soft limit internally and that admins should instead set a high enough soft limit.

I've since opened a feature request making the case for why Envoy should raise the soft limit itself, rather than deferring to an externally set high limit, which can negatively impact other software.

References:


2. Java's hotspot VM has bumped the limit by default for ~20 years;

https://github.com/systemd/systemd/blob/1742aae2aa8cd33897250d6fcfbe10928e43eb2f/NEWS#L60..L94

Note that there are also reports that using very high hard limits (e.g. 1G) is problematic: some software allocates large arrays with one element for each potential file descriptor (Java, …) — a high hard limit thus triggers excessively large memory allocations in these applications.

For infinity, this could require 1,000 to 1,000,000 times as much memory (MySQL; not Java, but an example of the impact of excessive memory allocation, coupled with the usual increased CPU load), even though you may not need that many FDs, hence a poor default.

For Java, related to the systemd v240 release notes, there was a GitHub comment at the time about Java's memory allocation. With the 524288 hard limit that was 4MB, but infinity resolving to 2^30 (as on many modern distros) equates to roughly 2,000x that (8GB).
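
To make that scaling concrete (assuming roughly 8 bytes per array element, which is an assumption on my part):

524288 FDs × 8 bytes ≈ 4 MB
2^30 FDs (1073741824) × 8 bytes ≈ 8 GB, i.e. 2048x larger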

While you cite 20 years, note that the hard limit has increased over time.


point being there's wide variety in how the nofile limit is handled

This was all (excluding Envoy) part of my original research into moving the LimitNOFILE=1024:524288 change forward. If you want a deep-dive resource on the topic for AWS, I have you covered! 😂

Systemd has it right AFAIK: sane soft and hard limits. For AWS deployments some may need a higher hard limit, but it's a worry when software like Envoy doesn't document that requirement and instead advises raising the soft limit externally.

adjain131995 commented 10 months ago

We were using Karpenter, which again is an AWS-backed tool, and it started picking up the new AMI dynamically, at which point we started facing issues. As a hotfix we have hardcoded the previous AMI:

amiSelector:
  aws::name: amazon-eks*node-1.25-v20231201

However, once the AMI fix lands we can make it dynamic again.

The root cause: https://github.com/awslabs/amazon-eks-ami/pull/1535

ndbaker1 commented 10 months ago

As an update to the previous announcement, we are tracking for a new release by January 4th.

Collin3 commented 10 months ago

As an update to the previous announcement, we are tracking for a new release by January 4th.

@ndbaker1 is this file descriptor limit change expected to be reintroduced in that release, or will it still be excluded? Just wondering if we need to pin our AMI version until we implement our own fix for istio/envoy workloads, or until something is implemented in Envoy itself to handle that change better.

cartermckinnon commented 10 months ago

@Collin3 that change has been reverted and will not be in the next AMI release 👍

cartermckinnon commented 10 months ago

This is resolved in the latest release: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20231230