aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Improve managed node group update behaviour documentation #1678

Open stevehipwell opened 2 years ago

stevehipwell commented 2 years ago

Tell us about your request

I'd like to see the Managed node update behavior documentation improved. The current documentation doesn't seem correct when max unavailable isn't 100%, as there is no documented path from the partially completed upgrade phase back to the scale-up phase. It would also be useful to describe the practical considerations for choosing one pattern over another. This would also be relevant to the EKS Best Practices Guides, which currently promote MNGs without any content describing them or why you would (or would not) want to use them.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

I'd like to have a highly available set of MNGs that can be updated without leaving pods unschedulable. This is specifically relevant when you're using Cluster Autoscaler (CA) and Persistent Volumes (PVs) and therefore have one MNG per AZ.

Are you currently working around this issue?

Setting maxUnavailable to 1 and hoping it works.
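For anyone trying the same workaround, here is a minimal sketch of how the setting could be applied programmatically. It only builds the request parameters for the EKS `UpdateNodegroupConfig` operation. The helper name `build_update_config_request` and the cluster and node group names are hypothetical, and the validation bounds are assumptions rather than documented limits. Nothing here says how the update itself behaves, which is the gap this issue is about.

```python
from typing import Any, Dict, Optional


def build_update_config_request(
    cluster_name: str,
    nodegroup_name: str,
    max_unavailable: Optional[int] = None,
    max_unavailable_percentage: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an EKS UpdateNodegroupConfig call.

    Hypothetical helper: exactly one of the two limits must be given.
    """
    if (max_unavailable is None) == (max_unavailable_percentage is None):
        raise ValueError(
            "set exactly one of max_unavailable or max_unavailable_percentage"
        )

    if max_unavailable is not None:
        # Assumed lower bound: at least one node must be allowed to update.
        if max_unavailable < 1:
            raise ValueError("max_unavailable must be at least 1")
        update_config = {"maxUnavailable": max_unavailable}
    else:
        # Assumed bounds for a percentage value.
        if not 1 <= max_unavailable_percentage <= 100:
            raise ValueError("max_unavailable_percentage must be in 1..100")
        update_config = {"maxUnavailablePercentage": max_unavailable_percentage}

    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "updateConfig": update_config,
    }


if __name__ == "__main__":
    # Placeholder names; one MNG per AZ would need one request each.
    request = build_update_config_request(
        "my-cluster", "my-mng-az-a", max_unavailable=1
    )
    print(request)
    # With boto3 installed and credentials configured, this could be sent as:
    #   import boto3
    #   boto3.client("eks").update_nodegroup_config(**request)
```

Keeping the request construction separate from the API call makes it possible to check the parameters for each per-AZ node group before anything is sent.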

Additional context n/a

Attachments n/a

drmaciej commented 1 year ago

It would also be good to describe whether and how the update process verifies that pods have been allocated to new nodes. Does it wait for pods to be scheduled onto a node (so no longer Pending)? Does it wait for them to be Running and healthy?

AWS Support thinks the process waits for pods to be ready, but we cannot find any hard proof of that. There seem to be no relevant actions in the k8s audit logs while the update process is running. It would be good to get confirmation.
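Until that is documented, one way to observe the behaviour yourself is to snapshot pod states while an update is running. The sketch below is an assumption-laden illustration, not a description of what the update process checks. It classifies pods from the JSON shape produced by `kubectl get pods -A -o json`, separating Pending from Running-but-not-Ready and Ready. The function names `pod_state` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict


def pod_state(pod: Dict[str, Any]) -> str:
    """Classify a pod as its phase, or Ready/NotReady when it is Running."""
    status = pod.get("status", {})
    phase = status.get("phase", "Unknown")
    if phase != "Running":
        # Pending, Succeeded, Failed or Unknown are reported as-is.
        return phase
    for condition in status.get("conditions", []):
        if condition.get("type") == "Ready":
            return "Ready" if condition.get("status") == "True" else "NotReady"
    # Running but no Ready condition reported yet.
    return "NotReady"


def summarize(pod_list: Dict[str, Any]) -> Dict[str, int]:
    """Count pods per state in a `kubectl get pods -o json` style document."""
    return dict(Counter(pod_state(pod) for pod in pod_list.get("items", [])))


if __name__ == "__main__":
    # Hand-written sample data, not output from a real cluster.
    sample = {
        "items": [
            {"status": {"phase": "Pending"}},
            {"status": {"phase": "Running",
                        "conditions": [{"type": "Ready", "status": "True"}]}},
            {"status": {"phase": "Running",
                        "conditions": [{"type": "Ready", "status": "False"}]}},
        ]
    }
    print(summarize(sample))
```

Running this against repeated snapshots during an update would show whether old nodes are drained while replacement pods are still Pending or NotReady, which is the evidence the audit logs did not provide.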