Community Note
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Tell us about your request
I'd like to be able to logically group multiple MNGs so that they are treated as a single unit during updates; this would match the guidance to create an MNG per AZ when running stateful workloads. Alternatively, if MNGs could create an ASG per AZ behind the scenes and manage them as one, that would also work. For a managed service I shouldn't have to think about this.
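For illustration, the MNG-per-AZ guidance above might look like the following eksctl-style config sketch. The cluster name, region, and capacities are hypothetical; the point is that each managed node group is pinned to a single AZ, and today each updates independently.

```yaml
# Hypothetical eksctl config: one managed node group per AZ, as
# recommended for stateful workloads whose EBS volumes are zonal.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: example-cluster    # hypothetical name
  region: us-east-1        # hypothetical region

managedNodeGroups:
  - name: stateful-us-east-1a
    availabilityZones: ["us-east-1a"]
    desiredCapacity: 2
  - name: stateful-us-east-1b
    availabilityZones: ["us-east-1b"]
    desiredCapacity: 2
  - name: stateful-us-east-1c
    availabilityZones: ["us-east-1c"]
    desiredCapacity: 2
```

These three groups have no update coordination between them, which is exactly the gap this request is about.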
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When I update my MNGs, an instance per group is terminated, which can cause a race condition where nodes can't terminate because PDBs block eviction of pods spread across multiple nodes. This is especially noticeable when running a node per AZ with stateful workloads. When this occurs it not only disrupts the services but locks some of them up, because the PVs become orphaned.
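As a concrete sketch of the interaction (all names hypothetical): with a zonal StatefulSet covered by a PodDisruptionBudget like the one below, draining one node per group in parallel means evictions beyond the first are blocked by the budget, while a replacement pod can't reschedule elsewhere anyway because its PV is pinned to the drained zone.

```yaml
# Hypothetical PDB for a 3-replica StatefulSet with one replica per AZ:
# at most one replica may be disrupted at a time. If an MNG update
# terminates one instance in every group simultaneously, the second
# and third evictions are blocked by this budget while the first
# replacement pod waits on its zone-bound PV.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-db-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-db
```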
Are you currently working around this issue?
I can't use MNGs while this behaviour isn't supported.
Actually, couldn't/shouldn't MNGs support spanning multiple AZs while still supporting workloads that require AZ-based persistence? This seems like table stakes for a managed solution.
Additional context
See #1866
Attachments
n/a