Community Note
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
The default tolerations for the EKS CloudWatch Observability addon are too broad:
tolerations:
- operator: Exists
I can understand these tolerations for the DaemonSet pods (the CloudWatch agent and fluent-bit), but the same toleration is also applied to the controller-manager pod, which is managed by a Deployment with a replica count of 1.
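For comparison, a narrower toleration set for the controller-manager might look like the sketch below. The keys chosen here are illustrative assumptions rather than the addon's actual configuration; the point is that without a bare, key-less operator: Exists entry, the node.kubernetes.io/unschedulable taint added when a node is cordoned is not tolerated, so the pod cannot land on a draining node:
# Sketch of a narrower toleration set (illustrative, not the addon's config).
# Only the standard not-ready/unreachable taints are tolerated, and only for
# a bounded time, so cordoned (unschedulable) nodes are not tolerated.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300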
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
The default toleration means that, while a node is being drained, the scheduler can sometimes place the evicted controller-manager pod right back on the node it is draining from.
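For context, draining a node first cordons it, which adds the taint shown below to the node spec. A toleration with only operator: Exists and no key matches every taint, including this one, so the replacement pod remains schedulable on the node being drained:
# Taint added to a node's spec when it is cordoned as part of a drain.
taints:
- key: node.kubernetes.io/unschedulable
  effect: NoSchedule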
Are you currently working around this issue?
Currently working around this by overriding the default tolerations, or by just waiting long enough that the node is hard terminated.
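As a rough sketch of the override workaround, a merge patch like the following could be applied to the controller-manager Deployment with kubectl patch --type merge. This is a hypothetical illustration, not the addon's supported configuration path, and the addon's reconciliation may revert the change:
# Hypothetical merge patch for the controller-manager Deployment; the addon
# may reconcile the Deployment back to its defaults, so treat this as a
# temporary workaround rather than a fix.
spec:
  template:
    spec:
      tolerations: []   # drop the catch-all `operator: Exists` toleration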
Additional context
Anything else we should know?
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)