Open manojtr opened 3 weeks ago
There's nothing special about drift compared to other forms of voluntary disruption (e.g. consolidation, expiration, emptiness). Pod disruption budgets and Karpenter's do-not-disrupt
annotation are respected by all forms of voluntary disruption, and replacement nodes respect the pods' scheduling constraints (e.g. topology spread constraints). The Karpenter disruption docs go into depth on the exact flow: https://karpenter.sh/docs/concepts/disruption. Is there anything you were wondering that isn't covered there?
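To make the protections mentioned above concrete, here is a minimal sketch of a PodDisruptionBudget plus Karpenter's `do-not-disrupt` annotation; the names and labels are illustrative, not from this issue:

```yaml
# A PodDisruptionBudget limits how many of an app's pods can be taken
# down by voluntary disruption (drift, consolidation, expiration, ...).
# Names and labels here are illustrative.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # keep at least 2 pods running during disruption
  selector:
    matchLabels:
      app: my-app
---
# Alternatively, opt a pod out of Karpenter's voluntary disruption entirely:
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```

With a PDB in place, Karpenter will not evict pods beyond what the budget allows, regardless of which form of voluntary disruption triggered the replacement.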
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
Description
How can the docs be improved?
Please add the information below to the FAQ, or add some documentation about the node selection criteria for replacement.
When node upgrades are triggered by drift detection, what are the criteria for selecting nodes for replacement? Is selection based on availability zone, or random?
The context for the question is how to ensure high availability of the application during the upgrade. For example, if node upgrades happen in parallel (I assume they do), there is a chance that the nodes running the same application across 3 AZs are cordoned and replaced at the same time, causing an application outage.
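One common way to guard against the multi-AZ outage scenario described above is to pair a PodDisruptionBudget with a topology spread constraint, so replicas stay spread across zones and eviction is throttled. A minimal sketch, with illustrative names and a placeholder image:

```yaml
# Spread the app's replicas across availability zones so that replacing
# several nodes at once cannot take down all replicas in one zone.
# Names and the container image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: app
          image: nginx   # placeholder image
```

Because replacement nodes honor these scheduling constraints, the rescheduled pods should again land spread across zones after the upgrade.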