Open manojtr opened 3 weeks ago
There's nothing special about drift compared to other forms of voluntary disruption (e.g. consolidation, expiration, emptiness). Pod disruption budgets and Karpenter's do-not-disrupt
annotation are respected by all forms of voluntary disruption, and replacement nodes respect the pods' scheduling constraints (e.g. topology spread constraints). The Karpenter disruption docs go into depth on the exact flow: https://karpenter.sh/docs/concepts/disruption. Is there anything you were wondering that isn't covered there?
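To make the protections mentioned above concrete, here is a minimal sketch of a PodDisruptionBudget plus Karpenter's `do-not-disrupt` annotation; the names and labels are illustrative, not from this issue:

```yaml
# A PodDisruptionBudget limits how many of an app's pods can be taken
# down by voluntary disruption (drift, consolidation, expiration, ...).
# Names and labels here are illustrative.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # keep at least 2 pods running during disruption
  selector:
    matchLabels:
      app: my-app
---
# Alternatively, opt a pod out of Karpenter's voluntary disruption entirely:
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```

With a PDB in place, Karpenter will not evict pods beyond what the budget allows, regardless of which form of voluntary disruption triggered the replacement.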
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
Description
How can the docs be improved?
Please add the information below to the FAQ, or add some documentation about the node selection criteria for replacement.
When node upgrades are triggered by drift detection, what are the criteria for selecting nodes for replacement? Is selection based on availability zone, or random?
The context for the question is how to ensure high availability of the application during the upgrade. For example, if node upgrades happen in parallel (I assume they do), there is a chance that the nodes running the same application across 3 AZs are cordoned and replaced at the same time, causing an application outage.
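One common way to guard against the multi-AZ outage scenario described above is to pair a PodDisruptionBudget with a topology spread constraint, so replicas stay spread across zones and eviction is throttled. A minimal sketch, with illustrative names and a placeholder image:

```yaml
# Spread the app's replicas across availability zones so that replacing
# several nodes at once cannot take down all replicas in one zone.
# Names and the container image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: app
          image: nginx   # placeholder image
```

Because replacement nodes honor these scheduling constraints, the rescheduled pods should again land spread across zones after the upgrade.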