Open ashwani2k opened 6 months ago
The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to the following rules:
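- After a period of inactivity, `lifecycle/stale` is applied
- After further inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After further inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- `/remove-lifecycle stale`
- `/lifecycle rotten`
- `/close`
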
/lifecycle stale
/remove-lifecycle stale
How to categorize this issue?
/area disaster-recovery /area robustness /kind enhancement /priority 1
What would you like to be added: Currently, the DWD failed-lease calculation considers all nodes when arriving at the `nodeLeaseFailureFraction`. This can be misleading in cases like:
- Machines in the `Terminating` or `Failed` phase, especially if they will take the entire `machineDrainTimeout` due to a PDB or other issues with pod eviction.
- Machines in crashLoopBackOff, as they may never get created, so including them in the count might also not be correct.

Why is this needed: To prevent DWD from mistakenly initiating meltdown protection for clusters that quickly hit the `nodeLeaseFailureFraction` because their nodes/machines stay in the above 2 phases for a prolonged time. Also observed in issue-live(4796)
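
To make the request concrete, here is a rough sketch of the proposed calculation. It is not DWD's actual code: the `NodeInfo` type, its fields, the function names, and the `>=` comparison against the threshold are all assumptions made for illustration. The only idea taken from this issue is that nodes whose machines are in the listed phases are left out of both the numerator and the denominator.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Machine phases to leave out of the calculation, as listed in this issue.
# The exact phase strings are an assumption for this sketch.
EXCLUDED_PHASES = frozenset({"Terminating", "Failed", "CrashLoopBackOff"})


@dataclass
class NodeInfo:
    """Hypothetical view of a node, its lease state and its machine's phase."""
    name: str
    lease_expired: bool
    machine_phase: Optional[str] = None


def lease_failure_fraction(nodes: Iterable[NodeInfo],
                           excluded_phases=EXCLUDED_PHASES) -> float:
    """Fraction of expired leases among nodes that are not excluded."""
    eligible = [n for n in nodes if n.machine_phase not in excluded_phases]
    if not eligible:
        # Nothing left to judge; report no failures rather than divide by zero.
        return 0.0
    expired = sum(1 for n in eligible if n.lease_expired)
    return expired / len(eligible)


def hits_failure_fraction(nodes: Iterable[NodeInfo],
                          node_lease_failure_fraction: float,
                          excluded_phases=EXCLUDED_PHASES) -> bool:
    """True if the computed fraction reaches the configured threshold.

    The >= comparison is an assumption, not DWD's confirmed behaviour.
    """
    fraction = lease_failure_fraction(nodes, excluded_phases)
    return fraction >= node_lease_failure_fraction


if __name__ == "__main__":
    nodes = [
        NodeInfo("node-a", lease_expired=False, machine_phase="Running"),
        NodeInfo("node-b", lease_expired=False, machine_phase="Running"),
        NodeInfo("node-c", lease_expired=True, machine_phase="Running"),
        NodeInfo("node-d", lease_expired=True, machine_phase="Terminating"),
        NodeInfo("node-e", lease_expired=True, machine_phase="Terminating"),
    ]
    # Counting every node (current behaviour as described in this issue).
    print(lease_failure_fraction(nodes, excluded_phases=frozenset()))
    # Leaving out nodes whose machines are in the excluded phases.
    print(lease_failure_fraction(nodes))
```

With an illustrative threshold of 0.6, counting all five nodes gives 3 expired out of 5 and reaches the threshold, while leaving out the two `Terminating` machines gives 1 expired out of 3 and does not.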