drawnwren closed this issue 2 months ago
Just to clarify what Karpenter's pre-spin does: we launch a node and wait for it to become ready before beginning to drain the disrupted node. However, when Karpenter drains a node, it does not launch a new pod on the new node and wait for that pod to become ready before evicting the previous pod. The pre-spin exists to make this downtime as short as possible, but there will still be some downtime. Karpenter does respect PDBs, which should be configured to ensure high availability. We might be on the same page already, but I wanted to clarify, since you mentioned you can see this from watching your pods, where I would expect temporary downtime.
If the replacement node isn't initialized before we begin terminating the original node, could you share your logs and Karpenter version?
We didn't have PDBs configured on our pods. I've added one now, but we did have HPAs with min: 1. Would the lack of a PDB explain our issue?
And yes, the behavior we were seeing was that a new pod would not be scheduled until the old pod had completely terminated.
Yep, that's expected behavior. Karpenter doesn't actually create any new pods when it terminates a node; it evicts all pods running on the node using the Eviction API. Once those pods are evicted, whatever is responsible for managing their lifecycle may create new pods in response (e.g., the ReplicaSet controller). The Eviction API does respect PDBs, though, so creating a PDB with minAvailable: 1 may meet your requirements.
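For reference, a minimal PDB along those lines might look like the sketch below. The name and label are placeholders; the selector must match your workload's pod labels:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gpu-workload-pdb      # placeholder name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: gpu-workload       # must match your pods' labels
```

One caveat: with a single replica, minAvailable: 1 can block eviction indefinitely, since nothing creates a second Ready pod before the first is evicted. If you need zero-downtime rollover, you'd typically run two or more replicas so a PDB can be satisfied during disruption.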
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
Description
Observed Behavior: We have a nodepool of GPU nodes that has only one pod and expireAfter: 3h (to try to get put back onto spot if we are on an on-demand node). We're seeing Karpenter take the node down every 3 hours and then schedule a new pod afterwards: the first node fully terminates, and only then is the new node started. We're also having trouble reliably documenting this behavior. We can see it when we're watching the pods with:
watch -n 5 kubectl get all
Is there an easier way to double-check what we're seeing?

Expected Behavior: We expect Karpenter to "Pre-spin any replacement nodes needed as calculated in Step (2), and wait for them to become ready" every 3 hours.

Reproduction Steps (Please include YAML):

Versions:
Kubernetes Version (kubectl version):
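For our own debugging, one thing that has been easier than watching live is dumping recent cluster events ordered by time; the Scheduled/Killing/NodeReady event sequence shows after the fact whether the old pod was evicted before the replacement node became ready (a sketch; assumes the workload runs in the current namespace):

```shell
# List recent events oldest-to-newest; the relative ordering of pod
# eviction events and node readiness events reveals the rollover sequence.
kubectl get events --sort-by=.lastTimestamp
```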