Open shreyas-badiger opened 1 month ago
Does a metric work for you here? Or would you like a constant update of rolled-up information directly in the status? This is definitely something we are considering as we look at how to improve Karpenter's observability for v1.
Does this request belong in the kubernetes-sigs/karpenter repo, since it's about the neutral concept of drift?
It would be advantageous to have this data accessible in both metrics and CR status. Adding it to the CR status would greatly benefit other watchers in the cluster. We should also consider adding the number of Nodes that are restricted because of PDB violations.
It would be advantageous to have this data accessible in both metrics and CR status
We do have to be a little thoughtful about the number of updates that this would generate. I'm not saying that it's out of the question, but a metric is a bit easier to swallow only because they're pull-based and not push-based.
there is no clear way to identify how many nodes have drifted from the current hash
You could take a look at the NodeClaim status conditions to see if a NodeClaim has a "Drifted" condition. Counting these up across the cluster (or by label) should give you the info you want. Yeah, you have to construct it, but given that this doesn't currently exist in Karpenter, this is a possible workaround.
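A minimal sketch of that workaround, assuming the NodeClaims have been dumped as a JSON list (for example with `kubectl get nodeclaims -o json`), that drift appears as a status condition with type "Drifted" and status "True", and that each NodeClaim carries the `karpenter.sh/nodepool` label. The function names and the `<unknown>` bucket are hypothetical, chosen only for illustration:

```python
from collections import Counter

# Assumption: NodeClaims are labeled with the NodePool that owns them.
NODEPOOL_LABEL = "karpenter.sh/nodepool"


def is_drifted(nodeclaim):
    """Return True if the NodeClaim has a "Drifted" condition set to "True"."""
    conditions = (nodeclaim.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "Drifted" and c.get("status") == "True"
        for c in conditions
    )


def count_drifted(nodeclaim_list):
    """Map each NodePool name to a (drifted, total) tuple of NodeClaim counts.

    `nodeclaim_list` is the parsed JSON of a Kubernetes list object,
    i.e. a dict with an "items" array of NodeClaims.
    """
    totals = Counter()
    drifted = Counter()
    for nc in nodeclaim_list.get("items") or []:
        labels = (nc.get("metadata") or {}).get("labels") or {}
        pool = labels.get(NODEPOOL_LABEL, "<unknown>")
        totals[pool] += 1
        if is_drifted(nc):
            drifted[pool] += 1
    return {pool: (drifted[pool], totals[pool]) for pool in totals}
```

To use it, parse the `kubectl` output with `json.load` and pass the result to `count_drifted`. This only gives a point-in-time snapshot, so tracking rotation progress would mean re-running it periodically.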
Description
What problem are you trying to solve? Currently, there is no clear way to identify how many nodes have drifted from the current hash (nodepool hash and ec2nodeclass hash). To determine the node rotation progress, we will have to look into individual node objects, nodeclaims, nodepool and nodeclass.
Since the Karpenter controller identifies and rotates drifted nodes, I am assuming it already maintains the list of drifted nodes (or, if not, identifies them in every reconciliation). It would be helpful to surface this information in the NodePool status.
for ex:
How important is this feature to you? This feature will be very useful to identify the progress of node rotation whenever we change AMIs or trigger any other form of upgrades by updating the nodepool or ec2nodeclass.