Status: Open. drawnwren opened this issue 1 month ago.
Why do you believe this is an issue with the CSI driver? Are you seeing that event for nodes whose finalizers were patched out? From looking at the logs, this looks to be a duplicate of https://github.com/aws/karpenter-provider-aws/issues/7046.
We recently had some issues upgrading our Kubernetes and EBS versions (#7200), so my assumption was that this is somehow related. This cluster had been relatively stable for the last 9 months or so until the upgrades, and now node provisioning is suddenly failing.
Can you provide any Karpenter logs?
Description
I'm not sure whether this is #7046, but our production cluster is unable to provision new nodes, and we see the message `state node doesn't contain both a node and a nodeclaim`.
We were previously having trouble with the ebs-csi-provisioner and had to force-delete a node after manually patching its finalizers off with `kubectl patch node -p '{"metadata":{"finalizers":null}}' ip-----.us-east-2.compute.internal`. We don't see the associated "ERROR" message in our Karpenter logs. Here's our NodePool:
We also have a `gpu` NodePool that appears to be working fine.
Here are the controller logs: karpenter_logs.txt
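The `kubectl patch` command in the description strips a node's finalizers by sending a patch body that sets `metadata.finalizers` to `null`. The sketch below models only RFC 7386 JSON merge patch, under which a null value deletes the key and other keys merge recursively. This is an assumption for illustration: kubectl's default patch type for built-in resources such as Node is a strategic merge patch, which also treats null as "delete this field". The `json_merge_patch` helper, the node name, and the finalizer name are all hypothetical placeholders, not values from this cluster.

```python
import copy
import json


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch to `target` and return a new object.

    - A non-object patch replaces the target outright.
    - A key whose patch value is None (JSON null) is removed from the target.
    - Any other key is merged recursively.
    The input `target` is never mutated.
    """
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    # A non-object target is discarded and treated as an empty object.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this key"
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical Node object, trimmed to the fields that matter here.
node = {
    "metadata": {
        "name": "example-node",  # hypothetical name
        "finalizers": ["example.com/placeholder-finalizer"],  # hypothetical
    }
}

# Same patch body as the kubectl command quoted in the issue.
patch = json.loads('{"metadata":{"finalizers":null}}')
print(json.dumps(json_merge_patch(node, patch)))
```

Running it prints `{"metadata": {"name": "example-node"}}`: the `finalizers` key is gone while sibling fields such as `name` survive. Once a Node has no finalizers left, a delete request removes the object immediately, without waiting for whatever cleanup those finalizers were guarding.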