Open qxmips opened 4 months ago
Do you happen to have a more complete set of Karpenter controller logs from the time when this happened?
not much info, except for the messages that the node got deleted
22:18:57.334 disrupting via consolidation delete, terminating 1 nodes (1 pods) ip-10-11-130-134.ec2.internal/m5a.2xlarge/on-demand
...
03:49:10.814 deleted node
From the logs it seems like the pod that was orphaned is prod/hdr-service-app-c9cdb8dbf-w2hr2. Just wanted to confirm that the pod spec you have shared is the same for this pod, since the deployment is called test in that.
yeah sorry. the attached log was from the original issue with a production service, but basically prod/hdr-service-app-c9cdb8dbf-w2hr2 had the same issue. here are the logs from the reproduction test:
E0622 04:08:15.200303 11 gc_controller.go:154] failed to get node ip-10-11-57-209.ec2.internal : node "ip-10-11-57-209.ec2.internal" not found
I0622 04:09:15.225784 11 gc_controller.go:246] "Found orphaned Pod assigned to the Node, deleting." pod="kube-system/aws-node-d47hp" node="ip-10-11-57-209.ec2.internal"
I0622 04:09:15.281401 11 gc_controller.go:246] "Found orphaned Pod assigned to the Node, deleting." pod="test/test-5ccdb7cd7f-dm9bq" node="ip-10-11-57-209.ec2.internal"
I0622 04:09:15.303497 11 gc_controller.go:246] "Found orphaned Pod assigned to the Node, deleting." pod="kube-system/ebs-csi-node-xt68r" node="ip-10-11-57-209.ec2.internal"
I0622 04:08:07.894293 10 node_tree.go:79] "Removed node in listed group from NodeTree" node="ip-10-11-57-209.ec2.internal" zone="us-east-1:\x00:us-east-1a"
--
terminationGracePeriodSeconds: 43200 #6hrs , 43200 - 12hrs
The deployment you shared has this. Is there a reason that this comment says 6 hours? Was terminationGracePeriod set to 6 hours or 12?
terminationGracePeriodSeconds is set to 12hrs (43200 seconds). we need it because on rare occasions the pod can't be interrupted for up to 12 hours.
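For reference, a minimal sketch of just that field with the comment corrected to match the stated intent (the surrounding pod spec is omitted):

```yaml
spec:
  # 43200 seconds = 12 hours; the worker may need the full window to finish a message
  terminationGracePeriodSeconds: 43200
```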
I believe at this point it would make sense to go over the cluster audit logs to see if there's something that indicates what went wrong. Do you mind opening a support ticket to facilitate this?
sorry, what kind of support ticket do you mean? we don't have AWS premium support. I believe that behavior can be reproduced on any cluster.
I tried to reproduce this issue with the config that you have shared but couldn't reproduce it. Karpenter didn't remove the node until terminationGracePeriod was hit. At this point, I think it would make sense to have a look at a more complete set of Karpenter controller logs from the time when this happened.
Description
Observed Behavior: We use a KEDA-scaled deployment that can process messages for up to 12 hours. Therefore, the pod has terminationGracePeriodSeconds: 43200. To avoid node disruption by Karpenter, the pod has the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation.
When the pod was running, Karpenter couldn't remove it, as expected (karpenter DisruptionBlocked: Cannot disrupt Node). However, when the autoscaler tried to remove the pod, it entered the Terminating state while continuing to work due to the termination grace period. (all good so far)
Karpenter then reported karpenter: FailedDraining: Failed to drain node, 5 pods are waiting to be evicted, but the node remained ready, and the pod continued running, waiting either to gracefully shut down after completing its job or for the terminationGracePeriodSeconds to elapse.
After approximately 6-7 hours, Karpenter forcefully removed the node, resulting in the pod becoming orphaned with the message "Found orphaned Pod assigned to the Node, deleting.", which was not expected.
Expected Behavior: Respect long terminationGracePeriodSeconds. I didn't find any documented timeouts that could be configured to mitigate this behavior.
Reproduction Steps (Please include YAML):
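A minimal sketch of the kind of deployment that matches the setup described above; the names, namespace, and image are placeholders, and a plain Deployment with a SIGTERM-ignoring container stands in for the KEDA-scaled workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations:
        # blocks voluntary node disruption while the pod is running
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      terminationGracePeriodSeconds: 43200  # 12 hours
      containers:
        - name: worker
          image: busybox
          # the shell (PID 1) ignores SIGTERM, so the pod stays in the
          # Terminating state until the grace period elapses, like a long job
          command: ["sh", "-c", "trap '' TERM; sleep 43200"]
```

With this running on a Karpenter-managed node, scaling the deployment down puts the pod into Terminating, and you can then watch whether the node is removed before the 12-hour grace period ends.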
In CloudTrail you will see that the node is killed by Karpenter after some time,
and in the EKS CloudWatch logs:
Versions:
Chart Version: 0.37.0
Kubernetes Version (kubectl version): 1.27
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment