aws / aws-node-termination-handler

Gracefully handle EC2 instance shutdown within Kubernetes
https://aws.amazon.com/ec2
Apache License 2.0
1.63k stars 268 forks

NTH log shows much less SPOT_ITN events than cloudtrail BidEvictedEvent #701

Closed wshi5985 closed 2 years ago

wshi5985 commented 2 years ago

Describe the bug

The NTH log shows far fewer SPOT_ITN events than AWS CloudTrail shows BidEvictedEvent records.

Steps to reproduce

The NTH log shows only about 1/6 to 1/5 of the spot interruption events that CloudTrail records as BidEvictedEvent, and it has always been like this. We opened an AWS support ticket to trace a recent incident in which a large number of spot instances were terminated. Support confirmed that all 470+ BidEvictedEvent records were caused by spot interruptions, but the NTH log showed fewer than 100 SPOT_ITN events. Is this a logging issue? Did some spot interruptions happen without a signal? Or did NTH fail to catch all of the spot interruption signals? Thanks.

Expected outcome

SPOT_ITN event counts match the CloudTrail BidEvictedEvent counts.
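For anyone wanting to reproduce the NTH side of this comparison, here is a minimal sketch that tallies distinct SPOT_ITN events from collected NTH log lines. It assumes every relevant line has the same shape as the one under Application Logs, that is, a JSON object following `event=`. The function names `parse_event` and `count_spot_itn` and the file name `nth.log` are made up for illustration and are not part of NTH.

```python
import json
from collections import Counter

_DECODER = json.JSONDecoder()


def parse_event(line):
    """Return the JSON object that follows 'event=' in a log line, or None.

    Assumes the log format shown in this issue; raw_decode stops at the end
    of the JSON object, so trailing text on the line is ignored.
    """
    marker = "event="
    idx = line.find(marker)
    if idx == -1:
        return None
    try:
        obj, _ = _DECODER.raw_decode(line[idx + len(marker):])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def count_spot_itn(lines):
    """Count distinct SPOT_ITN events (deduplicated by EventID).

    Returns (total, per_node), where per_node maps NodeName to its count.
    """
    seen = set()
    per_node = Counter()
    for line in lines:
        event = parse_event(line)
        if not event or event.get("Kind") != "SPOT_ITN":
            continue
        event_id = event.get("EventID")
        if event_id in seen:
            continue  # same event logged more than once
        seen.add(event_id)
        per_node[event.get("NodeName", "")] += 1
    return len(seen), per_node


# Hypothetical usage, with logs from all NTH pods concatenated into one file:
#   with open("nth.log") as f:
#       total, per_node = count_spot_itn(f)
```

The resulting total is only meaningful if the input contains the logs of every NTH pod for the period in question, including pods on nodes that have since been terminated.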

Application Logs

2022/10/07 04:14:38 INF Adding new event to the event store event={"AutoScalingGroupName":"","Description":"Spot ITN received. Instance will be interrupted at 2022-10-07T04:16:37Z \n","EndTime":"0001-01-01T00:00:00Z","EventID":"spot-itn-c375f2bb4fa7984f3455aef01a1717bf6a7d7689850d31a68f324ba46ba7ae52","InProgress":false,"InstanceID":"","IsManaged":false,"Kind":"SPOT_ITN","NodeLabels":null,"NodeName":"ip-10-128-37-128.ec2.internal","NodeProcessed":false,"Pods":null,"ProviderID":"","StartTime":"2022-10-07T04:16:37Z","State":""}

Environment

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this issue to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.

github-actions[bot] commented 2 years ago

This issue was closed because it has become stale with no activity.