Open hitsub2 opened 2 months ago
I have been trying to reproduce the issue, but I haven't seen this behavior occur even once in my testing. Do you have Karpenter controller logs from when this happened? I'm specifically looking for log lines with the message "initiating delete from interruption message".
It is very easy to reproduce this issue. I just tested it with FIS, and the metric always increases by 2 for every single spot interruption.
I spent more time trying to reproduce this issue using FIS, just as you described. Every time, I got only one event. These are the metrics from Prometheus.
And here's the Grafana dashboard.
Description
Observed Behavior: By default, Karpenter receives all spot interruptions for the account and filters them in the Karpenter controller logic. We have instead set up a Lambda that filters the interruption messages and sends each one to the related SQS queue, so Karpenter only receives the spot interruptions that belong to it.
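For context, here is a minimal sketch of the kind of filtering described above. It is not our actual Lambda code. It assumes the standard EventBridge shape for spot interruption warnings (`source` of `aws.ec2`, `detail-type` of `EC2 Spot Instance Interruption Warning`, instance ID under `detail`). The `target_queue` function, the instance-to-queue mapping, and the queue names are hypothetical.

```python
from typing import Optional

# detail-type that EventBridge uses for spot interruption warnings
SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def target_queue(event: dict, instance_to_queue: dict) -> Optional[str]:
    """Return the queue a spot interruption event should go to, or None to drop it.

    `instance_to_queue` is a hypothetical lookup from instance ID to the SQS
    queue of the cluster that owns the instance.
    """
    if event.get("source") != "aws.ec2":
        return None
    if event.get("detail-type") != SPOT_WARNING:
        return None
    instance_id = event.get("detail", {}).get("instance-id")
    # Instances that belong to no known cluster are dropped (None).
    return instance_to_queue.get(instance_id)


# Hypothetical example event and mapping
example_event = {
    "source": "aws.ec2",
    "detail-type": SPOT_WARNING,
    "detail": {"instance-id": "i-0abc", "instance-action": "terminate"},
}
example_mapping = {"i-0abc": "karpenter-cluster-a-queue"}
```

With a filter like this, each interruption is forwarded to at most one queue, so forwarding alone would not be expected to produce two messages per interruption.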
When handling a spot interruption, Karpenter increments the metric karpenter_interruption_received_messages(message_type="SpotInterruptionKind") by 2 for every single spot interruption. For example, if there is one spot interruption, this metric value is 2.
Expected Behavior:
karpenter_interruption_received_messages(message_type="SpotInterruptionKind") = the number of spot instances that were interrupted
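To make the comparison between expected and observed behavior concrete, here is a small sketch of how the counter could be checked around a test. It assumes you read the counter value once before and once after injecting a known number of interruptions. The function name and the sample values are hypothetical, not taken from real output.

```python
def increments_per_interruption(before: float, after: float, interruptions: int) -> float:
    """Return how much the counter grew per injected spot interruption.

    `before` and `after` are readings of the received-messages counter taken
    around the test; `interruptions` is how many interruptions were injected.
    """
    if interruptions <= 0:
        raise ValueError("interruptions must be positive")
    if after < before:
        # A counter only goes down if the process restarted in between.
        raise ValueError("counter decreased; the controller may have restarted")
    return (after - before) / interruptions


# Hypothetical readings: one injected interruption, counter moves from 4 to 6.
observed = increments_per_interruption(before=4, after=6, interruptions=1)
expected = 1.0
```

A result of 1.0 matches the expected behavior. A result of 2.0 matches what this issue reports.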
Reproduction Steps (Please include YAML):
Versions:
Chart Version: 0.37.0
Kubernetes Version (kubectl version): 1.29
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment