aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0
6.62k stars 923 forks

Support cordon-only mode on spot interruption for cost efficiency #7024

Open int128 opened 2 days ago

int128 commented 2 days ago

Description

What problem are you trying to solve?

We run GitHub Actions self-hosted runners on spot instances. When a spot interruption occurs, Karpenter evicts the runner Pod so that it is rescheduled onto a new Node. This wastes EC2 cost, because the runner Pod is not re-runnable.

That is,

  1. The controller (actions-runner-controller) creates a runner Pod.
  2. A spot interruption occurs in AWS.
  3. Karpenter evicts the runner Pod to a new Node. This may launch an EC2 instance. 💰
  4. A new runner Pod starts but eventually exits with an error, because the runner is not re-runnable.

It would be nice if a NodePool supported a cordon-only mode instead of eviction. I found a related issue: https://github.com/aws/karpenter-provider-aws/issues/3604.
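
To make the difference concrete, here is a toy model of the two behaviors. It is only a sketch and does not use Karpenter or Kubernetes APIs: `InterruptionMode`, `Node`, and `handle_spot_interruption` are hypothetical names invented for illustration, and Karpenter has no such setting today.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class InterruptionMode(Enum):
    # Hypothetical modes, not an existing Karpenter option.
    DRAIN = "drain"              # behavior described in the steps above
    CORDON_ONLY = "cordon-only"  # behavior requested in this issue


@dataclass
class Node:
    name: str
    unschedulable: bool = False
    pods: List[str] = field(default_factory=list)


def handle_spot_interruption(node: Node, mode: InterruptionMode) -> List[str]:
    """Return the names of Pods that were evicted and need a new Node."""
    # Both modes cordon the Node so that no new Pods are scheduled onto it.
    node.unschedulable = True
    if mode is InterruptionMode.CORDON_ONLY:
        # Running Pods are left alone until the instance is reclaimed,
        # so nothing is rescheduled and no replacement capacity is needed.
        return []
    # Drain: every Pod is evicted, which may require launching a new instance.
    evicted, node.pods = node.pods, []
    return evicted


node = Node("spot-a", pods=["runner-1"])
print(handle_spot_interruption(node, InterruptionMode.CORDON_ONLY))  # []
print(node.unschedulable, node.pods)  # True ['runner-1']
```

In this model, cordon-only mode never returns Pods to reschedule, which is why no replacement instance would be launched for a runner that cannot be re-run anyway.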

How important is this feature to you?

This feature would reduce our EC2 cost, because no new instance would be launched upon a spot interruption.


njtran commented 1 day ago

I'm not entirely against this, but we'd need an RFC to know how this might be implemented/configured. Do you have any thoughts on how you'd best want to do that? Are you willing to write an RFC for this?