clusterinthecloud / support

If you need help with Cluster in the Cloud, this is the right place

Add support for AWS Spot instances #31

Open milliams opened 3 years ago

milliams commented 3 years ago

We should support running spot instances on AWS.

Things to consider:

ksharlandjiev commented 3 years ago

Q: Should it be per-instance or global per-cluster?
A: EC2 Spot may reclaim a specific instance. That does not mean it will reclaim the entire EC2 fleet.

Q: How do we get notified of an impending termination?
A: EC2 issues what is called a Spot Instance interruption notice. You can detect it through an Amazon EventBridge event (formerly known as CloudWatch Events), or by querying the EC2 instance metadata. If your Spot Instance is marked to be stopped or terminated by the Spot service, the instance-action item is present in your instance metadata. Otherwise, it is not present.

To query the metadata of an individual EC2 instance, use the following curl commands, which return a JSON object if instance-action is present, or an HTTP 404 status if no interruption is scheduled:

TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/spot/instance-action
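For reference, here is a rough Python sketch of the same metadata check, assuming it runs on the instance itself and uses only the standard library. The function names `parse_instance_action` and `fetch_instance_action` are made up for illustration, and the `action`/`time` keys follow the response shape described in the AWS documentation.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body):
    """Return (action, time) from the instance-action JSON document."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when.replace(tzinfo=timezone.utc)


def fetch_instance_action(timeout=2.0):
    """Return (action, time) if an interruption is scheduled, else None."""
    # Step 1: obtain a session token (the PUT request from the curl example).
    token_req = urllib.request.Request(
        IMDS + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    # Step 2: ask for instance-action; a 404 means nothing is scheduled.
    action_req = urllib.request.Request(
        IMDS + "/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

A node-side agent could call `fetch_instance_action()` every few seconds and start draining work as soon as it returns something other than `None`. How often to poll is a design choice this sketch leaves open.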

More information here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
https://aws.amazon.com/blogs/compute/best-practices-for-handling-ec2-spot-instance-interruptions/
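For the EventBridge route mentioned above, a minimal sketch of the rule pattern and of a handler that pulls the instance ID and action out of the event might look like this. The pattern and the `detail` field names follow the documented "EC2 Spot Instance Interruption Warning" event; `interruption_from_event` is a hypothetical helper name, and what the cluster does with the result is left open.

```python
# Event pattern for an EventBridge rule that matches Spot interruption warnings.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def interruption_from_event(event):
    """Return (instance_id, action) for an interruption warning, else None."""
    if event.get("source") != "aws.ec2":
        return None
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return None
    detail = event.get("detail", {})
    return detail["instance-id"], detail["instance-action"]
```

The same function could sit inside a Lambda handler or any other target of the rule. Compared with polling the metadata on every node, this keeps the detection logic in one place.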

Q: How should we handle the termination?
A: You can specify that Amazon EC2 should do one of the following when it interrupts a Spot Instance:

  1. Stop the Spot Instance
  2. Hibernate the Spot Instance
  3. Terminate the Spot Instance

More information here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
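As an illustration of how one of the three behaviours above could be selected at launch time, here is a small sketch that builds the `InstanceMarketOptions` structure accepted by the EC2 `RunInstances` API (for example via boto3's `run_instances`). The helper name `spot_market_options` is an assumption for this example. It also assumes the launch goes through a plain Spot request, where stop and hibernate require a persistent request.

```python
VALID_BEHAVIORS = {"stop", "hibernate", "terminate"}


def spot_market_options(behavior="terminate"):
    """Build InstanceMarketOptions for the given interruption behaviour."""
    if behavior not in VALID_BEHAVIORS:
        raise ValueError(f"unknown interruption behaviour: {behavior!r}")
    return {
        "MarketType": "spot",
        "SpotOptions": {
            "InstanceInterruptionBehavior": behavior,
            # Stop and hibernate only apply to persistent Spot requests.
            "SpotInstanceType": "one-time" if behavior == "terminate" else "persistent",
        },
    }


# Hypothetical usage with boto3 (not executed here):
#   ec2.run_instances(..., InstanceMarketOptions=spot_market_options("stop"))
```

Whether this is set per instance type or once for the whole cluster is exactly the open question from the top of this thread, so the helper takes the behaviour as a parameter.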