eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

Provide an option to use spot instances #198

Closed yhvh closed 5 years ago

yhvh commented 6 years ago

Why do you want this feature? We are using Kubernetes for data science. Since this load is non-critical, we would like to use spot instances for worker nodes.

What feature/behavior/change do you want? An option to deploy spot instances instead of on-demand instances

errordeveloper commented 6 years ago

I think this should be part of the node group management epic, alongside #100. I would rather avoid creating ad-hoc flags in eksctl create cluster before we have eksctl create nodegroup.

danielchalef commented 5 years ago

I'll second this request: We're using k8s with Polyaxon for ML model experimentation, development, and management. We typically launch a large number of nodes for short periods of time (e.g. for hyperparameter tuning jobs, large training jobs). These nodes are often relatively expensive (high-mem, high-core count, and/or GPUs) but not mission-critical. These workloads are ideal for spot instance usage.

alando46 commented 5 years ago

+1 EC2Spot support would be amazing

whereisaaron commented 5 years ago

@errordeveloper can we leverage Launch Templates to provide options like spot instances? That could cover a lot of other requests I see in issues. E.g. I would like to be able to create node pools of nodes that have extra volumes attached. I could provide eksctl with a Launch Template.

Anything specified in the eksctl config or command line options would override the template, but otherwise everything from extra tags, extra volumes, tenancy, spot instance choices, T2/T3-unlimited option, placement groups etc. could come from the Launch Template, rather than duplicating all these options within eksctl?

voxxit commented 5 years ago

What is the status on this? We are currently having to resort to manually updating our node groups to support spot fleets using the following tutorial, and it is less than optimal to say the least...

https://eksworkshop.com/spotworkers/workers/

errordeveloper commented 5 years ago

Josh, what would be simplest thing that would help to begin with?

On Tue, 26 Mar 2019, 7:39 pm Josh Delsman, notifications@github.com wrote:

What is the status on this? We are currently having to resort to manually updating our node groups to support spot fleets using the following tutorial, and it is less than optimal to say the least...

https://eksworkshop.com/spotworkers/workers/


voxxit commented 5 years ago

@errordeveloper my team and I found that it would be simplest to add the SpotPrice option to the AWS::AutoScaling::LaunchConfiguration that is generated for node groups. I think a flag on node groups for the maximum spot price would suffice (in fact, I think that is what kops does?)

Longer term: it would be best to use a launch template which supports spot fleets so we can get around service limits/capacity issues for certain instance types, etc.

romilpunetha commented 5 years ago

@voxxit Just adding the spot price to the launch configuration doesn't work for me. Are there any other changes to be made in the launch configuration? Steps:

  1. Create a nodegroup via the eksctl CLI.
  2. Open CloudFormation and download the template of the created nodegroup.
  3. Edit the template, setting nodeAutoscalingMinSize = 0 and SpotPrice=0.x.
  4. Update the stack.

I made no other changes: not to the nodegroup name or any other information, and not to the userdata (which seems encrypted). What I observe is that the on-demand instance from the nodegroup I initially created is still present, and no new spot instance is spawned.

Any ideas what more changes to make to the template in order for spot instances to work? Thanks

k-tahiro commented 5 years ago

@romil-punetha

I've succeeded in creating a spot instance worker nodegroup.

Here is sample code:

$ CLUSTER_NAME=cluster
$ NODEGROUP_NAME=nodegroup
$ eksctl create ng --cluster "${CLUSTER_NAME}" -n "${NODEGROUP_NAME}" -N 1
$ TEMPLATE_BODY=$(aws cloudformation get-template --stack-name "eksctl-${CLUSTER_NAME}-nodegroup-${NODEGROUP_NAME}" --query 'TemplateBody' --output json | python -c '
import json
import sys
# Read the whole template from stdin; json.load works on Python 2 and 3 and does not need single-line input
d = json.load(sys.stdin)
# Request spot capacity by setting a maximum price on the launch configuration
d["Resources"]["NodeLaunchConfig"]["Properties"]["InstanceType"] = "t3.small"
d["Resources"]["NodeLaunchConfig"]["Properties"]["SpotPrice"] = "0.008"
# Let the rolling update take the existing instance out of service
d["Resources"]["NodeGroup"]["Properties"]["MinSize"] = "0"
d["Resources"]["NodeGroup"]["UpdatePolicy"]["AutoScalingRollingUpdate"]["MinInstancesInService"] = "0"
print(json.dumps(d))')
$ aws cloudformation update-stack --stack-name "eksctl-${CLUSTER_NAME}-nodegroup-${NODEGROUP_NAME}" --template-body "${TEMPLATE_BODY}" --capabilities "CAPABILITY_IAM"
$ aws cloudformation wait stack-update-complete --stack-name "eksctl-${CLUSTER_NAME}-nodegroup-${NODEGROUP_NAME}"
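
The inline Python above can also be written as a standalone function, which is easier to check before touching a live stack. This is only a sketch: the function name `patch_nodegroup_template` and the sample template are hypothetical, and it assumes the template contains the same `NodeLaunchConfig` and `NodeGroup` resources that the snippet above edits.

```python
import copy
import json


def patch_nodegroup_template(template, instance_type, spot_price):
    """Return a copy of a nodegroup template that requests spot instances.

    Hypothetical helper; assumes the resource names used in the snippet above.
    """
    patched = copy.deepcopy(template)  # leave the caller's template untouched
    launch_config = patched["Resources"]["NodeLaunchConfig"]["Properties"]
    launch_config["InstanceType"] = instance_type
    launch_config["SpotPrice"] = spot_price
    node_group = patched["Resources"]["NodeGroup"]
    node_group["Properties"]["MinSize"] = "0"
    node_group["UpdatePolicy"]["AutoScalingRollingUpdate"]["MinInstancesInService"] = "0"
    return patched


# Made-up minimal template, only for illustration; a real one has many more keys.
sample_template = {
    "Resources": {
        "NodeLaunchConfig": {"Properties": {"InstanceType": "m5.large"}},
        "NodeGroup": {
            "Properties": {"MinSize": "1"},
            "UpdatePolicy": {
                "AutoScalingRollingUpdate": {"MinInstancesInService": "1"}
            },
        },
    }
}

patched = patch_nodegroup_template(sample_template, "t3.small", "0.008")
print(json.dumps(patched["Resources"]["NodeLaunchConfig"]["Properties"], sort_keys=True))
```

The patched dict can then be serialized with `json.dumps` and passed as `--template-body`, as in the commands above.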

romilpunetha commented 5 years ago

I moved to using spotinst.

mhumeSF commented 5 years ago

@romil-punetha Are you using eksctl with spotinst? If so, how do you create a nodegroup using eksctl?

romilpunetha commented 5 years ago

@mhumeSF

apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig
metadata:
  name: eks-test
  region: ap-south-1
  version: "1.12"
  tags:
    version: test
    billing: core
    sub-billing: kubernetes
    environment: test
vpc:
  id: "vpc-xxxxxxxx" # Production VPC
  cidr: "192.168.0.0/16"
  subnets:
    private:
      ap-south-1a:
        id: "subnet-xxxxxx"
        cidr: "192.168.x.x/24"
      ap-south-1b:
        id: "subnet-xxxxxx"
        cidr: "192.168.x.x/24"
    public:
      ap-south-1a:
        id: "subnet-xxxxxxx"
        cidr: "192.168.x.x/24"
      ap-south-1b:
        id: "subnet-xxxxxxx"
        cidr: "192.168.x.x/24"
iam:
  serviceRoleARN: "arn:aws:iam::xxxxxxxxxx:role/EKS"
nodeGroups:
  - name: eks-spotinst
    labels:
      applicationType: org
      machineSize: mixed
    instanceType: t2.xlarge
    desiredCapacity: 1
    ami: ami-09c3eb35bb3be46a4
    minSize: 1
    maxSize: 1
    volumeSize: 50
    volumeType: gp2
    privateNetworking: true
    maxPodsPerNode: 50
    allowSSH: true
    sshPublicKey: 'eks-bastion'
    availabilityZones: ["ap-south-1a", "ap-south-1b"]
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true

kyprifog commented 5 years ago

@errordeveloper I noticed that your merged commit affected this. Are spot instances available in eksctl now?

errordeveloper commented 5 years ago

@kyprifog not yet, we just switched to launch templates, which should make it easier to implement.

martina-if commented 5 years ago

Hi @yhvh @voxxit , I am looking into adding support for spot instances. I have a few questions about how this would work:

StevenACoffman commented 5 years ago

For what it's worth:

BTW, I have also found it important to set a non-zero grace period on the AutoScaling group:

In kops clusters, I use k8s-spot-termination-handler, k8s-spot-rescheduler, and k8s-spot-price-monitor.

eksctl contributor @mumoshu developed the kube-spot-termination-notice-handler and would probably have useful experience and advice on this topic.
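
As background for the handlers mentioned above: EC2 announces a spot interruption through the instance metadata path `spot/instance-action`, which returns a small JSON document with an `action` and a `time`. The sketch below only parses such a notice and works out how much time is left to drain the node. The function name and the sample payload are hypothetical, and this does not reflect how any of the projects named above is implemented.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json, now):
    """Parse a spot instance-action notice and return the seconds left.

    Hypothetical helper. `notice_json` is the body served by the metadata
    path spot/instance-action; `now` is a timezone-aware datetime.
    """
    notice = json.loads(notice_json)
    # The notice carries a UTC timestamp such as 2017-09-18T08:22:00Z
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


# Illustrative payload; a real handler would fetch it from instance metadata.
sample_notice = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
sample_now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
action, remaining = seconds_until_interruption(sample_notice, sample_now)
print(action, remaining)
```

A handler would typically poll for the notice and start draining the node as soon as one appears.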

martina-if commented 5 years ago

This is very useful. Thanks Steve!

martina-if commented 5 years ago

If we were to support only mixed instance groups, where a nodegroup can have both on-demand and spot instances but must specify at least 2 instance types, would that be useful for your use cases? cc @yhvh
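
For reference, a mixed instance group corresponds to the `MixedInstancesPolicy` property of a CloudFormation `AWS::AutoScaling::AutoScalingGroup`. The sketch below builds that property as a plain dict and enforces the two-instance-type requirement described above. The function name, its parameters, and the `NodeLaunchTemplate` logical ID are hypothetical; none of them is taken from the eksctl code base or from the PR.

```python
def mixed_instances_policy(instance_types, on_demand_base=0,
                           on_demand_percentage_above_base=0,
                           launch_template_ref="NodeLaunchTemplate"):
    """Build a MixedInstancesPolicy dict for an Auto Scaling group.

    Hypothetical helper; `launch_template_ref` is an assumed logical ID of an
    AWS::EC2::LaunchTemplate resource in the same template.
    """
    if len(set(instance_types)) < 2:
        raise ValueError("a mixed instance group needs at least 2 instance types")
    return {
        "InstancesDistribution": {
            # On-demand instances to keep before any spot capacity is used
            "OnDemandBaseCapacity": on_demand_base,
            # Share of capacity above the base that stays on-demand (0 = all spot)
            "OnDemandPercentageAboveBaseCapacity": on_demand_percentage_above_base,
        },
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": {"Ref": launch_template_ref},
                "Version": {"Fn::GetAtt": [launch_template_ref, "LatestVersionNumber"]},
            },
            # One override per instance type the group may launch
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
    }


policy = mixed_instances_policy(["m5.large", "m5a.large"], on_demand_base=1)
print(policy["LaunchTemplate"]["Overrides"])
```

With `on_demand_base=1` and the default percentage of 0, the group keeps one on-demand node and fills the rest with spot capacity.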

StevenACoffman commented 5 years ago

That would meet my needs! We fell into single node type ASGs when that was all that was available.

voxxit commented 5 years ago

@martina-if What can I do to help test this and get it ready to ship? :shipit: 😄

martina-if commented 5 years ago

Hi @voxxit! I am waiting for Ilya to come back and review my PR (hopefully tomorrow). In the meantime, if you would like to test it, you can build eksctl from my branch martina-if:mixed-instances. That would be very helpful actually :)