Closed yhvh closed 5 years ago
I think this should be part of the node group management epic, alongside #100. I would rather avoid creating ad-hoc flag(s) in eksctl create cluster before we have eksctl create nodegroup.
I'll second this request: We're using k8 with polyaxon for ML model experimentation, development, and management. We typically launch a large number of nodes for short periods of time (e.g. for hyperparameter tuning jobs, large training jobs). These nodes are often relatively expensive (high-mem, high-core count, and/or GPUs) but not mission-critical. These workloads are ideal for spot instance usage.
+1 EC2Spot support would be amazing
@errordeveloper can we leverage Launch Templates to provide options like spot instances? That could cover a lot of other requests I see in issues. E.g. I would like to be able to create node pools of nodes that have extra volumes attached. I could provide eksctl with a Launch Template. Anything specified in the eksctl config or command line options would override the template, but otherwise everything from extra tags, extra volumes, tenancy, spot instance choices, T2/T3-unlimited option, placement groups etc. could come from the Launch Template, rather than duplicating all these options within eksctl?
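The precedence described here (explicit eksctl options win, everything else falls through to the Launch Template) could be sketched roughly as follows. This is only an illustration of the merge rule: the function name and the setting keys are hypothetical and are not part of eksctl.

```python
def resolve_nodegroup_settings(launch_template: dict, overrides: dict) -> dict:
    """Return the launch-template values with explicitly set overrides on top."""
    resolved = dict(launch_template)
    # Only options the user actually set (not None) replace template values.
    resolved.update({key: value for key, value in overrides.items() if value is not None})
    return resolved


# Hypothetical values, for illustration only.
template = {"InstanceType": "m5.large", "Tenancy": "default", "SpotMaxPrice": "0.05"}
cli_options = {"InstanceType": "t3.small", "Tenancy": None}

settings = resolve_nodegroup_settings(template, cli_options)
print(settings)
```

With this rule, anything not set in the config or on the command line (tenancy, spot price, and so on) keeps the value from the template.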
What is the status on this? We are currently having to resort to manually updating our node groups to support spot fleets using the following tutorial, and it is less than optimal to say the least...
https://eksworkshop.com/spotworkers/workers/
Josh, what would be the simplest thing that would help to begin with?
@errordeveloper my team and I found that it would be simplest to add the SpotPrice option to the AWS::EC2::LaunchConfiguration that is generated for node groups. I think a flag on node groups for the maximum spot price would suffice (in fact, I think that is what kops does?)
Longer term: it would be best to use a launch template which supports spot fleets so we can get around service limits/capacity issues for certain instance types, etc.
@voxxit Just adding the spot price to the launch configuration doesn't work for me. Are there any other changes to be made in the launch configuration? Steps:
I made no other changes: not to the nodegroup name or any other information, and not to the userdata (which seems to be encrypted). What I observe is that the nodegroup I initially created had an on-demand instance, which is still present, and no new spot instance is spawned.
Any ideas what more changes to make to the template in order for spot instances to work? Thanks
@romil-punetha
I've succeeded in creating a spot instance worker nodegroup.
Here is sample code:
$ CLUSTER_NAME=cluster
$ NODEGROUP_NAME=nodegroup
$ eksctl create ng --cluster "${CLUSTER_NAME}" -n "${NODEGROUP_NAME}" -N 1
$ TEMPLATE_BODY=$(aws cloudformation get-template --stack-name "eksctl-${CLUSTER_NAME}-nodegroup-${NODEGROUP_NAME}" --query 'TemplateBody' --output json | python3 -c '
import json, sys
# Read the whole template from stdin, regardless of line breaks.
d = json.load(sys.stdin)
d["Resources"]["NodeLaunchConfig"]["Properties"]["InstanceType"] = "t3.small"
d["Resources"]["NodeLaunchConfig"]["Properties"]["SpotPrice"] = "0.008"
d["Resources"]["NodeGroup"]["Properties"]["MinSize"] = "0"
d["Resources"]["NodeGroup"]["UpdatePolicy"]["AutoScalingRollingUpdate"]["MinInstancesInService"] = "0"
print(json.dumps(d))')
$ aws cloudformation update-stack --stack-name "eksctl-${CLUSTER_NAME}-nodegroup-${NODEGROUP_NAME}" --template-body "${TEMPLATE_BODY}" --capabilities "CAPABILITY_IAM"
$ aws cloudformation wait stack-update-complete --stack-name "eksctl-${CLUSTER_NAME}-nodegroup-${NODEGROUP_NAME}"
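The template patch in the snippet above can also be written as a small standalone function, which makes it easier to check before calling update-stack. This is a sketch that assumes the stack template uses the NodeLaunchConfig and NodeGroup resource names shown above; the function name and the stand-in template are illustrative only.

```python
import copy
import json


def patch_template_for_spot(template: dict, spot_price: str, instance_type: str) -> dict:
    """Return a copy of the nodegroup template that requests spot instances."""
    patched = copy.deepcopy(template)
    launch_props = patched["Resources"]["NodeLaunchConfig"]["Properties"]
    launch_props["InstanceType"] = instance_type
    launch_props["SpotPrice"] = spot_price  # maximum price, as a string
    group = patched["Resources"]["NodeGroup"]
    # Allow the group to drop to zero nodes if the spot request is not fulfilled.
    group["Properties"]["MinSize"] = "0"
    rolling = group.setdefault("UpdatePolicy", {}).setdefault("AutoScalingRollingUpdate", {})
    rolling["MinInstancesInService"] = "0"
    return patched


# Minimal stand-in for the template returned by `aws cloudformation get-template`.
example = {
    "Resources": {
        "NodeLaunchConfig": {"Properties": {"InstanceType": "m5.large"}},
        "NodeGroup": {
            "Properties": {"MinSize": "1"},
            "UpdatePolicy": {"AutoScalingRollingUpdate": {"MinInstancesInService": "1"}},
        },
    }
}

patched = patch_template_for_spot(example, spot_price="0.008", instance_type="t3.small")
print(json.dumps(patched, indent=2))
```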
I moved to using spotinst.
@romil-punetha Are you using eksctl with spotinst? If so, how do you create a nodegroup using eksctl?
@mhumeSF
apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig
metadata:
  name: eks-test
  region: ap-south-1
  version: "1.12"
  tags:
    version: test
    billing: core
    sub-billing: kubernetes
    environment: test
vpc:
  id: "vpc-xxxxxxxx" # Production VPC
  cidr: "192.168.0.0/16"
  subnets:
    private:
      ap-south-1a:
        id: "subnet-xxxxxx"
        cidr: "192.168.x.x/24"
      ap-south-1b:
        id: "subnet-xxxxxx"
        cidr: "192.168.x.x/24"
    public:
      ap-south-1a:
        id: "subnet-xxxxxxx"
        cidr: "192.168.x.x/24"
      ap-south-1b:
        id: "subnet-xxxxxxx"
        cidr: "192.168.x.x/24"
iam:
  serviceRoleARN: "arn:aws:iam::xxxxxxxxxx:role/EKS"
nodeGroups:
  - name: eks-spotinst
    labels:
      applicationType: org
      machineSize: mixed
    instanceType: t2.xlarge
    desiredCapacity: 1
    ami: ami-09c3eb35bb3be46a4
    minSize: 1
    maxSize: 1
    volumeSize: 50
    volumeType: gp2
    privateNetworking: true
    maxPodsPerNode: 50
    allowSSH: true
    sshPublicKey: 'eks-bastion'
    availabilityZones: ["ap-south-1a", "ap-south-1b"]
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
@errordeveloper I noticed that your merged commit affected this, are spot instances available now for eksctl?
@kyprifog not yet, we just switched to launch configuration, which should make it easier to implement.
Hi @yhvh @voxxit, I am looking into adding support for spot instances. I have a few questions about how this would work:
If all instances are terminated, for example because the price went over the max price, what would you expect to happen? I think the nodegroup would still exist but would have 0 capacity. Is that what you would want?
Do you often change the max price? Today, nodegroups are immutable, so it wouldn't be possible to change the max price. Is changing price something we should support in your opinion?
For what it's worth:
I generally expect that if the spot price exceeds my max price (which I set to equal the on demand price), the group will exist, but will have 0 capacity. I generally pair every spot node group with an otherwise identical on demand group, both with 0 min capacity.
I generally do not alter the max price. If I happen to notice that the on Demand and average spot pricing have drifted by over 20% for over a week, I will replace the spot node group with one with a new max price but this is a chore I frequently forget to do.
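Since this check is a chore that is easy to forget, it could be automated. The sketch below is one possible reading of the rule, not an established tool: it assumes you already have one reference price per day (for example the current on-demand price) and flags the node group when that price differs from the configured max price by more than 20% on each of the last 7 days. The function name and inputs are hypothetical.

```python
def should_replace_nodegroup(max_price: float, daily_reference_prices: list,
                             threshold: float = 0.20, days: int = 7) -> bool:
    """True if the reference price has drifted past the threshold for `days` days in a row."""
    if len(daily_reference_prices) < days:
        return False  # not enough history to decide
    recent = daily_reference_prices[-days:]
    return all(abs(price - max_price) / max_price > threshold for price in recent)


# Hypothetical prices in USD per hour, oldest first.
drifted = should_replace_nodegroup(0.10, [0.13] * 7)
recovered = should_replace_nodegroup(0.10, [0.13] * 6 + [0.11])
print(drifted, recovered)
```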
BTW, I have also found it important to set a non-zero grace period on the AutoScaling group:
In kops clusters, I use k8s-spot-termination-handler, k8s-spot-rescheduler, and k8s-spot-price-monitor.
eksctl contributor @mumoshu developed the kube-spot-termination-notice-handler and would probably have useful experience and advice on this topic.
This is very useful. Thanks Steve!
If we were to support only mixed instance groups, where a nodegroup can have both on-demand and spot instances but must specify at least 2 instance types, would that be useful for your use cases? cc @yhvh
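For reference, a mixed instance group like the one proposed maps onto the MixedInstancesPolicy property of an AWS::AutoScaling::AutoScalingGroup. The sketch below builds such a fragment and enforces the "at least 2 instance types" requirement. It is not taken from the eksctl implementation; the function name, default values, and launch template name are assumptions for illustration.

```python
def mixed_instances_policy(instance_types, on_demand_base=0, on_demand_percentage=0,
                           spot_max_price=None, launch_template_name="nodegroup-template"):
    """Build a MixedInstancesPolicy fragment for an Auto Scaling group."""
    if len(set(instance_types)) < 2:
        raise ValueError("a mixed instance group needs at least 2 instance types")
    distribution = {
        "OnDemandBaseCapacity": on_demand_base,
        "OnDemandPercentageAboveBaseCapacity": on_demand_percentage,
    }
    if spot_max_price is not None:
        distribution["SpotMaxPrice"] = spot_max_price  # string; omitted means no explicit cap
    return {
        "InstancesDistribution": distribution,
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,  # hypothetical name
                "Version": "1",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
    }


policy = mixed_instances_policy(["m5.large", "m5a.large"], spot_max_price="0.05")
print(policy)
```

With on_demand_percentage=0 and on_demand_base=0, all capacity is requested as spot instances; raising either value mixes in on-demand instances.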
That would meet my needs! We only fell into single-node-type ASGs because that was all that was available.
@martina-if What can I do to help test this and get it ready to ship? :shipit: 😄
Hi @voxxit! I am waiting for Ilya to come back and review my PR (hopefully tomorrow). In the meantime, if you would like to test it you can build eksctl from my branch martina-if:mixed-instances. That would be very helpful actually :)
Why do you want this feature? We are using kubernetes for data science. Since this load is non-critical we would like to use spot instances for worker nodes.
What feature/behavior/change do you want? An option to deploy spot instances instead of on-demand instances