Open vsoch opened 1 year ago
There's no direct way to do this now that I'm aware of, but there are two indirect ways to do it via userdata:
You can also build your own custom AMI that bakes this behavior in, though it's probably simpler to do it via userdata.
Both methods eliminate the need to watch for new nodes and reconfigure them, since the reconfiguration happens at instance boot.
That would definitely be an improvement (and I did see the launch template, but didn't mention it). For the launch template case, since we are creating and destroying many different spot clusters, that would mean writing a custom one for each new managed node group. It's slightly better than what I'm doing now, but I would still push for exposing these options more easily to the user, akin to what some other clouds offer. As soon as the user has to go beyond the parameters of an API or command line flags, as they say in Harold and Kumar Go to White Castle... "We've gone too far!" :laughing:
@tzneal can you show me an example of a launch template that updates the instances to use one thread per core (I'm familiar with the snippets to do that)! I can figure out the logic to determine if the instance needs it, and I see that I can use it here: https://boto3.amazonaws.com/v1/documentation/api/1.26.85/reference/services/eks/client/create_nodegroup.html. I should be able to test it soon and give you feedback.
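For the "does this instance need the fix" check, here is a rough sketch (untested against a live account; the helper names are mine, not an existing API) using `ec2.describe_instance_types`, which reports `DefaultThreadsPerCore` under `VCpuInfo` for each instance type:

```python
def needs_thread_disable(vcpu_info):
    # An instance type only needs the userdata fix when it defaults to
    # more than one hardware thread per core (i.e. SMT is enabled).
    return vcpu_info.get("DefaultThreadsPerCore", 1) > 1

def instance_types_needing_fix(instance_types, region="us-east-1"):
    import boto3  # imported here so needs_thread_disable stays testable offline
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instance_types(InstanceTypes=instance_types)
    return [
        it["InstanceType"]
        for it in resp["InstanceTypes"]
        if needs_thread_disable(it["VCpuInfo"])
    ]
```

Since a spot request can mix instance types, you could run this once per type in the node group and only apply the userdata logic conditionally.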
Also note that since we are requesting some listing of spot instance types, it could be that some (but not all) of the instances need the hot plug.
Sorry, I don't have anything handy. You should be able to create a launch template (https://boto3.amazonaws.com/v1/documentation/api/1.26.85/reference/services/ec2/client/create_launch_template.html), and then specify the user data (https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data) without specifying an AMI ID in the launch template.
The userdata should be something like this (untested):

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Take every second (sibling) hardware thread of each core offline
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un)
do
  echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done

--==MYBOUNDARY==--
```
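Putting the pieces together with boto3 might look roughly like this (untested; the template name and the role/subnet parameters are placeholders, not values from this thread). Note that `UserData` in `LaunchTemplateData` must be base64-encoded:

```python
import base64

# The MIME multipart userdata that offlines sibling hardware threads.
USERDATA = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\\n' | sort -un)
do
  echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done

--==MYBOUNDARY==--
"""

def encode_userdata(text):
    # Launch-template UserData must be base64-encoded text.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def create_template_and_nodegroup(cluster, nodegroup, subnets, role_arn):
    import boto3  # imported here so encode_userdata stays testable offline
    ec2 = boto3.client("ec2")
    eks = boto3.client("eks")
    ec2.create_launch_template(
        LaunchTemplateName="smt-off",  # placeholder name
        LaunchTemplateData={"UserData": encode_userdata(USERDATA)},
    )
    eks.create_nodegroup(
        clusterName=cluster,
        nodegroupName=nodegroup,
        subnets=subnets,
        nodeRole=role_arn,
        capacityType="SPOT",
        launchTemplate={"name": "smt-off"},
    )
```

Leaving `ImageId` out of the launch template is what lets EKS keep merging in its own bootstrap userdata, per the launch-template docs linked above.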
**Tell us about your request**
For high performance computing (HPC) applications we tend to want one thread per core. I know about the great HPC instance series/family (we use them!) and about controlling this when creating single EC2 instances. The case I'd like a handle on is managed node groups in EKS.
**Which service(s) is this request for?**
EKS
**Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?**
We are testing using managed node groups with spot instances in EKS, and our current "best" (meaning only) option is to use the logic here: https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/ to update the nodes. Since these are spot, this means we need a separate thread or process to watch for new nodes and reconfigure each one as it comes up.
The above is a bit janky: there is nothing stopping one reconfiguration from completing and then (between some duration of checking) a new node arriving and a job running without the proper threading! It's very error prone (this is me running this logic --> 🤪), and I am hoping we can have something that both automates the above and ensures that when a node comes up and is deemed `Ready`, it is also ready from a threading standpoint!

**Are you currently working around this issue?**
See the previous answer! We have a thread running that is keeping a cache of known nodes, and when a new one is found, we pause running experiments and use the kubectl-node-shell plugin to issue one-off commands to update the threading. We then re-enable applications. Of course any applications running in the transition state are going to fail.
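For reference, the polling part of that workaround boils down to a set difference over known node names. A minimal sketch, where `list_node_names` and `fix_node` are placeholder callables standing in for the kubernetes client and the kubectl-node-shell step (not a real API):

```python
import time

def new_nodes(known, current):
    """Return node names in `current` that are not yet in the `known` cache."""
    return sorted(set(current) - set(known))

def watch_loop(list_node_names, fix_node, interval=30):
    # list_node_names(): returns the cluster's current node names
    # fix_node(name):    pauses experiments and disables sibling threads
    known = set()
    while True:
        for name in new_nodes(known, list_node_names()):
            fix_node(name)
            known.add(name)
        time.sleep(interval)
```

The race described above lives in the gap between iterations: a node can go `Ready` and accept pods before `fix_node` runs, which is exactly why a boot-time (userdata) or API-level solution is preferable.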
**Additional context**
I am very happy to test this out and give feedback. And it was really fun to learn how to "hotload" myself! So that's the silver lining I think. Thank you for that! :avocado: