aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: command line and API programmatic parameters to control threads per core #2225

Open vsoch opened 1 year ago

vsoch commented 1 year ago

Tell us about your request

For high performance computing (HPC) applications we tend to want one thread per core. I know about the great HPC instance family (we use them!) and about controlling this when creating single EC2 instances. The case I'd like a handle for is managed node groups in EKS.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

We are testing using managed node groups with spot instances in EKS, and our current "best" (meaning only) option is to use the logic here: https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/ to update the nodes. Since these are spot, this means we need a separate thread or process to:

  1. watch for new nodes
  2. determine when a new node is added, pause running experiments
  3. connect to the node (I am using a kubectl plugin, but the other, more complex strategy is to create a bastion for ssh to the private subnet)
  4. run the "runtime" example from the link above
  5. flag that the group is again ready

The above is a bit janky, because nothing stops us from completing one check and then, before the next check, getting a new node and having a job run on it without the proper threading! It's very error prone (this is me running this logic--> 🤪), and I am hoping that we can have:

  1. First priority: API programmatic ability to specify the threads (ideally through functions to create nodegroup and similar)
  2. Second priority, the same, but from the command line (we use command line tools less)

both to automate the above and ensure that when a node comes up and it is deemed Ready it is also ready from a threading standpoint!

Are you currently working around this issue?

See the previous answer! We have a thread running that is keeping a cache of known nodes, and when a new one is found, we pause running experiments and use the kubectl-node-shell plugin to issue one-off commands to update the threading. We then re-enable applications. Of course any applications running in the transition state are going to fail.
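For readers trying to picture the workaround, one polling pass might be sketched as below. This is only an illustration, not the code actually in use: `reconcile_once` and its four callables (`list_nodes`, `pause`, `fix_threads`, `resume`) are hypothetical names, and the real versions would wrap the Kubernetes API and the kubectl-node-shell plugin.

```python
from typing import Callable, Iterable, Set


def reconcile_once(
    known: Set[str],
    list_nodes: Callable[[], Iterable[str]],
    pause: Callable[[], None],
    fix_threads: Callable[[str], None],
    resume: Callable[[], None],
) -> Set[str]:
    """Run one polling pass and return the updated cache of known nodes."""
    current = set(list_nodes())
    new_nodes = current - known
    if new_nodes:
        pause()  # stop running experiments while nodes are reconfigured
        try:
            for name in sorted(new_nodes):
                fix_threads(name)  # hypothetical: one-off command on the node
        finally:
            resume()  # always re-enable applications, even if a fix failed
    # Returning only the current set drops nodes that were reclaimed.
    return current
```

Calling this in a loop with a sleep between passes reproduces the gap described above: anything scheduled onto a new node between two passes still runs before the threading is fixed.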

Additional context

I am very happy to test this out and give feedback. And it was really fun to learn how to "hotload" myself! So that's the silver lining I think. Thank you for that! :avocado:

tzneal commented 1 year ago

There's no direct way to do this now that I'm aware of, but there are two indirect ways to do it via userdata:

You can also build your own custom AMI that bakes this behavior in, though it's probably simpler to do it via userdata.

Both of the methods will eliminate the need to watch for new nodes and reconfigure them as it will occur at instance boot.

vsoch commented 1 year ago

Both of the methods will eliminate the need to watch for new nodes and reconfigure them as it will occur at instance boot.

That would definitely be an improvement (and I did see the launch template, but didn't mention it). For the launch template case, since we are creating and destroying many different spot clusters, that would mean writing a custom one for each new managed node group. It's slightly better than what I'm doing now, but I would still push for exposing these more easily for the user, akin to what some other clouds offer. As soon as the user has to go beyond the parameters of an API or command line flags, as they say in Harold and Kumar Go to White Castle... "We've gone too far."! :laughing:

vsoch commented 1 year ago

@tzneal can you show me an example of a launch template that updates the instances to use one thread per core? (I'm familiar with the snippets to do that!) I can figure out the logic to determine if the instance needs it, and I see that I can use it here: https://boto3.amazonaws.com/v1/documentation/api/1.26.85/reference/services/eks/client/create_nodegroup.html. I should be able to test it soon and give you feedback.

Also note that since we are requesting a list of spot instance types, it could be that some (but not all) of the instances need the hot plug.
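As a rough sketch of the "does this instance need it" check, the contents of each `thread_siblings_list` file can be parsed to find the CPUs that are not the first thread of their core. The function names here are hypothetical, and the sketch assumes the files use the usual sysfs CPU list format, where siblings may appear either comma-separated ("0,4") or as a range ("0-1").

```python
from typing import Iterable, List, Set


def parse_cpu_list(text: str) -> List[int]:
    """Expand a sysfs CPU list such as "0,4" or "0-1" into sorted CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_to_offline(sibling_lists: Iterable[str]) -> List[int]:
    """Return every CPU that is not the first thread of its core."""
    extra: Set[int] = set()
    for text in sibling_lists:
        # Keep the lowest-numbered thread of each core, collect the rest.
        extra.update(parse_cpu_list(text)[1:])
    return sorted(extra)
```

An empty result means the instance already runs one thread per core and needs no change. The inputs would come from reading `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` on the node.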

tzneal commented 1 year ago

Sorry, I don't have anything handy. You should be able to create a launch template (https://boto3.amazonaws.com/v1/documentation/api/1.26.85/reference/services/ec2/client/create_launch_template.html), and then specify the user data (https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html#launch-template-user-data) without specifying an AMI ID in the launch template.

The userdata should be something like this (untested):

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash

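# List every sibling thread after the first one on each core (deduplicated).
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un)
do
    # Take the extra hardware thread offline so each core runs one thread.
    echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done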

--==MYBOUNDARY==--
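A minimal, equally untested sketch of wiring this up from Python: it wraps the script in the same MIME multi-part format, base64-encodes it (launch template user data must be base64-encoded), and passes it to the EC2 `create_launch_template` call linked above. The helper names `build_user_data` and `create_template` are made up for illustration, and the client is passed in rather than created here so the sketch does not assume any particular credentials setup.

```python
import base64
from typing import Any, Dict

BOUNDARY = "==MYBOUNDARY=="

# Same script as in the user data above.
DISABLE_SIBLINGS = r"""#!/bin/bash
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un)
do
    echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done
"""


def build_user_data(script: str) -> str:
    """Wrap a shell script in MIME multi-part format and base64-encode it."""
    mime = (
        "MIME-Version: 1.0\n"
        f'Content-Type: multipart/mixed; boundary="{BOUNDARY}"\n'
        "\n"
        f"--{BOUNDARY}\n"
        'Content-Type: text/x-shellscript; charset="us-ascii"\n'
        "\n"
        f"{script}\n"
        f"--{BOUNDARY}--\n"
    )
    return base64.b64encode(mime.encode("ascii")).decode("ascii")


def create_template(ec2_client: Any, name: str) -> Dict[str, Any]:
    """Create a launch template that carries only user data (no AMI ID)."""
    return ec2_client.create_launch_template(
        LaunchTemplateName=name,
        LaunchTemplateData={"UserData": build_user_data(DISABLE_SIBLINGS)},
    )
```

With boto3 this would be called as `create_template(boto3.client("ec2"), "one-thread-per-core")`, where the template name is a placeholder. The template could then be referenced from `create_nodegroup` through its `launchTemplate` parameter, for example `launchTemplate={"name": "one-thread-per-core"}`.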