aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0
826 stars, 312 forks

Feature Request: CustomAMI per SlurmQueue.ComputeResource #5853

Open jkirsch-LF opened 9 months ago

jkirsch-LF commented 9 months ago

Our team has multiple queues with mixed InstanceTypes and InstanceSizes, and we're trying to switch from startup scripts to CustomAMIs. The problem we're running into is that AMIs are specific to a combination of InstanceType and InstanceSize, which seems to make it impossible to apply CustomAMIs to mixed queues.

Is it possible to apply a CustomAMI per ComputeResource?

francesco-giordano commented 9 months ago

Hi jkirsch-LF, this is currently not possible, but I will add it as a feature request.

However, can you share more about your use case? Usually the AMI does not depend on the InstanceSize.

I do not know if it applies to your case, but as a workaround you can either install everything in a single AMI and use it in all the queues, or install the common software on a per-queue basis.
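To make the per-queue option concrete, here is a minimal sketch of a cluster configuration fragment, assuming the workaround means one AMI per queue set through the queue-level `Image` / `CustomAmi` setting in ParallelCluster 3. The queue names, compute resource names, instance types, and AMI IDs are hypothetical placeholders, and required sections such as `Networking` are omitted.

```yaml
# Fragment of a ParallelCluster 3 cluster config (not a complete file).
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: p-queue
      Image:
        # Placeholder AMI ID; every compute resource in this queue uses it.
        CustomAmi: ami-0aaaaaaaaaaaaaaaa
      ComputeResources:
        - Name: p3-small
          InstanceType: p3.2xlarge
        - Name: p3-large
          InstanceType: p3.8xlarge
    - Name: g-queue
      Image:
        # A different placeholder AMI for the queue doing different work.
        CustomAmi: ami-0bbbbbbbbbbbbbbbb
      ComputeResources:
        - Name: g4dn-small
          InstanceType: g4dn.xlarge
```

The AMI is chosen per queue here, not per compute resource, so mixed instance types inside one queue still share the same image.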

jkirsch-LF commented 9 months ago

You are correct that AMIs usually do not depend on instance size. However, the pcluster build-image config file does require that an InstanceType+InstanceSize combination be provided.

https://github.com/aws/aws-parallelcluster/blob/release-3.0/cli/tests/pcluster/schemas/test_imagebuilder_schema/test_imagebuilder_schema/imagebuilder_schema_dev.yaml#L18

Even if size were removed, we have queues consisting of different p instance types and another queue of g instance types for different work. Applying a pcluster-baked AMI per compute resource would be helpful in this scenario.

We can certainly make our own AMI using traditional methods, but it’s a pretty cumbersome process.

enrico-usai commented 9 months ago

Hi @jkirsch-LF, the instance type passed at build-image time doesn't affect the AMI you're going to create. For example, you can create the AMI using a t2.medium, and in the cluster you can use the same AMI to launch a c5n.18xlarge or any other instance type you prefer.

The instance type you pass to the build-image command only specifies the instance used to create the AMI. A larger instance will speed up the AMI creation process, but it has no other impact. The same is true for the size.
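As a rough illustration, a build-image configuration along these lines could bake the AMI on a small instance. This is only a sketch: the `ParentImage` value is a placeholder, and the file name and image ID in the comment are hypothetical.

```yaml
# Image config for `pcluster build-image` (sketch, not a complete reference).
# Hypothetical invocation:
#   pcluster build-image --image-id my-custom-image --image-configuration image-config.yaml
Build:
  # Instance used only to bake the AMI. A larger type mainly shortens build time.
  InstanceType: t2.medium
  # Placeholder; replace with a real base AMI ID from your region.
  ParentImage: ami-0123456789abcdef0
```

The resulting AMI ID can then be referenced from the cluster configuration for compute resources of other types and sizes, such as c5n.18xlarge.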

Let us know if it helps.