aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0

Feature Request: Set up a Generic Resource (GRES) within Slurm for EC2 instance stores #5415

Open francisreyes-tfs opened 1 year ago

francisreyes-tfs commented 1 year ago

My workflow uses instance-level NVMe SSDs as a local scratch disk, so I configure EphemeralVolume for my SlurmQueues. It would be nice to have the total size of the EphemeralVolume exposed as a GRES for that queue in Slurm, so that jobs can be submitted specifying the amount of scratch space they need from the nodes in the queue.

https://slurm-dev.schedmd.narkive.com/1K61ZccM/how-to-setup-local-disk-as-gres
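
To make the request concrete, here is a rough sketch of the Slurm settings I have in mind, written as a small Python helper that renders them. This is only an illustration, not something ParallelCluster does today: the GRES name `localscratch`, the node name, and the convention that one GRES unit equals 1 GiB are all my own assumptions, and the `Gres=` entry would have to be merged into the node definition that ParallelCluster already generates.

```python
def render_scratch_gres(node_name: str, scratch_gib: int,
                        gres_name: str = "localscratch") -> dict:
    """Render illustrative Slurm config lines for a count-only scratch GRES.

    Assumptions (not taken from ParallelCluster): the GRES name is
    hypothetical, and one GRES unit is treated as 1 GiB of instance store.
    """
    if scratch_gib <= 0:
        raise ValueError("scratch_gib must be a positive number of GiB")
    return {
        # slurm.conf: declare the GRES type and attach a count to the node.
        "slurm.conf": [
            f"GresTypes={gres_name}",
            f"NodeName={node_name} Gres={gres_name}:{scratch_gib}",
        ],
        # gres.conf: a plain counted resource with no device file.
        "gres.conf": [
            f"NodeName={node_name} Name={gres_name} Count={scratch_gib}",
        ],
    }


if __name__ == "__main__":
    # Hypothetical node with roughly 900 GiB of NVMe instance store.
    config = render_scratch_gres("queue1-dy-compute-1", 900)
    for filename, lines in config.items():
        print(f"# {filename}")
        for line in lines:
            print(line)
```

With something like this in place, a job that needs 200 GiB of scratch could be submitted with `sbatch --gres=localscratch:200 job.sh`, and Slurm would stop placing jobs on a node once its counted scratch is fully allocated. Note that this only does accounting; it does not enforce any quota on the disk itself.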

MikeKroell commented 5 months ago

I second this. The burst buffer plugin also looks like an interesting way to leverage NVMe. https://slurm.schedmd.com/burst_buffer.html