aws / aws-parallelcluster

AWS ParallelCluster is an AWS-supported open-source cluster management tool for deploying and managing HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0

Multi-Instance GPU (MIG) support #4119

Open rvencu opened 2 years ago

rvencu commented 2 years ago

Slurm supports MIG GPU devices; in that case, gres.conf would need to use MultipleFiles instead of File.

I am interested in defining a queue of MIG resources so that I can allocate fractions of GPUs to jobs.
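To illustrate the File vs. MultipleFiles difference, here is a minimal sketch that renders gres.conf lines. It assumes each MIG slice is backed by its parent GPU device file plus MIG capability files. The `GpuDevice` type, the `gres_line`/`render_gres_conf` helpers, the GRES type names, and every `/dev/nvidia-caps/...` path are hypothetical and shown for illustration only. Real values depend on the node's MIG layout, and Slurm's `AutoDetect=nvml` may be able to discover them instead.

```python
# Sketch: render Slurm gres.conf lines for whole GPUs and MIG slices.
# Assumptions (hypothetical, for illustration only): the GRES type names and
# /dev/nvidia-caps paths below; real values depend on the node's MIG layout.
from dataclasses import dataclass
from typing import List


@dataclass
class GpuDevice:
    type_name: str      # GRES Type, e.g. "a100" or a MIG profile like "1g.5gb"
    files: List[str]    # device files backing this GRES
    mig: bool = False   # True when the device is a MIG slice


def gres_line(dev: GpuDevice) -> str:
    """Return one gres.conf line for a whole GPU or a MIG slice."""
    if dev.mig:
        # A MIG slice is backed by several device files, so all of them are
        # listed (comma-separated) under MultipleFiles instead of File.
        return f"Name=gpu Type={dev.type_name} MultipleFiles={','.join(dev.files)}"
    if len(dev.files) != 1:
        raise ValueError("a whole GPU is described by exactly one File")
    return f"Name=gpu Type={dev.type_name} File={dev.files[0]}"


def render_gres_conf(devices: List[GpuDevice]) -> str:
    """Join one line per device into gres.conf text."""
    return "".join(gres_line(d) + "\n" for d in devices)


if __name__ == "__main__":
    devices = [
        GpuDevice("a100", ["/dev/nvidia1"]),
        GpuDevice("1g.5gb", ["/dev/nvidia0",
                             "/dev/nvidia-caps/nvidia-cap21",
                             "/dev/nvidia-caps/nvidia-cap22"], mig=True),
    ]
    print(render_gres_conf(devices), end="")
```

With such entries in place (and matching Gres= counts in slurm.conf), a job could then request a slice by type, for example with `--gres=gpu:1g.5gb:1`.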

lukeseawalker commented 2 years ago

Hello @rvencu, this is currently not supported out of the box by ParallelCluster. I'm marking this as a feature enhancement.