It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

Allow configuring automatic resource detection in a granular way #729

Open Kobzol opened 1 month ago

Kobzol commented 1 month ago

Sometimes a part of the automatic worker resource detection is broken on a given cluster (e.g. it wrongly detects AMD GPUs). In that case, --no-detect-resources must be used, but that also disables the automatic detection of all other resources, even those that would otherwise be detected correctly, which is annoying.

It would be nice if it were possible to choose selectively which resources should be detected automatically. For example, like this:

$ hq worker start --detect-resources=gpus/nvidia,gpus/amd,memory,cpus

The default would be the value shown above (these are currently the only resources HQ can detect), but users could selectively turn off some of them.
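
To make the proposed semantics concrete, here is a minimal sketch of how such a comma-separated value could be interpreted. It is not HyperQueue's implementation: the function name `parse_detect_resources` and the constant `KNOWN_RESOURCES` are hypothetical, and the choices to reject unknown names and to treat an empty value as "detect nothing" are assumptions, not something decided in this issue.

```python
# Hypothetical sketch; these names are illustrative and not part of HyperQueue.
# The four entries are the resource kinds listed in the proposal above.
KNOWN_RESOURCES = ("cpus", "memory", "gpus/nvidia", "gpus/amd")


def parse_detect_resources(value=None):
    """Return the resource kinds to auto-detect, ordered as in KNOWN_RESOURCES.

    `value` is the raw string given to the proposed --detect-resources option.
    None means the option was omitted, so every known resource is detected.
    """
    if value is None:
        return list(KNOWN_RESOURCES)

    # Split on commas, ignoring surrounding whitespace and empty items.
    requested = {item.strip() for item in value.split(",") if item.strip()}

    # Assumption: an unknown name is an error rather than silently ignored.
    unknown = requested - set(KNOWN_RESOURCES)
    if unknown:
        raise ValueError(f"unknown resource(s): {', '.join(sorted(unknown))}")

    return [name for name in KNOWN_RESOURCES if name in requested]
```

Under this sketch, a cluster with broken AMD GPU detection would pass `gpus/nvidia,memory,cpus`, which keeps detection enabled for everything except `gpus/amd`.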