It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

Python API to Launch Workers with Srun #773

Closed manuel-g-castro closed 1 day ago

manuel-g-castro commented 1 day ago

Hello! I am building a script to be executed within a Slurm or PJM (Fujitsu's scheduler) allocation. I want to start one worker per node in the allocation, so the current working implementation in Slurm uses srun:

srun --overlap -N 1 -n 1 -c 112 $HQ_PATH/hq worker start --cpus=2x56 --manager slurm 

I am wondering if I can do the same with the Python API.

Kobzol commented 1 day ago

Hi, the easiest way to start workers is through Automatic allocation, which does indeed start one worker per node in the allocation by default. However, the Python API is currently only useful for defining jobs. It can also start a simple local cluster for experiments, but we don't currently expose the automatic allocation interface through the Python API. Therefore, automatic allocation either needs to be started from the CLI, or you can just run the CLI commands from Python.

We plan to add this in the future, but it's not currently a high priority item.
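For illustration, running the CLI from Python could look roughly like the sketch below. It only wraps the `srun` command from the original post with the standard `subprocess` module and does not use the HyperQueue Python API. The helper names `build_worker_command` and `start_worker`, and the `sockets` / `cores_per_socket` parameters, are hypothetical and chosen for this example.

```python
import os
import subprocess


def build_worker_command(hq_path, sockets=2, cores_per_socket=56):
    """Return the srun command from the original post as an argument list.

    hq_path is the directory containing the `hq` binary ($HQ_PATH in the post).
    The defaults reproduce `-c 112` and `--cpus=2x56`.
    """
    total_cpus = sockets * cores_per_socket
    return [
        "srun", "--overlap", "-N", "1", "-n", "1", "-c", str(total_cpus),
        os.path.join(hq_path, "hq"), "worker", "start",
        f"--cpus={sockets}x{cores_per_socket}", "--manager", "slurm",
    ]


def start_worker(hq_path):
    """Launch one worker in the background and return the process handle.

    Popen is used instead of subprocess.run because a worker keeps running
    until it is stopped, so waiting for it to exit would block the script.
    """
    return subprocess.Popen(build_worker_command(hq_path))
```

Inside an allocation, this might be called as `proc = start_worker(os.environ["HQ_PATH"])`, keeping `proc` around so the script can later call `proc.wait()` or `proc.terminate()`. Passing the command as a list avoids shell quoting problems.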

manuel-g-castro commented 1 day ago

Okay! Thanks, @Kobzol!