Open npf opened 7 months ago
Hi!
Does that mean hyperqueue can auto-allocate on several HPC clusters with different submission frontends?
It does, although to actually support two different clusters, you'll need to run the HQ server in a location that is reachable over TCP/IP from both clusters (including their compute nodes), which might be a bit challenging. Using automatic allocation in such a setup is also a bit more complex (see below).
Technically speaking, I do not see how to configure the necessary remote access to submission frontends. In the code, the allocation function calls either sbatch or qsub directly, if I'm correct.
It does indeed call sbatch/qsub directly. We have been thinking about providing some way to customize this mechanism, but we haven't seen any use case for that yet. A simpler workaround might be to provide a proxy that reroutes the sbatch/qsub calls from the node where the HQ server is deployed to the corresponding login nodes/frontends. You could probably write e.g. a simple Python program that acts as sbatch/qsub and communicates with the remote systems.
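To make the proxy idea a bit more concrete, here is a minimal sketch of what such a Python program could look like. It is not part of HQ. It assumes the script is installed on the server node's PATH under the names `sbatch` and `qsub`, that passwordless SSH to the login nodes works, and that any file paths in the arguments are valid on the remote side (e.g. through a shared filesystem). The host names and the `LOGIN_NODES` mapping are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical sbatch/qsub proxy that forwards the call to a login node via SSH."""
import os
import shlex
import subprocess
import sys

# Assumption: which login node provides which submission command.
LOGIN_NODES = {
    "sbatch": "login.cluster-a.example.org",
    "qsub": "login.cluster-b.example.org",
}


def build_remote_command(tool, args, host):
    """Return the local ssh argv that runs `tool args...` on `host`."""
    # Quote every part so that spaces and shell metacharacters survive
    # the remote shell unchanged.
    remote = " ".join(shlex.quote(part) for part in [tool, *args])
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def main(argv):
    tool = os.path.basename(argv[0])  # "sbatch" or "qsub"
    host = LOGIN_NODES[tool]
    # stdout/stderr are inherited, so the caller sees the remote output.
    result = subprocess.run(build_remote_command(tool, argv[1:], host))
    return result.returncode


# Only act as a proxy when invoked under one of the proxied command names.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) in LOGIN_NODES:
    sys.exit(main(sys.argv))
```

The proxy passes the remote exit code and output straight through, so from the caller's point of view it should behave much like a local sbatch/qsub. Whether that is enough for auto-allocation in practice depends on which arguments and files HQ hands to the command, which this sketch does not attempt to cover.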
If you had a use-case for this, we could also implement e.g. a JSON-based auto allocation backend, which could implement the autoallocation using any mechanism it would need.
Should both the sbatch and qsub commands be available on the machine where the hyperqueue server runs?
Currently, yes, if you want to use auto-allocation (or you can use a proxy as described above).
If you don't use automatic allocation, you can also just provide the computational resources to HQ manually, by running sbatch/qsub on the corresponding clusters, and then redirecting the HQ workers to the IP address of the HQ server. In that case the server does not need to know anything about sbatch/qsub.
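As a rough illustration of the manual route, the sketch below generates a Slurm batch script that starts a single HQ worker. It assumes that a copy of the HQ server directory (which holds the connection details) is readable on the cluster and is passed via the global `--server-dir` option. The function name `worker_batch_script`, the path, and the time limit are made up for the example.

```python
import shlex


def worker_batch_script(server_dir, time_limit="01:00:00"):
    """Build the text of a Slurm batch script that starts one HQ worker."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        f"#SBATCH --time={time_limit}",
        # Assumption: the worker locates the server through this directory.
        f"hq --server-dir {shlex.quote(server_dir)} worker start",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical path; save the output to a file and submit it with
    # sbatch on the cluster's login node.
    print(worker_batch_script("/shared/hq-server-dir"), end="")
```

A PBS variant would look the same apart from the `#PBS` directives, and the resulting script is submitted by hand on each cluster, so the server never has to call sbatch/qsub itself.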
Hi,
In https://it4innovations.github.io/hyperqueue/stable/deployment/allocation/, the documentation says :
Does that mean hyperqueue can auto-allocate on several HPC clusters with different submission frontends?
Technically speaking, I do not see how to configure the necessary remote access to submission frontends. In the code, the allocation function calls either sbatch or qsub directly, if I'm correct.
Should both the sbatch and qsub commands be available on the machine where the hyperqueue server runs?
Thanks.