It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License
distributed-computing hpc rust task-graph

HyperQueue is a tool designed to simplify the execution of large workflows (task graphs) on HPC clusters. It lets you run a large number of tasks without having to manually submit jobs to batch schedulers like Slurm or PBS. You just specify what you want to compute, and HyperQueue automatically requests computational resources and dynamically load-balances tasks across all allocated nodes and cores. HyperQueue can also work without Slurm/PBS as a general task executor.
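To give a rough feel for why dynamic load balancing matters, here is a minimal toy simulation in Python. It is not HyperQueue's scheduler and does not use its API. The function names (`simulate_dynamic_balancing`, `static_makespan`) and the task durations are made up for illustration. The sketch hands each task to whichever worker becomes free first and compares the total run time with a fixed up-front split of the same tasks.

```python
import heapq


def simulate_dynamic_balancing(task_durations, n_workers):
    """Toy model: give each task to the worker that becomes free first.

    Returns (makespan, assignment), where assignment[w] lists the task
    indices run by worker w. Hypothetical helper, not HyperQueue code.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (time_when_free, worker_id); every worker is idle at t=0.
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    assignment = [[] for _ in range(n_workers)]
    for task_id, duration in enumerate(task_durations):
        t, w = heapq.heappop(free_at)  # earliest available worker
        assignment[w].append(task_id)
        heapq.heappush(free_at, (t + duration, w))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment


def static_makespan(task_durations, n_workers):
    """Baseline: split tasks round-robin up front, ignoring durations."""
    loads = [0.0] * n_workers
    for i, duration in enumerate(task_durations):
        loads[i % n_workers] += duration
    return max(loads)


if __name__ == "__main__":
    durations = [8, 1, 1, 1, 1, 1, 1, 2]  # made-up task lengths
    makespan, assignment = simulate_dynamic_balancing(durations, 2)
    print("dynamic makespan:", makespan, "assignment:", assignment)
    print("static makespan:", static_makespan(durations, 2))
```

In this toy input one long task occupies a worker while the other worker drains the short tasks, so the dynamic strategy finishes sooner than the fixed split. A real scheduler also has to account for resource requests, task dependencies, and workers joining or leaving, which this sketch ignores.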

Documentation

If you find a bug or a problem with HyperQueue, please create an issue. For more general discussion or feature requests, please use our discussion forum. If you want to chat with the HyperQueue developers, you can use our Zulip server.

This image shows how HyperQueue can work on a distributed cluster that uses Slurm or PBS:

[Figure: Architecture of HyperQueue deployed on a Slurm/PBS cluster]

Features

Getting started

Installation

If you want to try the newest features, you can also download a nightly build.

Submitting a simple task

What's next?

Check out the documentation.

FAQ

You can find more frequently asked questions here.

HyperQueue team

We are a group of researchers working at IT4Innovations, the Czech National Supercomputing Center. We welcome any outside contributions.

References

HyperQueue poster at SC'21.

Acknowledgement