Open BerndDoser opened 3 months ago
To integrate Slurm with Flyte, you can create a Flyte agent that interacts with the Slurm scheduler using its REST API. This approach leverages Flyte's extensibility and allows for efficient task submission to HPC resources managed by Slurm.
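As a rough illustration of the REST path, the sketch below only builds a job-submission request for slurmrestd without sending it. The base URL, API version string, user name, token, and job fields are all assumptions made for the example. The payload layout in particular differs between Slurm releases, so it should be checked against the OpenAPI specification served by the target slurmrestd.

```python
import json
import urllib.request


def build_submit_request(base_url, api_version, user, token, script, name):
    """Build (but do not send) a POST request for slurmrestd's job/submit endpoint.

    Assumptions, not taken from this thread: JWT authentication is enabled,
    the token was obtained out of band (for example with `scontrol token`),
    and the payload layout below matches the chosen API version.
    """
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/job/submit"
    payload = {
        # Some API versions expect the script inside "job" instead.
        "script": script,
        "job": {
            "name": name,
            "current_working_directory": "/tmp",
            # Some API versions expect a mapping here rather than a list.
            "environment": ["PATH=/bin:/usr/bin"],
        },
    }
    headers = {
        "Content-Type": "application/json",
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical values for a local test container.
request = build_submit_request(
    base_url="http://localhost:6820",
    api_version="v0.0.40",
    user="slurm",
    token="example-token",
    script="#!/bin/bash\nsrun hostname\n",
    name="flyte-demo",
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen and reading the JSON response. An agent would then need to keep the returned job ID in order to poll the job state later.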
This is an AI-generated response.
@BerndDoser, would you be interested in collaborating on this? The guide on how to develop a new agent has a lot of details, including how to develop agents locally (so that the dev cycle is tight).
Hi @eapolinario, thank you for your interest in the feature request. I am generally interested in collaborating. For example, I can set up a Slurm container to test such a feature. I am currently on vacation but will be back at the beginning of September.
Please let us know when you try.
Motivation: Why do you think this is important?
Slurm is a widely used workload management system in many HPC (High-Performance Computing) compute clusters. It plays a vital role in efficiently allocating compute resources, running work on these allocated resources, and overseeing a queue of pending work.
Integrating Slurm with Flyte presents a strategic opportunity to greatly enhance the utilization of Flyte workflows on HPC clusters, providing many advantages.
Goal: What should the final outcome look like, ideally?
The aim is to implement a Flyte agent that submits tasks to HPC resources through the Slurm scheduler.
Typically, users interact with Slurm through its command-line interface (CLI). For instance, the sbatch command submits a job script for later execution. An optional Slurm daemon also offers a REST API for interacting with the Slurm system.
Describe alternatives you've considered
I don't know of anything comparable.
Propose: Link/Inline OR Additional context
I am available to offer support using Slurm and to test the Flyte agent. https://github.com/JBris/slurm-rest-api-docker can be used for testing the Slurm CLI and the Slurm REST API.
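For trying the CLI path against such a test setup, a minimal sketch could wrap sbatch in a small Python helper. It assumes sbatch is on the PATH of the machine running the code and relies on the --parsable flag, which prints the job ID, optionally followed by a semicolon and the cluster name. The function names and the script path are hypothetical and not part of Flyte or Slurm.

```python
import subprocess


def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the job ID from `sbatch --parsable` output.

    The output is either "<jobid>" or "<jobid>;<cluster>".
    """
    lines = sbatch_stdout.strip().splitlines()
    if not lines:
        raise ValueError("sbatch produced no output")
    return int(lines[0].split(";")[0])


def submit_script(script_path: str) -> int:
    """Submit a job script with sbatch and return the Slurm job ID."""
    result = subprocess.run(
        ["sbatch", "--parsable", script_path],
        check=True,           # raise CalledProcessError if sbatch fails
        capture_output=True,
        text=True,
    )
    return parse_job_id(result.stdout)


# The parser can be exercised without a Slurm installation:
print(parse_job_id("12345\n"))
print(parse_job_id("12345;cluster1\n"))
```

Calling submit_script("job.sh") on a node with Slurm installed would return the new job ID, which an agent could store and later use to query or cancel the job.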