creinders / hydra-slurm-rich-launcher

A rich, visual interface for easily starting and monitoring your Hydra applications on SLURM clusters
MIT License

Hydra Slurm Rich Launcher


Installation

The Hydra Slurm Rich Launcher can be installed via pip:

pip install hydra-slurm-rich-launcher --upgrade
Alternative installation methods

Locally

git clone git@github.com:creinders/hydra-slurm-rich-launcher.git
cd hydra-slurm-rich-launcher
poetry install

Quick Start

Define your configuration in config.yaml:

defaults:
  - override hydra/launcher: slurm_rich
hydra:
  launcher:
    partition: <SLURM_PARTITION>

task: 1

Implement your Hydra app in my_app.py:

import hydra

@hydra.main(config_path=".", config_name="config", version_base="1.3")
def my_app(cfg) -> None:
    print(f"Task: {cfg.task}")

if __name__ == "__main__":
    my_app()

Running the app in multirun mode with task=1,2,4 launches three jobs, one for each value of task:

python my_app.py task=1,2,4 --multirun
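To see where the three jobs come from: Hydra's default (basic) sweeper turns comma-separated override values into the cartesian product of all swept parameters, and each resulting combination becomes one job that the launcher submits to SLURM. The snippet below is a simplified sketch of that expansion, not Hydra's implementation; `expand_sweep` is a hypothetical helper that only splits on commas, and the job order it produces may differ from Hydra's.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand overrides like "task=1,2,4" into one override list per job.

    Hypothetical, simplified sketch: it only splits on commas and ignores
    the rest of Hydra's override grammar (range(), choice(), quoting, ...).
    """
    keys: list[str] = []
    value_lists: list[list[str]] = []
    for override in overrides:
        key, _, values = override.partition("=")
        keys.append(key)
        value_lists.append(values.split(","))
    # Cartesian product: every combination of values becomes one job.
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*value_lists)
    ]


if __name__ == "__main__":
    # Prints one line per job: its index and its overrides.
    for job_id, job_overrides in enumerate(expand_sweep(["task=1,2,4"])):
        print(job_id, job_overrides)
```

Adding a second swept parameter multiplies the job count: task=1,2,4 seed=0,1 expands to 3 × 2 = 6 jobs.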


Please see the Hydra documentation for details regarding the configuration and multi-run.

Scalability

Lots of runs? No problem! Hydra Slurm Rich Launcher neatly organizes all of your runs.


Restarts

Easily monitor the status of your jobs and swiftly restart any failed runs.
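Restart behavior is controlled by the retry_strategy ('prompt', 'never', 'always') and max_retries parameters listed under Parameters. The sketch below illustrates one plausible reading of how these two options interact; it is not the launcher's actual code, and `should_retry` and `run_with_retries` are hypothetical helpers. It assumes max_retries caps the number of resubmissions per run regardless of the strategy.

```python
from typing import Callable

VALID_STRATEGIES = ("prompt", "never", "always")


def should_retry(
    strategy: str,
    retries_done: int,
    max_retries: int,
    ask: Callable[[str], bool],
) -> bool:
    """Decide whether a failed run is resubmitted (hypothetical sketch)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown retry_strategy: {strategy!r}")
    # Assumption: the retry budget applies to every strategy.
    if retries_done >= max_retries:
        return False
    if strategy == "always":
        return True
    if strategy == "never":
        return False
    # 'prompt': defer the decision to the user.
    return ask(f"Run failed. Restart? (retry {retries_done + 1}/{max_retries})")


def run_with_retries(
    run: Callable[[], bool],
    strategy: str = "prompt",
    max_retries: int = 3,
    ask: Callable[[str], bool] = lambda msg: input(msg + " [y/N] ").strip().lower() == "y",
) -> bool:
    """Call `run` until it succeeds or the strategy declines another attempt."""
    retries_done = 0
    while True:
        if run():
            return True
        if not should_retry(strategy, retries_done, max_retries, ask):
            return False
        retries_done += 1
```

With 'always' and max_retries: 3, a run that fails twice and then succeeds completes; with max_retries: 1, the same run is given up after its second failure.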


Parameters

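The Hydra Slurm Rich Launcher supports the following parameters, shown with their default values and set under hydra.launcher (as in the Quick Start configuration). The first group is specific to this launcher; the second group mirrors the options of the hydra-submitit-launcher.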

slurm_query_interval_s: 15  # Interval (in seconds) between status queries to the SLURM controller
filter_job_ids: null  # Comma-separated IDs of jobs in the job array that should not be executed (e.g., "1,4")
retry_strategy: 'prompt'  # Job retry strategy: 'prompt' asks the user, 'never' never restarts, 'always' restarts failed runs automatically
max_retries: 3  # Maximum number of retry attempts
le_mode: 'auto'  # Low energy mode: disables all animations; turn it on when CPU usage must be kept to a minimum. Values: 'on', 'off', and 'auto'. 'auto' turns the low energy mode on if the environment variable HYDRA_SLURM_PROGRESS_LE_MODE is set.

submitit_folder: ${hydra.sweep.dir}/.submitit/%j
timeout_min: 60
cpus_per_task: null
gpus_per_node: null
tasks_per_node: 1
mem_gb: null
nodes: 1
name: ${hydra.job.name}
partition: null
qos: null
comment: null
constraint: null
exclude: null
gres: null
cpus_per_gpu: null
gpus_per_task: null
mem_per_gpu: null
mem_per_cpu: null
account: null
signal_delay_s: 120
max_num_timeout: 0
additional_parameters: {}
array_parallelism: 256
setup: null
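Two of the launcher-specific options take values that need interpreting: filter_job_ids is a comma-separated string such as "1,4", and le_mode: 'auto' depends on the HYDRA_SLURM_PROGRESS_LE_MODE environment variable. The following is a minimal sketch of how such values could be interpreted, not the launcher's own code; the helper names are hypothetical, and it assumes that any value of the environment variable (even an empty one) counts as "set".

```python
import os
from typing import Iterable, Mapping, Optional, Union


def parse_filter_job_ids(value: Optional[Union[int, str]]) -> set[int]:
    """Turn a filter_job_ids value such as "1,4" into a set of job IDs (hypothetical)."""
    if value is None:
        return set()
    if isinstance(value, int):  # YAML parses a single ID like `3` as an int
        return {value}
    return {int(part) for part in value.split(",") if part.strip()}


def jobs_to_run(
    job_ids: Iterable[int], filter_job_ids: Optional[Union[int, str]]
) -> list[int]:
    """Drop the filtered job IDs, keeping the remaining ones in order."""
    excluded = parse_filter_job_ids(filter_job_ids)
    return [job_id for job_id in job_ids if job_id not in excluded]


def resolve_le_mode(le_mode: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the low energy mode (no animations) should be active."""
    if le_mode == "on":
        return True
    if le_mode == "off":
        return False
    if le_mode == "auto":
        # Assumption: the variable's presence alone enables the mode.
        return "HYDRA_SLURM_PROGRESS_LE_MODE" in environ
    raise ValueError(f"unknown le_mode: {le_mode!r}")
```

For a sweep of six jobs, jobs_to_run(range(6), "1,4") keeps jobs 0, 2, 3 and 5.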

License

Hydra Slurm Rich Launcher is licensed under MIT License.

Credits

This package was inspired by and extends the capabilities of the hydra-submitit-launcher. We gratefully acknowledge the developers of hydra-submitit-launcher and Hydra for their contributions to the open-source community.