
Flywheel HPC Client

The HPC Client is a self-service solution that allows Flywheel jobs and gears to run on a High Performance Computing environment. Use on-premise hardware that's already available for highly-concurrent scientific workloads!


Architecture

[Architecture diagram: hpc-client-architecture, 2021-07-26]

HPC Types

The client, also called Cast, can support several queue mechanisms out of the box. Flywheel, however, currently only provides support for Slurm. If you require assistance with other schedulers, contact Flywheel.

| Common name              | Code name |
| ------------------------ | --------- |
| IBM Spectrum LSF         | `lsf`     |
| Oracle / Sun Grid Engine | `sge`     |
| Slurm                    | `slurm`   |

If your site uses one of these schedulers, it may well just need a config file to get running.
Otherwise, some light Python development will be required.
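
For example, on a Slurm site you can check which queue type a Cast checkout is configured for by inspecting its settings file. The `cluster` key name below is an assumption about the default `settings/cast.yml` layout, so treat this as a sketch and verify against the config shipped with your version:

```
# Sketch: confirm the queue type configured for this Cast checkout.
# The "cluster" key name is an assumption; check your settings/cast.yml.
grep -E "^cluster:" settings/cast.yml
# On a Slurm site you would expect something like: cluster: slurm
```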

Minimum Requirements

Reference this article for the minimum software and computing requirements of the system where the HPC Client will be installed.

Getting Started

  1. Before using Cast, you need to decide how it will run on your cluster.
    Choose an integration method and keep it in mind for later. This determines how frequently Cast will look for, pull, and queue HPC jobs from your Flywheel site onto your HPC (see the example cron entry after this list).

  2. It is strongly recommended that you make a private GitHub repo to track your changes.
    This will make Cast much easier to manage.

  3. Perform the initial cluster setup. If you are unfamiliar with
    Singularity, it is recommended that you read, at a minimum, SingularityCE's introduction
    and quick start guides.

  4. Create an authorization token so Singularity and Flywheel can work with each other.

  5. If your queue type is not in the above table, or is sufficiently different, review the guide for adding a queue type.

  6. Collaborate with Flywheel staff to install the Flywheel engine in your HPC repo. They will also configure the hold engine on your Flywheel site to ensure that other engines do not pick up gear jobs that are tagged with "hpc".

  7. Complete the integration method you chose in step one.
    Confirm Cast is running regularly by monitoring `logs/cast.log` and the Flywheel user interface.

  8. Test and run your first HPC job in collaboration with Flywheel. It is recommended that you test with MRIQC (non-BIDS version), a gear that's available from Flywheel's Gear Exchange. Note: as of 11 May 2022, Flywheel will have to change the `rootfs-url` (the location where the Docker image resides) for any gears installed from the Gear Exchange. For more about how Cast uses a `rootfs-url`, see the Background/Motivation section of this article.

  9. Enjoy!
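
As an illustration of steps 1 and 7, a cron-based integration might look like the sketch below. The five-minute interval, the install path, and the `start-cast.sh` entrypoint name are assumptions made for the example, not the actual layout of a Cast checkout; use whatever your chosen integration method actually provides.

```
# Sketch of a cron-based integration. The interval, install path, and the
# start-cast.sh entrypoint name are illustrative assumptions only.
*/5 * * * *  cd /opt/hpc-client && ./start-cast.sh >> logs/cron.log 2>&1
```

Once the entry is in place, `tail -f logs/cast.log` is a quick way to confirm jobs are being picked up between checks of the Flywheel user interface.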

FAQs

How do I use a custom script template for the jobs submitted to my HPC?

The HPC Client creates a shell script (`.sh`) for every job that is submitted to your HPC through your scheduler (e.g., Slurm). It creates this script from a default template for the type of scheduler on your HPC. If you would like to use a custom template, point the `script` variable in the `settings/cast.yml` file at your own file. Editing the default templates in the source code (e.g., `src/cluster/slurm.py`) is not recommended.
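
As a sketch, a custom Slurm template might look like the following. The `{job_id}`, `{log_path}`, and `{command}` placeholder names are hypothetical, chosen only for illustration; check the default template that ships with your version of Cast for the exact variables it substitutes, then point the `script` setting in `settings/cast.yml` at your file.

```
#!/bin/bash
#SBATCH --job-name=fw-{job_id}     # placeholder names here are hypothetical
#SBATCH --output={log_path}        # examples, not Cast's actual template variables
#SBATCH --ntasks=1
#SBATCH --mem=4G

srun {command}
```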
How do I send my jobs to a specific partition on my HPC?

When you use a custom script template, you can set the partition(s) to which all your jobs will be sent. For example, if your scheduler is Slurm, you can add the following line to your custom script template:

```
#SBATCH --partition=<partition_1>,<partition_2>
```

Example:

```
#SBATCH --partition=gpu-1,gpu-2
```
How do I check my version of the HPC Client?

The version of the HPC Client is in `src/__init__.py` under the variable `__version__`. This was not available prior to 2.0.0.
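
For example, from the root of a Cast checkout you can read it with a quick grep:

```
grep "__version__" src/__init__.py
# prints something like: __version__ = "2.0.0"
```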