
Flywheel HPC Client

The HPC Client is a self-service solution that allows Flywheel jobs and gears to run on a High Performance Computing environment. Use on-premise hardware that's already available for highly-concurrent scientific workloads!


Architecture

[Architecture diagram: hpc-client-architecture, 2021-07-26]

HPC Types

The client, also called Cast, can support several queue mechanisms out of the box. Flywheel, however, currently only provides support for Slurm. If you require assistance with other schedulers, contact Flywheel.

| Common name              | Code name |
| ------------------------ | --------- |
| IBM Spectrum LSF         | `lsf`     |
| Oracle / Sun Grid Engine | `sge`     |
| Slurm                    | `slurm`   |

If your site uses one of these schedulers, it may well just need a config file to get running.
Otherwise, some light Python development will be required.
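
For example, on a Slurm site you can check which queue type a Cast checkout is configured for by inspecting its settings file. The `cluster` key name below is an assumption about the default `settings/cast.yml` layout, so treat this as a sketch and verify against the config shipped with your version:

```
# Sketch: confirm the queue type configured for this Cast checkout.
# The "cluster" key name is an assumption; check your settings/cast.yml.
grep -E "^cluster:" settings/cast.yml
# On a Slurm site you would expect something like: cluster: slurm
```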

Minimum Requirements

Reference this article for the minimum software and computing requirements of the system where the HPC Client will be installed.

Getting Started

  1. Before using Cast, you need to decide how it will run on your cluster.
    Choose an integration method and keep it in mind for later. This determines how frequently Cast will look for, pull, and queue HPC jobs from your Flywheel site onto your HPC (see the example cron entry after this list).

  2. It is strongly recommended that you make a private GitHub repo to track your changes.
    This will make Cast much easier to manage.

  3. Perform the initial cluster setup. If you are unfamiliar with
    Singularity, it is recommended that you read, at a minimum, SingularityCE's introduction
    and quick start guides.

  4. Create an authorization token so Singularity and Flywheel can work with each other.

  5. If your queue type is not in the above table, or is sufficiently different, review the guide for adding a queue type.

  6. Collaborate with Flywheel staff to install the Flywheel engine in your HPC repo. They will also configure the hold engine on your Flywheel site to ensure that other engines do not pick up gear jobs that are tagged with "hpc".

  7. Complete the integration method you chose in step one.
    Confirm Cast is running regularly by monitoring `logs/cast.log` and the Flywheel user interface.

  8. Test and run your first HPC job in collaboration with Flywheel. It is recommended that you test with MRIQC (non-BIDS version), a gear that's available from Flywheel's Gear Exchange. Note: as of 11 May 2022, Flywheel will have to change the `rootfs-url` (the location where the Docker image resides) for any gears installed from the Gear Exchange. For more about how Cast uses a `rootfs-url`, see the Background/Motivation section of this article.

  9. Enjoy!
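
As an illustration of steps 1 and 7, a cron-based integration might look like the sketch below. The five-minute interval, the install path, and the `start-cast.sh` entrypoint name are assumptions made for the example, not the actual layout of a Cast checkout; use whatever your chosen integration method actually provides.

```
# Sketch of a cron-based integration. The interval, install path, and the
# start-cast.sh entrypoint name are illustrative assumptions only.
*/5 * * * *  cd /opt/hpc-client && ./start-cast.sh >> logs/cron.log 2>&1
```

Once the entry is in place, `tail -f logs/cast.log` is a quick way to confirm jobs are being picked up between checks of the Flywheel user interface.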

FAQs

How do I use a custom script template for the jobs submitted to my HPC?

The HPC Client creates a shell script (`.sh`) for every job that is submitted to your HPC through your scheduler (e.g., Slurm). It creates this script from a default template for the type of scheduler on your HPC. If you would like to use a custom template, point the `script` variable in the `settings/cast.yml` file at your own file. Editing the default templates in the source code (e.g., `src/cluster/slurm.py`) is not recommended.
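
As a sketch, a custom Slurm template might look like the following. The `{job_id}`, `{log_path}`, and `{command}` placeholder names are hypothetical, chosen only for illustration; check the default template that ships with your version of Cast for the exact variables it substitutes, then point the `script` setting in `settings/cast.yml` at your file.

```
#!/bin/bash
#SBATCH --job-name=fw-{job_id}     # placeholder names here are hypothetical
#SBATCH --output={log_path}        # examples, not Cast's actual template variables
#SBATCH --ntasks=1
#SBATCH --mem=4G

srun {command}
```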
How do I send my jobs to a specific partition on my HPC?

When you use a custom script template, you can set the partition(s) to which all your jobs will be sent. For example, if your scheduler is Slurm, you can add the following line to your custom script template:

```
#SBATCH --partition=<partition_1>,<partition_2>
```

Example:

```
#SBATCH --partition=gpu-1,gpu-2
```
How do I check my version of the HPC Client?

The version of the HPC Client is in `src/__init__.py` under the variable `__version__`. This was not available prior to 2.0.0.
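
For example, from the root of a Cast checkout you can read it with a quick grep:

```
grep "__version__" src/__init__.py
# prints something like: __version__ = "2.0.0"
```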