LightForm-group/UoM-CSF-matflow

This repository contains information about running MatFlow on the Computational Shared Facility (CSF) at the University of Manchester.

Included are:

A software definition file
A set of example task schemas
Some example workflows
Some Jupyter notebooks demonstrating use of the MatFlow API on completed workflows. Click the Binder link above and navigate to /workflows/jupyter_notebooks to explore these.

Installation of MatFlow on the CSF

Add export HDF5_USE_FILE_LOCKING=FALSE to your .bash_profile. This is to allow MatFlow to work on the scratch filesystem. See this issue.
To allow access to the internet so we can install MatFlow, first load the proxy module: module load tools/env/proxy2. We only need to do this once, when installing Python packages from the web. However, if you want to use MatFlow's cloud archiving facility (i.e. copying your workflow results to Dropbox), you will need to make sure the proxy module is always loaded. You can do this by adding the module load line to a file .modules in your home directory.
Load Anaconda to give us access to pip:

module load apps/binapps/anaconda3/2019.07
Now install MatFlow and some extensions, using pip. This may take several minutes. You may receive a warning about the scripts path not being on your PATH (see next step).

pip install --user matflow matflow-damask matflow-formable matflow-mtex matflow-defdap
Make sure the following path is on your $PATH environment variable: ~/.local/bin. This can be done in your .bash_profile file like this: PATH=$PATH:~/.local/bin.
Run matflow validate to check the installation. You may get a warning about the MTEX extension; this is fine.
Add the software.yml and task_schemas.yml files from this repository to your MatFlow software sources and task schemas sources respectively. These files are already in the jf01 group shared RDS space under the path /mnt/eps01-rds/jf01-home01/shared/matflow. To register them with MatFlow, edit the MatFlow config.yml file, which, after running matflow validate for the first time, resides here: ~/.matflow/config.yml (i.e. in your home directory). Add the following path to the task_schema_sources list in the config file:

/mnt/eps01-rds/jf01-home01/shared/matflow/task_schemas.yml

...and add the following path to the software_sources list in the config file:

/mnt/eps01-rds/jf01-home01/shared/matflow/software.yml
Now run matflow validate again. This time there should be no warnings.
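
As a rough sketch, the relevant part of ~/.matflow/config.yml might look like the following once both paths have been added. The list names and paths are the ones given above. The sketch assumes both lists are top-level keys; leave any other entries already in your config file in place.

```yaml
# ~/.matflow/config.yml (excerpt; other existing keys omitted)
# Assumption: both lists sit at the top level of the config file.
task_schema_sources:
  - /mnt/eps01-rds/jf01-home01/shared/matflow/task_schemas.yml

software_sources:
  - /mnt/eps01-rds/jf01-home01/shared/matflow/software.yml
```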

Note: when connecting to the CSF to submit workflows, do not use X11 forwarding (the -X flag of the ssh command).

Setting default scheduler options for preparation/processing jobs

Often, preparation and processing jobs are not computationally expensive, and can be run as serial jobs in the short queue on the CSF. We can set default scheduler options for the preparation and processing jobs by adding this to the MatFlow config file:

default_preparation_run_options:
  l: short

default_processing_run_options:
  l: short

default_iterate_run_options:
  l: short

In this case, all preparation and processing jobs (and, through default_iterate_run_options, iterate jobs) will use the short queue by default. This can be overridden from within a workflow if necessary.

Submitting a workflow

Run the command matflow go workflow.yml where workflow.yml is the name of the workflow file.

Stopping an active workflow

Run the command matflow kill workflow/directory/path where workflow/directory/path is the path to the workflow directory that is generated by MatFlow. This command will delete all running and queued jobs associated with the workflow.

Setting up Dropbox archiving

We can get MatFlow to copy the workflow files (or a subset of them) to a Dropbox account after the workflow completes.

Add an "archive location" to the MatFlow config file. An archive location looks like this:
```
archive_locations:
  dropbox:
    cloud_provider: dropbox
    path: /sims
```
This tells MatFlow to use the path /sims inside your Dropbox directory structure. The path you specify here must already exist.
You can then add an extra key to any of your workflow files to tell MatFlow to use this archive location: archive: dropbox. If you want to exclude certain files, you can also add a key archive_excludes to your workflow, which is a list of glob-style patterns to exclude. Task schemas can also include archive_excludes.

The first time you submit a workflow that uses this archive location, you will be prompted to authorize hpcflow to connect to your Dropbox account.
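
For illustration, the archive-related keys in a workflow file might look like the sketch below. The keys archive and archive_excludes are those described above; the two glob patterns are hypothetical examples, and the rest of the workflow definition is omitted.

```yaml
# workflow.yml (excerpt; task definitions omitted)
archive: dropbox      # name of an entry under archive_locations in config.yml
archive_excludes:     # glob-style patterns; these two are hypothetical examples
  - '*.log'
  - 'tmp_*'
```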

Archive after workflow completion

As of MatFlow v0.2.21, you can archive a completed workflow like this: matflow archive /path/to/workflow/directory dropbox. Here we choose the archive named dropbox in our config.yml file, but any archive defined in config.yml can be chosen. Files are excluded from the archive according to the archive_excludes patterns in the corresponding task schema definitions, plus any archive_excludes patterns included in the original workflow submission.

Metadata

In general, you can associate arbitrary metadata with a workflow in the workflow YAML file by using the metadata key. Additionally, as of MatFlow v0.2.21, you can specify default metadata that should be applied to all generated workflows. Default metadata is merged with any metadata specified in the workflow YAML file; a metadata item specified in the workflow YAML file will overwrite the same key specified in default_metadata.
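
The merge rule can be illustrated with a small sketch. It assumes that default_metadata is set as a top-level key in ~/.matflow/config.yml, which is not stated explicitly above. All metadata keys and values here are hypothetical, and only flat (non-nested) keys are shown.

```yaml
# Assumed location: ~/.matflow/config.yml (keys and values are hypothetical)
default_metadata:
  project: demo
  operator: alice
---
# workflow.yml (excerpt)
metadata:
  operator: bob
  sample: A1
---
# Effective metadata after merging, following the rule described above:
# the workflow's value wins where both define the same key.
metadata:
  project: demo
  operator: bob
  sample: A1
```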