Closed. sreichl closed this issue 1 month ago.
~Install the latest Snakemake version (document which exactly)~
8.16.0
~Run the tutorial locally with it~
~Get to run the tutorial using SLURM (executor plugin) on the CeMM cluster~
`pip install snakemake-executor-plugin-slurm`
N.B. Not installable via mamba/conda.
~Document what to do to get it to run~
3. Create a workflow profile directory and add a `config.yaml`:

`mkdir snake_slurm`

```yaml
# Use spaces instead of tabs :'(
# Note that raw string arguments need double quotes (see slurm_extra)
# Remember that the slurm partition and qos must match on the CeMM cluster.
executor: slurm
jobs: 100
default-resources:
  slurm_account: lab_bock
  slurm_partition: tinyq
  runtime: 30 # in minutes
  mem: 2G
  cpus_per_task: 1
  nodes: 1
  slurm_extra: "'--qos=tinyq'" # Note the extra quoting!
```
4. Run snake with the workflow profile specified to use slurm.
`snakemake --workflow-profile snake_slurm`
5. You can view running jobs using an adapted `squeue` call that pads the name and comment fields, so you can see all the information Snakemake has added to identify the jobs:
`squeue -u $USER -o %i,%P,%.10j,%.40k`
> * [ ] in general for any executor plugin (akin to [here](https://github.com/epigen/mr.pareto#:~:text=location%20of%20your-,cluster%20profile,-(i.e.%2C%20the)) and [here](https://github.com/epigen/mr.pareto?tab=readme-ov-file#execution))
> * [ ] specifically SLURM at CeMM (will replace this [repo](https://github.com/epigen/cemm.slurm.sm)/[section](https://github.com/epigen/mr.pareto?tab=readme-ov-file#cemm-users) of README)
> * [ ] Get the unsupervised_analysis pipeline to run with test data using SLURM on the CeMM cluster
>
> * [ ] change the `partition` from `params` to `resources`
> * [ ] Document the necessary changes
> * [ ] Create issue for each MR.P module to address the change to bump it to v8 **or** one central in this repo with a list of all modules like here [switch all visualizations from panels to single plots #2](https://github.com/epigen/mr.pareto/issues/2)
> * [ ] <add above created issue(s) here>
> * [ ] adapt mr.pareto README accordingly to reflect Snakemake v8 usage in all regards
@burtonjake great progress! Please find out and document how to...
The slurm executor maps `threads` and `mem_mb` requirements to SLURM (`threads` -> `cpus_per_task`).
https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#ordinary-smp-jobs
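For example, assuming the mapping above, a rule's `threads` and `resources.mem_mb` translate into the submitted job's allocation (a sketch; the rule name, file names, and command are illustrative, not from an actual MR.P workflow):

```python
# Sketch of a Snakefile rule under the slurm executor:
# threads  -> sbatch --cpus-per-task
# mem_mb   -> sbatch --mem
rule align:
    input:
        "sample.fastq"
    output:
        "sample.bam"
    threads: 4                # submitted as --cpus-per-task=4
    resources:
        mem_mb=8000           # submitted as --mem (8000 MB)
    shell:
        "aligner --threads {threads} {input} > {output}"
```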
A minimal "cluster profile" to run workflows via SLURM at CeMM is:
```yaml
# Remember that the slurm partition and qos must match on the CeMM cluster.
executor: slurm
jobs: 100
default-resources:
  # slurm_account, partition, and runtime are required.
  # match the CeMM intranet: https://cemmat.sharepoint.com/sites/IT-Resources/SitePages/Submitting-Slurm-Jobs.aspx
  slurm_account: lab_bock
  slurm_partition: tinyq
  runtime: 120 # in minutes
  slurm_extra: "'--qos=tinyq'" # Note the extra quoting!
```
It appears that you cannot specify default memory requirements here, as they tend to conflict with values in existing workflows: Snakemake converts `mem: 2G` to an integer (even if you write "2G"), which does not match the string type used in workflow files, e.g., `mem: config.get("mem", "1600")`. Therefore, this is all that is needed to get jobs running on SLURM.
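To illustrate the type clash (a sketch, not Snakemake's actual parser; `parse_mem` is a hypothetical helper): a profile default like `2G` ends up as an integer number of megabytes, while workflow files often carry memory as a string.

```python
# Sketch (not Snakemake's actual parser): why an integer default from the
# profile clashes with string-typed mem values in workflow files.
def parse_mem(value):
    """Convert a memory spec like '2G' (or a bare int) to integer MB."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}
    if isinstance(value, int):
        return value
    value = value.strip()
    if value[-1].upper() in units:
        return int(float(value[:-1]) * units[value[-1].upper()])
    return int(value)

profile_default = parse_mem("2G")  # -> 2048 (int)
workflow_default = "1600"          # e.g. config.get("mem", "1600") stays a string
print(type(profile_default) is type(workflow_default))  # False: int vs str
```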
This default config file is stored in `~/.config/snakemake/<config_name>/`, where `<config_name>` is for example `cluster`, and can then be applied with `snakemake --sdm conda --profile cluster`. Note that the config file has to be named `config.<snakemake_supported_version>.yaml`, for example `config.v8+.yaml`.
Here it seems like you can name and store it wherever you want, as long as you set the environment variable `SNAKEMAKE_PROFILE` accordingly. Maybe I am missing something.
https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#using-profiles
In the main docs, they explain profiles (a concept new to me, as I have been developing without them and my latest Snakemake version is 7.15.2): https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
Seems so.
"Profile has to be given as either absolute path, relative path or name of a directory available in either /etc/xdg/snakemake or /home/jburton/.config/snakemake."
[Works: relative path] `SNAKEMAKE_PROFILE=profiles/testprofile snakemake`
[Works: name of profile in default dir] `SNAKEMAKE_PROFILE=cluster snakemake`
etc.
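As a sketch of the "name of a directory in the default dir" case (the `cluster` name is illustrative, and a `demo/` prefix stands in for `~/.config/snakemake` here): a profile is just a directory holding a versioned `config.v8+.yaml`.

```shell
# Sketch: a profile is a directory containing config.<version>.yaml.
# A demo prefix stands in for ~/.config/snakemake here.
profile_dir="demo/.config/snakemake/cluster"
mkdir -p "$profile_dir"
cat > "$profile_dir/config.v8+.yaml" <<'EOF'
executor: slurm
jobs: 100
EOF

# Then select the profile by name, path, or environment variable:
#   snakemake --profile cluster                         # searched in default dirs
#   snakemake --profile demo/.config/snakemake/cluster  # explicit path
#   SNAKEMAKE_PROFILE=cluster snakemake
```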
There's a potential 'gotcha': if you don't install the full Snakemake with bells and whistles, some of the MR.PARETO workflows don't work out of the box. For example, if you follow the Snakemake tutorial to get Snakemake on your system, you won't end up with pandas. You can simulate this with `mamba create -c conda-forge -c bioconda -n snakemake8-mini snakemake-minimal`.
My view is that if a workflow depends on a particular Python package [to run the Snakefile], then this should be documented. The Snakemake way to do this is to have a directive at the top of the workflow:

```python
conda:
    "envs/global.yaml"
```

and to add the packages you need to `envs/global.yaml`. These are injected using conda before running the rest of the Snakefile. For example:
```
(snakemake8-mini) [jburton@d001 envs]$ cat global.yaml
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - pandas
```
with exact versions! (that's an MR.P requirement to increase reproducibility)
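For example, a pinned `global.yaml` might look like this (the pandas version below is illustrative, not a tested requirement):

```yaml
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - pandas=2.2.2
```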
Is this also a problem for full Snakemake installations? If not, what are the advantages of minimal installations?
Instructions
```bash
# install Snakemake 8.20.1
conda create -c conda-forge -c bioconda -n snakemake8_20_1 snakemake=8.20.1
# install the SLURM executor plugin in the Snakemake environment
# https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#
conda install snakemake-executor-plugin-slurm
```
CeMM SLURM repo:
v3.0.0 supports all Snakemake versions; below are the relevant config files.

LOG of Snakemake 8 bump: `workflow/profiles/default`

tasks:
- `squeue`
- `slurm-executor-plugin` tested with v0.10.0
- `workflow/profiles/default/config.yaml`
- `partition` from `params` to `resources`