Snakemake-Profiles / slurm

Cookiecutter for snakemake slurm profile
MIT License

A simple alternative profile with a single config file #73

Closed jdblischak closed 2 years ago

jdblischak commented 3 years ago

The profile in this repo is very comprehensive and handles many different use cases. However, I was having difficulty customizing it, especially when I was trying to replicate the behavior of the deprecated --cluster-config option (see the discussion in #25).

I've put together a simple profile that only requires you to download and edit a single config.yaml file:

https://github.com/jdblischak/smk-simple-slurm

If you're having difficulty specifying options to pass to sbatch, e.g. including the rule name in the log filename, specifying different time limits per rule, etc., please give it a try. My simple template can solve previous issues in this repo without resorting to the deprecated --cluster-config option, e.g. #7, #24, #40, #42, #46.

hans-vg commented 3 years ago

This is excellent!! I was having a hard time getting output/error logs working with the CookieCutter slurm configuration. It would default to slurm-######.out, even though the log path was configured in a separate cluster_config.yml. Your config worked out of the box with minimal configuration.

One question: is there any way to change the log filenames for {wildcards}? Currently they look like this:

    $ ls -1 logs/trimmomatic_pe/
    trimmomatic_pe-sample=FA,unit=rep3-2687041.out
    trimmomatic_pe-sample=FB,unit=rep2-2687040.out

I would like to avoid "=" and "," in the filenames so that they don't require escaping to view, e.g. less logs/trimmomatic_pe/trimmomatic_pe-sample\=FA\,unit\=rep3-2687041.out

Is there any way to modify this behavior?

jdblischak commented 3 years ago

This is excellent!!...Your config worked out of the box with minimal configuration.

@hans-vg Wonderful! The goal was minimal configuration, so I'm glad you were able to configure it quickly.

I would like to not use "=" or "," in the filenames, so it doesn't require escaping to view.

The = and , come directly from the {wildcards} value that Snakemake substitutes. The only way to modify this AFAIK would be to write your own Python function to reformat it. You could probably update the function format_wildcards in the file slurm_utils.py in this repository to do this. That function gets called by slurm-submit.py.

But of course I would recommend you stick with the simple route. Yes those = and , can be annoying when you're trying to look at the log files, but the alternative is having to maintain multiple Python scripts to submit your jobs to Slurm.
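For anyone who does want to go the custom-function route, here is a minimal sketch of the kind of reformatting involved. The function name sanitize_wildcards and the choice of "-" and "_" as separators are hypothetical, chosen only for illustration; this is not the actual format_wildcards from slurm_utils.py, and it assumes the wildcards arrive as a plain dict of names to values.

```python
import re


def sanitize_wildcards(wildcards: dict) -> str:
    """Join wildcard names and values without '=' or ',' (hypothetical helper)."""
    parts = []
    for name, value in wildcards.items():
        # Replace every character outside [A-Za-z0-9._-] with "_" so the
        # resulting filename never needs shell escaping.
        safe_value = re.sub(r"[^A-Za-z0-9._-]", "_", str(value))
        parts.append(f"{name}-{safe_value}")
    return "_".join(parts)


print(sanitize_wildcards({"sample": "FA", "unit": "rep3"}))
# prints: sample-FA_unit-rep3
```

With something like this wired into the submit script, the log above would be named along the lines of trimmomatic_pe-sample-FA_unit-rep3-2687041.out, at the cost of maintaining that extra Python code.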

percyfal commented 2 years ago

Hi @jdblischak, thanks for the post. Your repo looks really neat! Is there any functionality that could be merged with this profile? In any case, I can add a link in the README to your repo if you want, so that users don't have to browse the issue list to find it :)

jdblischak commented 2 years ago

thanks for the post. Your repo looks really neat!

@percyfal Thanks! My initial inspiration came from your repo, so thank you for your efforts to document and maintain the official snakemake slurm profile.

Is there any functionality that could be merged with this profile?

The biggest difference in approach between the two profiles is how default and per-rule resources are specified. Your profile uses --cluster-config, and mine uses a combination of default-resources and per-rule resources. The latter more closely couples the Snakemake file itself to the original scheduler that was used, but the work for a potential user who wants to execute the pipeline with a different scheduler would be similar in both cases. Either they'd need to perform a search-and-replace of the JSON file passed to --cluster-config or a search-and-replace of the Snakemake file itself.
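To make the default-resources plus per-rule resources idea concrete, here is a rough Python sketch of the merge logic: defaults apply to every rule, and any value a rule declares overrides the default before the sbatch arguments are assembled. This is only an illustration of the concept, not Snakemake's implementation; build_sbatch_args, DEFAULT_RESOURCES, the resource keys, and the partition name "general" are all made up for the example.

```python
import shlex

# Hypothetical defaults, standing in for the default-resources of a profile.
DEFAULT_RESOURCES = {"partition": "general", "mem_mb": 1000, "time": "01:00:00"}


def build_sbatch_args(rule_name, rule_resources, defaults=DEFAULT_RESOURCES):
    """Merge per-rule resources over the defaults and build sbatch arguments."""
    # Later keys win, so anything the rule declares overrides the default.
    merged = {**defaults, **rule_resources}
    return [
        "sbatch",
        f"--partition={merged['partition']}",
        f"--mem={merged['mem_mb']}",
        f"--time={merged['time']}",
        f"--job-name=smk-{rule_name}",
        # %j is expanded by Slurm to the job ID.
        f"--output=logs/{rule_name}/{rule_name}-%j.out",
    ]


# A rule that only overrides the time limit keeps the other defaults.
args = build_sbatch_args("trimmomatic_pe", {"time": "04:00:00"})
print(shlex.join(args))
```

Switching schedulers under this approach means changing the resource names and the command template, which is the search-and-replace effort described above.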

In any case, I can add a link in the README to your repo if you want, so that users don't have to browse the issue list to find it :)

That would be super appreciated. Thanks!

percyfal commented 2 years ago

@jdblischak I have added a link to your repo in #84. Also, I point out that the use of cluster-config is discouraged and that resources should be configured with snakemake CLI arguments in the profile configuration; in the end I don't think the difference between the two profiles is that great. I will still keep support for cluster-config for now, but probably deprecate it soon enough.

One lingering issue that I have to deal with is increasing job submission speed as scalability is currently hampered.

jdblischak commented 2 years ago

@percyfal Thanks so much!

One lingering issue that I have to deal with is increasing job submission speed as scalability is currently hampered.

I'm happy to help test this out. If you make some changes that affect the job submission speed, please ping me and I can run my job submission benchmark on the updated profile.