Snakemake-Profiles / lsf

Snakemake profile for running jobs on an LSF cluster
MIT License
36 stars 22 forks source link

Add default project and remove default threads #43

Closed BEFH closed 2 years ago

BEFH commented 2 years ago

This pull request attempts to do two things:

The default project works perfectly, but the latest commit is failing the tests because I cannot figure out how to update the tests.

Linting and Blackening were successful. If you just want to merge default project, that's fine, but default_threads is no longer respected.

BEFH commented 2 years ago

@mbhall88, I have an uncustomized branch, so I can initiate a new pull request once we've addressed your other concerns.

In terms of threads not being respected, if I didn't specify threads in the rule definition, Snakemake put a default of 1 in the jobscript and that was never overridden by the cookiecutter default value. In other words, if I set default_threads to 2 and didn't specify threads:, it would default to 1, not 2.

I still don't understand how the unit testing is working, but if the removal of default_threads is wrong, then it won't be an issue other than lack of coverage of the work for project specification.

mbhall88 commented 2 years ago

@mbhall88, I have an uncustomized branch, so I can initiate a new pull request once we've addressed your other concerns.

Whatever is easiest for you.

In terms of threads not being respected, if I didn't specify threads in the rule definition, Snakemake put a default of 1 in the jobscript and that was never overridden by the cookiecutter default value. In other words, if I set default_threads to 2 and didn't specify threads:, it would default to 1, not 2.

I see. Ok, happy to remove it then.

I still don't understand how the unit testing is working, but if the removal of default_threads is wrong, then it won't be an issue other than lack of coverage of the work for project specification.

The removal of default_threads is not wrong, it's just you need to mock the output of calling the default_proj() function. Like in the line I permalinked above.

BEFH commented 2 years ago

@mbhall88, thanks. I'll try to figure out the tests on Friday. Do I also need to write tests for the new project feature and for my change I made after this PR that doesn't duplicate the -q flag if queue is specified in lsf.yaml?

mbhall88 commented 2 years ago

@mbhall88, thanks. I'll try to figure out the tests on Friday. Do I also need to write tests for the new project feature and for my change I made after this PR that doesn't duplicate the -q flag if queue is specified in lsf.yaml?

Updating an existing test with project-related settings would be sufficient. So one test that uses the default, and adding -P to one of the config files and making sure that overrides the default would be enough.

Regarding the change to the -q flag, it would be great if you could add something for that also. It should be as simple as copying another test and just adding a mock for a cookiecutter default and a config file with -q in it.

If you can't figure any of this out let me know and I'll try and add some tests when I get time.

mbhall88 commented 2 years ago

@BEFH this is a bit awkward now. While I wait for @leoisl to approve this PR, your recent commit to fix #47 has been pushed onto this PR.

@leoisl once you are happy with this PR, could you remove that commit on your branch and we might have to close this PR and open a new one that isn't on @BEFH's fork.

leoisl commented 2 years ago

First, sorry again for my delay on reviewing this. I am overloaded and needed some time off.

TLDR

I agree with removing the default threads option. The reasoning is: snakemake defaults threads to 1 if the user does not specify the threads directive, and there is no way for us to know if the user specified 1 thread or did not specify the threads directive.

Some tests

  1. LSF profile default threads = 1000, rule threads = 123

This is the submitted jobscript:

#!/bin/sh
# properties = {"type": "single", "rule": "concat_fastas", "local": false, "input": [], "output": ["output/all.fasta"], "wildcards": {}, "params": {}, "log": ["logs/concat_fastas.log"], "threads": 123, "resources": {"mem_mb": 1000, "disk_mb": 1000, "tmpdir": "/hps/nobackup/research/zi/leandro/temp"}, "jobid": 1, "cluster": {}}
cd /hps/nobackup/research/zi/leandro/snakemake_test && /hps/nobackup/research/zi/leandro/miniconda3/bin/python -m snakemake --snakefile '/hps/nobackup/research/zi/leandro/snakemake_test/Snakefile' 'output/all.fasta' --allowed-rules 'concat_fastas' --cores 'all' --attempt 1 --force-use-threads  --wait-for-files '/hps/nobackup/research/zi/leandro/snakemake_test/.snakemake/tmp.8pvg9a13' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --skip-script-cleanup  --conda-frontend 'mamba' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 10 --scheduler 'greedy' --scheduler-solver-path '/hps/nobackup/research/zi/leandro/miniconda3/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --mode 2 && exit 0 || exit 1

and we can get both values for threads with:

self.job_properties.get("threads") = 123
CookieCutter.get_default_threads() = 1000

This is the case that works.

  1. LSF profile default threads = 1000, threads directive not specified for the rule

This is the submitted jobscript:

#!/bin/sh
# properties = {"type": "single", "rule": "concat_fastas", "local": false, "input": [], "output": ["output/all.fasta"], "wildcards": {}, "params": {}, "log": ["logs/concat_fastas.log"], "threads": 1, "resources": {"mem_mb": 1000, "disk_mb": 1000, "tmpdir": "/hps/nobackup/research/zi/leandro/temp"}, "jobid": 1, "cluster": {}}
cd /hps/nobackup/research/zi/leandro/snakemake_test && /hps/nobackup/research/zi/leandro/miniconda3/bin/python -m snakemake --snakefile '/hps/nobackup/research/zi/leandro/snakemake_test/Snakefile' 'output/all.fasta' --allowed-rules 'concat_fastas' --cores 'all' --attempt 1 --force-use-threads  --wait-for-files '/hps/nobackup/research/zi/leandro/snakemake_test/.snakemake/tmp.3c84_mlu' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --skip-script-cleanup  --conda-frontend 'mamba' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait 10 --scheduler 'greedy' --scheduler-solver-path '/hps/nobackup/research/zi/leandro/miniconda3/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --mode 2 && exit 0 || exit 1

and we can get both values for threads with:

self.job_properties.get("threads") = 1
CookieCutter.get_default_threads() = 1000

i.e. snakemake tells us that the rule must be run with 1 thread, but it does not tell us if the user said this explicitly (through the threads directive) or not (implicitly, by omitting the directive). As such, we can't decide if we should run with 1 thread (explicit usage) or with the LSF profile default threads (implicit usage).

A cumbersome solution

There is actually a way to fix this, but I think it is cumbersome and as 99.9% of the cases the default threads should be 1, I don't think we should go through the trouble of fixing. By looking at the jobscripts above, we can actually see the Snakefile that snakemake should run and the specific rule it should run. As such, we could actually parse the Snakefile and infer if the threads directive was specified or not. However, this gets complicated with things like included Snakefiles, modules, etc... It would be a lot of work for very little benefit, as I think almost all users would use default threads = 1

codecov-commenter commented 2 years ago

Codecov Report

Merging #43 (9469fc0) into master (34c3c4c) will increase coverage by 0.44%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
+ Coverage   88.67%   89.12%   +0.44%     
==========================================
  Files           6        6              
  Lines         468      478      +10     
  Branches       47       64      +17     
==========================================
+ Hits          415      426      +11     
+ Misses         46       45       -1     
  Partials        7        7              
Flag Coverage Δ
unittests 89.12% <100.00%> (+0.44%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
{{cookiecutter.profile_name}}/CookieCutter.py 76.00% <100.00%> (+4.00%) :arrow_up:
{{cookiecutter.profile_name}}/lsf_submit.py 89.15% <100.00%> (+0.69%) :arrow_up:
{{cookiecutter.profile_name}}/memory_units.py 100.00% <100.00%> (ø)