AfshinLab / BLR

MIT License
5 stars 0 forks source link

Update blr run command to allow all snakemake arguments out-of-the-box #57

Closed pontushojer closed 2 years ago

pontushojer commented 3 years ago

This PR updates the run command to use the snakemake executable directly instead of the previous API call. The benefit of doing this is that we can use snakemake's internal argument handling without having to mapped all argument we would like to our own parser. This is much more flexible and avoids things like https://github.com/NBISweden/BLR/commit/28253e48fd855b662a40c83c114b321c1c64407b to add access addition functionality in a roundabout way. It also avoids the complications of how to handle certain arguments, just look at the complexity of the latest version of snakemake.main.

I have also been rethinking cluster execution since this could be speed up thing like mapping of bins with ema (instead of mapping 5 bins in parallel on a single 20 core node, 100 4-cores nodes could be used). This would also be quite hard to integrate without the update presented herein.

HSiga commented 3 years ago

Tested and works well! It could be nice to add some hints and/or brief descriptions for the commonly used snakemake options, (-n, --unlock, the positional argument for the wanted targets).

pontushojer commented 2 years ago

@HSiga Thanks for the comments!

Since the idea is that you should only really need to run blr run to execute the full pipeline I am hesitant to add these options in the wrapper. In the normal case you don't need to define targets and it the directory is locked snakemake will inform on that and suggest you to use --unlock. Also since Snakemake might add or update options I am hesitant to add more than are absolutely necessary.

I added the -n/--dry-run in the documentation however since its really useful. I also updated the docs to make it more clear how to run and were to find info about Snakemake arguments. The new help messeage is seen below. What do you think?

$ blr run -h
usage: blr run [-h] [-c N] [--anew] [--no-use-conda]
               [snakemake_args [snakemake_args ...]]

Run the BLR pipeline.

This is a small wrapper around Snakemake that sets some default parameters.

To run full pipeline use:

    $ blr run

To do a dry-run (not executing anything, only showing what would be done) use:

    $ blr run -n

For additional arguments related to Snakemake run '$ snakemake -h' or look at the official
documentation at https://snakemake.readthedocs.io/en/stable/executing/cli.html.

optional arguments:
  -h, --help       show this help message and exit
  -c N, --cores N  Run on at most N CPU cores in parallel. Default: 16 (all
                   available cores).
  --anew           Use if initializing from previous analysis run(s).
  --no-use-conda   Skip passing argument '--use-conda' to snakemake.

snakemake arguments:
  snakemake_args   Arguments passed to snakemake. For info about snakemake
                   options run 'snakemake --help'.
HSiga commented 2 years ago

Looks great! I think it is a good practice to test with a dry-run before running the whole analysis, thanks for adding that! I agree that --unlock is less important, however, the positional argument for the target might be good to add, as the user can be interested in running the pipe until specific outputs (before the merging case for instance).

pontushojer commented 2 years ago

My thinking with that is that we should define specific targets rules for that purpose. E.g. blr run prep_merge to run the workflow up until ready for merging. But that is a bit outside the scope here.

Also if you need to define specific targets then you need to basically know the full workflow by heart, meaning that you probably know a bit about snakemake allready. I think that having the reference to snakemake -h for those highly specific options is enough.

pontushojer commented 2 years ago

Rebased to fix errors.

HSiga commented 2 years ago

It's not necessary to memorize the pipeline, as the wanted outputs can be figured out from the dry-run. However, blr run prep_merge sounds like a good idea, and it might be sufficient for the need for 'targeted run'.

pontushojer commented 2 years ago

It's not necessary to memorize the pipeline, as the wanted outputs can be figured out from the dry-run. However, blr run prep_merge sounds like a good idea, and it might be sufficient for the need for 'targeted run'.

True. I guess what I ment that these thing are outside of basic usage. Setup for merging runs should however be simpler to run. We could also work more on the documentation relating to usage.