GoekeLab / bioinformatics-workflows

minimal example implementations for bioinformatics workflow managers
MIT License

Extended Snakemake workflow #2

Closed johanneskoester closed 2 years ago

johanneskoester commented 3 years ago

Thanks a lot for reaching out regarding improvement of the example Snakemake workflow. I have extended the workflow to better show the most important features of Snakemake.

Note that I have also added two additional steps:

  1. A step to download the reference transcriptome. This shows how Snakemake integrates the retrieval and processing of such shared data into the workflow, which increases transparency. People often worry about the redundant storage and computation this causes, but that is not an issue with Snakemake: it supports caching of results between workflows (even across users, if a lab-wide cache is set up), because it can precompute hashes for output files from the code and software involved before actually generating them.
  2. A step to create a plot of the TPM values generated by Salmon. This shows how Snakemake integrates with Jupyter notebooks: such plots can, for example, first be developed interactively, and the notebook, which Snakemake automatically stores and generalizes, can be reused later.
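
To make the caching idea in step 1 more concrete, here is a minimal Python sketch of the general principle: a key is derived from everything that determines a result (the code of the step, the software environment, the keys of its inputs, and its parameters) before the step runs, and the step is executed only if no entry exists under that key. This is not Snakemake's actual implementation; the function names `cache_key` and `run_cached`, the flat cache directory layout, and the example URL are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(rule_code: str, env_spec: str, input_keys: list[str], params: dict) -> str:
    """Derive a key from what determines the output, without running the step.

    Hypothetical helper: all four arguments are illustrative stand-ins for
    "code", "software", "inputs" and "parameters" of a workflow step.
    """
    payload = json.dumps(
        {
            "code": rule_code,
            "env": env_spec,
            "inputs": sorted(input_keys),
            "params": params,
        },
        sort_keys=True,  # stable serialization, so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(cache_dir: Path, key: str, produce: Callable[[], bytes]) -> tuple[bytes, bool]:
    """Return (result, was_cached); call produce() only on a cache miss."""
    entry = cache_dir / key
    if entry.exists():
        return entry.read_bytes(), True
    result = produce()
    entry.write_bytes(result)
    return result, False


def demo() -> None:
    """Two 'workflows' requesting the same download share one cached result."""
    key = cache_key(
        rule_code="curl -L {params.url} -o {output}",
        env_spec="curl=8.0",  # assumed environment pin, for illustration only
        input_keys=[],
        params={"url": "https://example.org/transcriptome.fa"},  # hypothetical URL
    )
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        for workflow in ("workflow A", "workflow B"):
            _, was_cached = run_cached(cache, key, lambda: b">tx1\nACGT\n")
            print(workflow, "used cache:", was_cached)


if __name__ == "__main__":
    demo()
```

Because the key depends only on the step's definition and not on its output, a second workflow (or a second user sharing the same cache directory) can tell in advance that the result already exists. Changing the code, the environment pin, or a parameter yields a different key and therefore a fresh computation.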

I know that adding these steps is probably an issue, because it would require you to extend the other workflows as well, which may not be possible or easy in all cases. On the other hand, leaving them out would make it impossible to show two central strengths of Snakemake that matter for steps which are by no means uncommon in data analysis. In the end, bioinformatics is more than just invoking some shell commands, and a solution that integrates diverse types of steps is often desirable.

johanneskoester commented 3 years ago

By the way, the notebook will be rendered properly by GitHub once the PR is merged.

johanneskoester commented 2 years ago

Thanks a lot! I have modified the readme and the config example to use them. Further, I have added a GitHub Action to test, format, and lint the workflow.