elsasserlab / labcode

Utils to perform frequent data analyses in the lab.
GNU General Public License v3.0

elsasserlib on uppmax #18

Open simonelsasser opened 4 years ago

simonelsasser commented 4 years ago

@cnluzon I guess it is possible to use the lib on uppmax as well? It's now really simple and convenient to make a ChromHMM plot, violin plot, etc. from bigwig files, so I think we could add some R scripts to the pipeline that would run per group as defined in 'controls.tsv'.

You mentioned wanting to keep the pipeline in Python as much as possible, but I think for this kind of downstream analysis it is very useful to use code that people can easily adjust. E.g. if we had a few plotting scripts executed in the final stage of the pipeline, anyone with R knowledge could rerun them separately, include or exclude data, or modify the visualization by adapting the R script.

simonelsasser commented 4 years ago

...and basically we could try to standardise how R scripts in the lab take their input, so that everyone's scripts could be added to the pipeline or run manually on the same config files, e.g. 'controls.tsv'
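To make the "same config file for every script" idea concrete, here is a minimal sketch in Python (the pipeline's language) of a shared reader that every script could call. The layout of 'controls.tsv' is not given in this thread, so the column names `sample` and `group` and the function name `read_groups` are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict


def read_groups(handle, group_col="group", sample_col="sample"):
    """Map each group name to the list of samples listed under it.

    `handle` is any open text file with a tab-separated header row.
    The default column names are hypothetical, not the real controls.tsv layout.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[group_col]].append(row[sample_col])
    return dict(groups)


# In-memory stand-in for a controls file (made-up sample names).
example = io.StringIO(
    "sample\tgroup\n"
    "sampleA_rep1\tgroupA\n"
    "sampleA_rep2\tgroupA\n"
    "sampleB_rep1\tgroupB\n"
)
groups = read_groups(example)
for name, samples in groups.items():
    print(name, samples)
```

A script that agrees on this one entry point only needs to loop over the returned groups, whether it is started by the pipeline or by hand.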

cnluzon commented 4 years ago

Well, the preference for sticking to Python is not an obligation, really. The underlying logic to me is that Snakemake is a Python-based tool and imports Python source. So if there is a command-line tool that does exactly what we need, that's the best, but if there is not, Python may fit more nicely with the rest of the code. It also reduces the number of dependencies. That being said, this is open for discussion and I'd love to hear further considerations on what would be best in the context of a Snakemake (or any kind of) production pipeline.

But of course it is possible to include R code in downstream analysis. And there is even rpy2, which interfaces with R from Python, if that were ever an issue. I need to think a little bit more about how this could be done in a way that is useful, but I do have some immediate feelings about this:

  1. I'm not a big fan of making command-line scripts with R; it feels a bit cumbersome to handle command-line parameters and so on (or maybe I have not mastered those yet ;) ).

  2. If eventually people in the lab would need to tweak the code, then it defeats the purpose of passing parameters through the command line, and it makes more sense to load a package like elsasserlib and run the functions you need, play with the code and eventually document what was done. If the functionality in the package is good enough, eventually the code needed should be a handful of function calls.

  3. We can still provide functionality to handle pipeline outputs or inputs: controls.tsv and the like. But I would rather put those in a package than in standalone scripts, and only put in scripts what is strictly standard (no need to tweak to run it - just run directly from the pipeline).
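The split described in points 2 and 3 could look roughly like the sketch below: the logic sits in an importable function, and the standalone script only parses arguments and makes one call. It is written in Python for illustration; the names `samples_per_group` and `main`, the `--group-col` option, and the `group` column are all hypothetical and do not come from elsasserlib or the pipeline.

```python
import argparse
import csv
from collections import Counter


# Package side (hypothetical function): reusable and easy to call interactively.
def samples_per_group(path, group_col="group"):
    """Count how many rows of a tab-separated file fall into each group."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[group_col] for row in reader)


# Script side: nothing to tweak here, only argument parsing and one call.
def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Summarise the groups in a controls file."
    )
    parser.add_argument("controls", help="tab-separated file, e.g. controls.tsv")
    parser.add_argument(
        "--group-col", default="group",
        help="column holding the group name (assumed layout)",
    )
    args = parser.parse_args(argv)
    counts = samples_per_group(args.controls, args.group_col)
    for group, n in sorted(counts.items()):
        print(f"{group}\t{n}")
    return counts
```

Someone who wants to adjust the analysis imports `samples_per_group` and works with the result directly, while the pipeline only ever runs the wrapper (which a real script would invoke under the usual `if __name__ == "__main__":` guard).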