bihealth / seasnap-pipeline

SeA-SnaP: (Se)q (A)nalysis (Sna)kemake (P)ipeline

RNA SeA-SnaP

RNA SeA-SnaP is an RNA-(Se)q (A)nalysis (Sna)kemake (P)ipeline tool that combines two tasks: a mapping pipeline and a differential expression (DE) pipeline.

Both pipelines are based on Snakemake.

Outline

Concept

RNA SeA-SnaP aims to be as easy to use, adapt and develop as possible. To this end, SeA-SnaP is divided into three main parts:

Finally, there is also a directory with R Markdown snippets for the DE sub-pipeline. Based on settings in the config file, individual snippets can be assembled into a customized report. Splitting the report into snippets makes it easy to develop, share and include different analyses of the results.

Quick-Start

Installation

After cloning this git repository:

git clone git@cubi-gitlab.bihealth.org:CUBI/Pipelines/seasnap-pipeline.git

all required tools and packages can be installed via conda.

Currently there are two separate conda environments, one for the mapping pipeline and one for the DE pipeline.

Download and install them into new environments called seasnap-mapping and seasnap-de:

conda env create -f conda_env_mapping.yaml
conda env create -f conda_env_DE.yaml

The files conda_env_mapping.yaml and conda_env_DE.yaml are located in the main directory of the git repository. Before each use of SeA-SnaP, activate the appropriate environment with:

conda activate seasnap-mapping

or

conda activate seasnap-de

Finally, run the following command in the seasnap-de environment:

conda activate seasnap-de
Rscript install_r_packages.R

Running the pipeline

set up a working directory

Set up a working directory to store the results produced by the pipeline. (For CUBI projects, create a project directory on the cluster under /fast/groups/cubi/projects/.) To create a directory and copy the files required to configure your pipeline, run:

path/to/git/sea-snap.py working_dir

This creates a directory called results_<year>_<month>_<day>/ in the location where you run the command and adds config files for both pipelines. You can customize this behaviour via the command line options (type sea-snap.py working_dir -h for help). Directory names you provide can include formatting instructions for Python's time module.
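As a rough illustration of how such a dated directory name can be produced, the sketch below expands a pattern with `time.strftime`. The pattern string mirrors the default name described above; how sea-snap.py applies it internally is an assumption here.

```python
import time

# Pattern mirroring the default directory name described above; %Y, %m and %d
# are standard strftime directives for year, month and day.
pattern = "results_%Y_%m_%d/"

# Expand the pattern using the current local time.
dir_name = time.strftime(pattern)
print(dir_name)
```

Any other strftime directive (for example `%H` for the hour) could be used in the same way, assuming the name is passed through unchanged.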

Change into the newly created working directory with cd <dir_name>. SeA-SnaP also creates a symbolic link to the sea-snap.py script there, so from now on you can use ./sea-snap to run helpers or pipelines from the working directory. You should always run pipelines and helpers from there.


run the pipeline

The next steps depend on whether you want to run:

The results of an analysis can also be exported to a new folder structure, e.g. to upload them to SODAR.


Development

Let's first introduce the general structure of SeA-SnaP.

As outlined above, the core pipeline functionality is separated from additional generic tools such as the path handler (which manages where files are stored) and the pipeline configuration. The config file is loaded in Snakemake, and its static parts (such as parameter values) can be accessed directly in the pipeline rules. For the other, 'dynamic' parts of the configuration, such as file paths described by path patterns or the report and contrast configuration, tools are provided that can be used within the pipeline to access this information.
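To make the idea of a path pattern concrete, here is a minimal sketch. It is not SeA-SnaP's actual path handler: the pattern string, the wildcard names (`step`, `sample`, `extension`) and the `fill_pattern` helper are all hypothetical and only illustrate how a pattern with placeholders can be turned into a concrete file path.

```python
from pathlib import PurePosixPath

# Hypothetical path pattern; the real patterns are defined in the config file.
PATTERN = "mapping/{step}/{sample}/out/{step}.{sample}.{extension}"


def fill_pattern(pattern: str, **wildcards: str) -> PurePosixPath:
    """Substitute wildcard values into a path pattern (illustrative helper)."""
    return PurePosixPath(pattern.format(**wildcards))


path = fill_pattern(PATTERN, step="align", sample="sampleA", extension="bam")
print(path)  # mapping/align/sampleA/out/align.sampleA.bam
```

Keeping the pattern in one place means a rule only has to name the step and sample, while the directory layout can be changed without touching the rules.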

In addition, there is also a directory with report snippets for the DE pipeline: small pieces of R Markdown code that each run a single analysis step, such as producing a PCA plot. The configuration file specifies which snippets to use and in which order to assemble them into a full report.
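The assembly step can be pictured roughly as follows. This is only a sketch under assumptions: the snippet file names, the directory layout and the `assemble_report` helper are hypothetical, and the real pipeline may build its reports differently.

```python
from pathlib import Path
import tempfile


def assemble_report(snippet_dir: Path, snippet_names: list[str]) -> str:
    """Concatenate R Markdown snippets in the configured order (illustrative)."""
    parts = [(snippet_dir / name).read_text() for name in snippet_names]
    return "\n\n".join(part.strip() for part in parts) + "\n"


# Demo with two throwaway snippets written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    snippet_dir = Path(tmp)
    (snippet_dir / "pca.Rmd").write_text("## PCA\n")
    (snippet_dir / "volcano.Rmd").write_text("## Volcano plot\n")
    # Hypothetical config entry: which snippets to use, and in which order.
    report = assemble_report(snippet_dir, ["pca.Rmd", "volcano.Rmd"])

print(report)
```

Because the order comes from a plain list, reordering or dropping an analysis only requires editing that list, not the snippets themselves.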

Finally, there are some helper functions that can be accessed via the ./sea-snap wrapper, e.g. to automatically produce a covariate file or sample information. There are also the folders external_scripts/, where scripts used in the pipeline can be placed (although it is preferable to keep small pieces of code inside the Snakemake file), and report/R_common/, where generic R functions that may be used in several report snippets can be put.

The pipelines can be easily extended.

See the separate sections for:


SeA-SnaP options

Available commands in the ./sea-snap wrapper:

helpers:

run pipeline:

Type ./sea-snap -h or ./sea-snap COMMAND -h for help.

Hints

understanding the reported number of reads (copied from old pipeline)

This has been inferred from single-end data:

Help

Address questions to Patrick Pett (patrick.pett@bihealth.de)