bihealth / snappy-pipeline

SNAPPY Nucleic Acid Processing in Python
MIT License

SNAPPY - SNAPPY Nucleic Acid Processing Pipeline

Installation

Installation should be complete in 10 to 15 minutes.

In a nutshell:

# Download & preparation
git clone git@github.com:bihealth/snappy-pipeline.git
cd snappy-pipeline

# If you want to select a given branch, uncomment the following:
# git checkout <branch_name>

# WARNING: make sure that you are in your conda base environment

# Create conda environment "snappy_env" with all requirements:
mamba env create --file environment.yml -n snappy_env
conda activate snappy_env

# Install snappy in snappy_env environment
pip install -e ".[all]"

The dependency group all includes all optional dependencies, i.e. test (for running tests with pytest), dev (for formatting, linting, pre-commit hooks) and docs (for building the documentation with sphinx). If you only want to install the core dependencies, you can omit the [all] part, or choose any combination of the other groups.
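
As a minimal sketch of choosing a subset of the optional groups, the snippet below builds the requirement spec for pip and prints the resulting command instead of running it. The variable names `extras` and `spec` are illustrative only and are not part of the pipeline.

```shell
# Any comma-separated combination of the optional groups: test, dev, docs
extras="test,docs"

# Requirement spec handed to pip; keep it quoted so the shell does not
# try to interpret the square brackets as a glob pattern
spec=".[${extras}]"

# Print the command for inspection, then run it by hand inside snappy_env
echo "pip install -e \"${spec}\""

# For the core dependencies only, the spec is simply "."
```

Run the printed command from the root of the cloned repository with the snappy_env environment active.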

See user installation if you just want to use the pipeline.

See developer installation for getting started with working on the pipeline code and also building the documentation.

Using GATK3

Some wrappers rely on GATK 3. GATK v3 is not free software and cannot be redistributed. Earlier, we had an internal CUBI conda server, but this prevented the general public from using these wrappers. Now, the pipeline steps that use GATK 3 must be activated as follows.

If you are a member of CUBI, you can use the central GATK download. Alternatively, you can download the tarball from the Broad archive.

$ ls -lh /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
-rw-rw---- 1 holtgrem_c hpc-ag-cubi 14M Dec 19  2019 /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2

First, go to the pipeline directory where you want to run:

$ cd variant_calling

Explicitly create any missing conda environments:

$ snappy-snake --conda-create-envs-only
[...]
12-27 17:18 snakemake.logging WARNING  Downloading and installing remote packages.
[...]

Find out which conda environments use GATK v3:

$ grep 'gatk.*3' .snakemake/conda/*.yaml
.snakemake/conda/d76b719b718c942f8e49e55059e956a6.yaml:  - gatk =3

Activate each of these conda environments and register the GATK 3 tarball in it:

$ for yaml in $(grep -l 'gatk.*3' .snakemake/conda/*.yaml); do
        environ=${yaml%.yaml}
        conda activate "$environ"
        gatk3-register /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
        conda deactivate
    done
Moving GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2 to /home/holtgrem_c/miniconda3/envs/gatk3/opt/gatk-3.8

You are now ready to run GATK v3 from this environment.
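
The loop above relies on each environment directory having the same path as its YAML file minus the `.yaml` suffix, which is what `${yaml%.yaml}` computes. The following dry-run sketch shows that selection logic on a throwaway directory. The file names `aaa.yaml` and `bbb.yaml` and the variables `demo` and `found` are made up for illustration, and nothing is activated or registered.

```shell
# Throwaway directory standing in for .snakemake/conda (hypothetical contents)
demo=$(mktemp -d)
printf 'dependencies:\n  - gatk =3\n' > "$demo/aaa.yaml"
printf 'dependencies:\n  - bwa\n' > "$demo/bbb.yaml"
mkdir "$demo/aaa" "$demo/bbb"

# Same selection as the registration loop, but only listing the matches
found=""
for yaml in $(grep -l 'gatk.*3' "$demo"/*.yaml); do
    environ=${yaml%.yaml}   # strip the .yaml suffix to get the env directory
    if [ -d "$environ" ]; then
        found="$found $(basename "$environ")"
    fi
done
echo "would register:$found"

# Clean up the throwaway directory
rm -rf "$demo"
```

Pointing the same loop at `.snakemake/conda` instead of `$demo` (and dropping the setup and cleanup lines) should list the environments that the registration step would touch.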

Development Notes

Here, you can find the required layout for post-PR commit messages: