Snakemake pipeline for running IRMA. Designed for Influenza and RSV Illumina Sequencing.
Note: This pipeline is not ready for general use. It generally works, but it is very brittle.
ggplot2
dplyr
stringr
tidyr
cowplot
gridExtra
furrr
Install miniconda: https://docs.conda.io/en/latest/miniconda.html
Install mamba:
conda install -n base -c conda-forge mamba
git clone --depth 1 https://github.com/ammaraziz/wfi
cd wfi
mamba env create -f conda.yaml
Activate the wfi environment:
conda activate wfi
Install miniconda: https://docs.conda.io/en/latest/miniconda.html
Install snakemake: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html
conda install -n base -c conda-forge mamba
mamba install -c bioconda -c conda-forge snakemake-minimal python=3.9
Install cutadapt, biopython and bbmap:
mamba install -c bioconda -c conda-forge cutadapt biopython bbmap
Install R (>3.6 should work) and R packages:
mamba install -c conda-forge r-base
mamba install -c r r-ggplot2 r-dplyr r-tidyr r-cowplot r-gridExtra r-optparse r-furrr
Note: installing R packages through conda is troublesome for some users. If so, install them manually in R.
Install the custom version of IRMA, which contains the RSV module:
mamba install -c ammaraziz irma
Finally, download this repository and store it in your /bin/.
To use the pipeline, follow these steps:
Modify config.yaml as appropriate:

Params | Values | Information |
---|---|---|
input_dir | path | input directory - location of the raw fastq files for input |
output_dir | path | output directory - location to output results - same dir where the config sits |
second_assembly | True/False | if you suspect mixtures, set to True. It will increase run time substantially |
subset | True/False | if you are only sequencing HA/NA/MP set this to True, else leave as False |
trim_prog | standard/tile | trimming program to use, tile (bbduk) or standard (cutadapt) |
trim_org | h1/h3 | Influenza only, Flu subtype |
technology | illumina/ont/pgm | sequencing technology used, changes the module used by IRMA |
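
The allowed values in the table can be checked before a run. The following is a minimal sketch, not part of the pipeline: `validate_config` and the example paths are hypothetical, the config is assumed to be already loaded into a plain dict, and keys that are absent are skipped because the table does not say which ones are optional. The trailing '/' check on the two directories follows the troubleshooting advice in this README.

```python
# Hypothetical helper, not shipped with the pipeline.
# Allowed values are copied from the parameter table.
ALLOWED = {
    "second_assembly": (True, False),
    "subset": (True, False),
    "trim_prog": ("standard", "tile"),
    "trim_org": ("h1", "h3"),
    "technology": ("illumina", "ont", "pgm"),
}
PATH_KEYS = ("input_dir", "output_dir")


def validate_config(cfg):
    """Return a list of problems found in cfg; an empty list means no problems."""
    problems = []
    for key in PATH_KEYS:
        value = cfg.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: missing or not a path string")
        elif not value.endswith("/"):
            problems.append(f"{key}: should end with '/'")
    for key, allowed in ALLOWED.items():
        # Absent keys are skipped (assumption: they may be optional).
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key}: {cfg[key]!r} is not one of {allowed}")
    return problems


# Example with made-up paths:
example = {
    "input_dir": "/data/run1/fastq/",
    "output_dir": "/data/run1/results/",
    "second_assembly": False,
    "subset": False,
    "trim_prog": "standard",
    "trim_org": "h3",
    "technology": "illumina",
}
for problem in validate_config(example):
    print(problem)
```

An empty result only means the values match the table. It does not confirm that the directories exist or contain fastq files.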
Check that snakemake is installed. If an error is produced, snakemake was not found or is not installed.
% snakemake --version
% 5.10.0
Test the pipeline with a dry run. This will output all the commands that will be run. Look for errors (red).
% snakemake -nq
Run the pipeline, using the -j option to specify the number of cores to use.
% snakemake -j 8
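
The three steps above (check, dry run, run) can be chained from a script. This is a rough sketch under stated assumptions: the function names are hypothetical, only the `snakemake` flags shown in this README (`-nq`, `-j`) are used, and the working directory is assumed to be the one holding the Snakefile and config.

```python
import shutil
import subprocess


def snakemake_available():
    """Return True if a snakemake executable is found on PATH."""
    return shutil.which("snakemake") is not None


def dry_run_command():
    # -n: dry run, -q: quiet output, as used in this README.
    return ["snakemake", "-nq"]


def run_command(cores):
    """Build the real run command for the given number of cores."""
    cores = int(cores)
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return ["snakemake", "-j", str(cores)]


def run_pipeline(cores, workdir="."):
    """Dry-run first; start the real run only if the dry run succeeds."""
    if not snakemake_available():
        raise RuntimeError("snakemake not found; is the conda environment active?")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(dry_run_command(), cwd=workdir, check=True)
    subprocess.run(run_command(cores), cwd=workdir, check=True)
```

Calling `run_pipeline(8)` would correspond to `snakemake -nq` followed by `snakemake -j 8`. The dry run is still worth reading by eye, since a zero exit code does not guarantee the planned jobs are the ones you expect.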
Pipeline will output correctly formatted names located in:
{output_dir}/assemblies/rename/
{output_dir}/assemblies/rename/type/FLU{A|B}
{output_dir}/assemblies/{sampleID}/
{output_dir}/assemblies/{sampleID}/figures/
{output_dir}/assemblies/{sampleID}/tables/
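
To locate results for a given sample, the layout above can be spelled out in code. This is an illustrative sketch only: `expected_output_dirs` and its dictionary keys are hypothetical labels, and `FLU{A|B}` is assumed to mean one folder named `FLUA` and one named `FLUB`.

```python
def expected_output_dirs(output_dir, sample_id):
    """Build the output locations listed in this README for one sample."""
    # Normalise so the result is the same with or without a trailing '/'.
    base = output_dir.rstrip("/") + "/assemblies/"
    sample = f"{base}{sample_id}/"
    return {
        "renamed": f"{base}rename/",
        # Assumption: FLU{A|B} expands to the two folders FLUA and FLUB.
        "renamed_by_type": [f"{base}rename/type/FLU{t}" for t in "AB"],
        "sample": sample,
        "figures": f"{sample}figures/",
        "tables": f"{sample}tables/",
    }


# Example with a made-up output directory and sample ID:
for name, location in expected_output_dirs("/data/run1/results/", "S1").items():
    print(name, location)
```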
1. Error regarding path directories: Check that the input and output directories you've specified end with a '/'.
2. Error: Nothing to be done: Check the config file and ensure you've changed the input/output directories.
3. A job crashed. What do I do? There are two options. Either delete the output directory so snakemake can run everything again,
or find out where it crashed and delete the whole folder for that sample.
For example, IRMA sometimes produces errors. Find the sample which crashed,
go to assemblies and delete the corresponding {sampleID} folder. Rerun snakemake.
4. I'm very confused, I need more help, or I've screwed something up badly! Shoot me an email.
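
For item 3, removing the folder of a single crashed sample can be scripted. This is a minimal sketch, assuming the `{output_dir}/assemblies/{sampleID}` layout described in this README; `remove_sample` is a hypothetical helper and deletes the folder permanently, so check the sample ID before calling it.

```python
import shutil
from pathlib import Path


def remove_sample(output_dir, sample_id):
    """Delete {output_dir}/assemblies/{sample_id}; return True if it was removed."""
    # Refuse IDs that could point outside the assemblies folder.
    if not sample_id or "/" in sample_id or sample_id in (".", ".."):
        raise ValueError(f"suspicious sample ID: {sample_id!r}")
    target = Path(output_dir) / "assemblies" / sample_id
    if not target.is_dir():
        return False  # nothing to delete
    shutil.rmtree(target)
    return True
```

After `remove_sample(output_dir, "S1")` returns True (with `S1` standing in for the crashed sample), rerunning snakemake should regenerate that sample's outputs, as described in item 3.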
For any issues, please submit a GitHub issue.