LaurieLecomte / SVs_long_reads

SV calling pipeline from ONT data

SV calling pipeline from long-read sequencing data

Pipeline Overview

  1. Call SVs : the 3 tools may be used independently in any order or at the same time.
    • 1.1. Sniffles : scripts 01.1 to 01.4
    • 1.2. SVIM : scripts 02.1 to 02.4
    • 1.3. NanoVar : scripts 03.1 to 03.4
  2. Merge SV calls across callers : 04_merge_callers.sh
  3. Format merged output : 05_format_merged.sh
  4. Filter merged output : 06_filter_merged.sh

Additional scripts

Other scripts targeting a specific step or operation conducted in one of the main scripts or allowing additional analyses are provided in the 01_scripts/utils subdirectory.

Older scripts used for development or debugging purposes are stored in the 01_scripts/archive folder for future reference if needed. These are not meant to be used in their current state and may be obsolete.

Prerequisites

Files

Software

For Manitou users

Custom conda environments are required for running NanoVar, SVIM, sniffles2 and jasmine, as these programs are not available on Manitou. See the Conda environment preparation section below.

For users working with other computing clusters and servers

The program versions specified in this pipeline are those that were available on IBIS' bioinformatics servers when the pipeline was built in 2021-2022, and they are unlikely to be available on every other server. Comment out each line in the #LOAD REQUIRED MODULES section of each script by adding a '#' at its beginning (or remove these lines), then follow the Conda environment preparation section to create custom conda environments with the correct program versions and dependencies. An R installation is also required.
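
Commenting out the module lines by hand in every script is tedious, so the following is a minimal sketch of doing it with sed. It assumes that the lines in the #LOAD REQUIRED MODULES section start with `module load`; the module names in the demo file are hypothetical, and the function prints the edited script instead of modifying it in place.

```shell
#!/usr/bin/env bash
# Sketch: comment out every `module load` line in a pipeline script.
# Assumption: module lines start with optional whitespace, then "module load".

comment_out_modules() {
  # $1: path to a script; prints the edited script to stdout, file is left untouched
  sed -E 's/^([[:space:]]*)(module[[:space:]]+load)/\1#\2/' "$1"
}

# Demo on a throwaway file (hypothetical module names)
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' '# LOAD REQUIRED MODULES' \
  'module load bcftools/1.13' 'module load R' 'echo done' > "$demo"
commented=$(comment_out_modules "$demo")
printf '%s\n' "$commented"
rm -f "$demo"
```

Redirect the output to a new file, check it, and only then replace the original script.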

Detailed Walkthrough

To run each script, copy the srun command from the script's header to the terminal and adjust its parameters (memory, partition, time limit) if necessary.
The header also features a brief description of the script's contents.
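
As an illustration, the srun line can be pulled out of a script's header without opening the file. This is a sketch only: it assumes the command sits on a single commented line near the top of the script, and the srun values and script path in the demo are hypothetical placeholders, not the pipeline's actual settings.

```shell
#!/usr/bin/env bash
# Sketch: print the first commented srun line found near the top of a script.
# Assumption: the header holds the command on one line, e.g. "# srun ... script.sh &".

show_srun() {
  # $1: path to a script; scan the first 20 lines and strip the leading "# "
  head -n 20 "$1" | grep -m 1 -E '^#[[:space:]]*srun' | sed -E 's/^#[[:space:]]*//'
}

# Demo on a throwaway file with a made-up header
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' '# Merge SV calls across callers' \
  '# srun -c 1 --mem=20G -p small --time=1-00:00:00 ./01_scripts/04_merge_callers.sh &' \
  'echo merging' > "$demo"
srun_cmd=$(show_srun "$demo")
echo "$srun_cmd"
rm -f "$demo"
```

Adjust `--mem`, `-p` and `--time` in the printed command to match your cluster before pasting it into the terminal.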

Conda environment preparation

SV calling environments (SVs_LR + NanoVar)

From the main directory, run conda create --name SVs_LR --file SVs_LR_env.txt and conda create --name NanoVar --file NanoVar_env.txt

These environments are used for calling SVs: SVs_LR contains Sniffles and SVIM, and NanoVar contains NanoVar.

SV merging environment (jasmine_1.1.5)

From the main directory, run conda create --name jasmine_1.1.5 --file jasmine_1.1.5_env.txt

This environment is used for merging SVs across callers, and contains jasmine 1.1.5 and bcftools 1.13.
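
The three environments follow the same naming pattern (`<name>_env.txt`), so they can be created in one loop. The sketch below only prints the commands unless it is called with `run`; the function name `create_envs` is hypothetical, and it assumes the three `*_env.txt` files sit in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: build the three `conda create` commands given in this section.

create_envs() {
  # $1: "run" executes the commands; anything else only prints them
  for env in SVs_LR NanoVar jasmine_1.1.5; do
    if [ "$1" = "run" ]; then
      conda create --name "$env" --file "${env}_env.txt"
    else
      echo "conda create --name $env --file ${env}_env.txt"
    fi
  done
}

# Print the commands for review before running anything
planned=$(create_envs print)
echo "$planned"
```

Call `create_envs run` from the main directory once the printed commands look right.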

Main pipeline

1. Prepare region files (00_prepare_regions.sh)

This script prepares the bed files required for specifying the regions in which SVs must be called or must not be called. It first produces a bed file from the reference fasta in order to yield :

2. Call SVs using 3 separate tools

Sniffles (scripts 01.1 to 01.4)

Before running each script for Sniffles, activate the SVs_LR env: conda activate SVs_LR

SVIM (scripts 02.1 to 02.4)

Before running each script for SVIM, activate the SVs_LR env: conda activate SVs_LR

NanoVar (scripts 03.1 to 03.4)

Before running each script for NanoVar, activate the NanoVar env: conda activate NanoVar

3. Merge SV calls across callers (04_merge_callers.sh)

Before running this script, activate the jasmine_1.1.5 env (even if you are working on Manitou): conda activate jasmine_1.1.5
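
The activation steps above map script prefixes to environments, which can be written down as a small lookup. This is a sketch: `env_for_script` is a hypothetical helper, the example file names are made up, and only the prefixes (01., 02., 03., 04_) come from this README.

```shell
#!/usr/bin/env bash
# Sketch: return the conda environment to activate for a given script,
# based on the prefixes used in this README.

env_for_script() {
  # $1: script path or name; prints the env name, or nothing if no env is listed
  case "$(basename "$1")" in
    01.*|02.*) echo "SVs_LR" ;;         # Sniffles and SVIM scripts
    03.*)      echo "NanoVar" ;;        # NanoVar scripts
    04_*)      echo "jasmine_1.1.5" ;;  # merging across callers
    *)         echo "" ;;
  esac
}

# Example file names below are hypothetical
env_for_script 01_scripts/02.3_example.sh
env_for_script 04_merge_callers.sh
```

A typical use would be `conda activate "$(env_for_script 04_merge_callers.sh)"` before launching the srun command.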

4. Format merged output (05_format_merged.sh)

5. Filter merged SVs (06_filter_merged.sh)

Keep SVs supported by at least 2 of the 3 tools and larger than 50 bp.
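
To make the two criteria concrete, here is a minimal awk sketch applied to a toy VCF. It is not the content of 06_filter_merged.sh: it assumes the merged VCF stores the number of supporting callers in an INFO field named SUPP and the SV length in SVLEN, and it reads "larger than 50 bp" as an absolute length strictly above 50.

```shell
#!/usr/bin/env bash
# Sketch: keep records with SUPP >= 2 and |SVLEN| > 50 from a merged VCF.
# Assumption: INFO (column 8) carries SUPP and SVLEN as key=value pairs.

filter_merged() {
  awk -F'\t' '
    /^#/ { print; next }                  # pass header lines through
    {
      supp = 0; len = 0
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        split(info[i], kv, "=")
        if (kv[1] == "SUPP")  supp = kv[2] + 0
        if (kv[1] == "SVLEN") len  = kv[2] + 0
      }
      if (len < 0) len = -len             # deletions carry a negative SVLEN
      if (supp >= 2 && len > 50) print
    }' "$1"
}

# Toy input: 4 made-up records, 2 of which meet both criteria
demo=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-300;SUPP=3;SUPP_VEC=111\n'
  printf 'chr1\t5000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=120;SUPP=1;SUPP_VEC=100\n'
  printf 'chr2\t7000\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-40;SUPP=2;SUPP_VEC=110\n'
  printf 'chr2\t9000\tsv4\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;SVLEN=800;SUPP=2;SUPP_VEC=011\n'
} > "$demo"
kept=$(filter_merged "$demo" | grep -v '^#' | cut -f3 | paste -sd, -)
echo "$kept"
rm -f "$demo"
```

Here sv2 is dropped because only one caller supports it, and sv3 because it is shorter than the size threshold.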