MGI-tech-bioinformatics / SARS-CoV-2_Multi-PCR_v1.0

SARS-CoV-2 analysis pipeline for multiplex-PCR MPS (Massively Parallel Sequencing) data

MGI-SARS-CoV-2

Introduction

This pipeline accurately and efficiently identifies SARS-CoV-2 reads in multiplex PCR sequencing data and reports the infection status of each sequenced sample with a positive/negative/uncertain label. It also reports variant information such as SNPs and INDELs and generates the consensus sequence.

Updates:

May 11, 2020

  1. Adjusted the minimum depth threshold of freebayes from 100 to 30
  2. Updated Cut_Multi_Primer.py to save more memory
  3. Fixed some errors in the HTML report
  4. Fixed a bug in consensus fasta

May 26, 2020

  1. Added 'SOAPnuke_param' to the json file; users can now customize the parameters of SOAPnuke.
  2. Fixed a bug in consensus fasta that caused an error during consensus sequence generation when the vcf file contains an INDEL.
  3. Fixed a bug in generate_rem_report.py that caused Identification.txt to display abnormally in the HTML report.

Jun 2, 2020

  1. Fixed a bug in Windows.Depth.svg

Jun 8, 2020

  1. Added install.sh; users can install the required software by running this script.
  2. Optimized Cut_Multi_Primer.py: the script now keeps only virus reads, which makes the primer-cutting step run more efficiently.

Jun 24, 2020

  1. Fixed a bug in step6; zcat is now used to read .gz files.

Jul 14, 2020

  1. Fixed a bug in Cut_Multi_Primer.py
  2. When one sample corresponds to multiple barcodes, the pipeline now merges the fastq files.
  3. Added some statistical results to Identification.txt

Aug 21, 2020

  1. Added variant annotation with snpEff

Feb 24, 2021

  1. Switched the alignment step from bwa-aln to bwa-mem
  2. Updated the SARS-CoV-2 positive criteria: percentage of SARS-CoV-2 reads >= 0.1% AND 1X coverage (fraction of the reference covered at depth >= 1X) >= 1%
  3. Updated freebayes to v1.3.4
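The positive criteria in the Feb 24, 2021 update can be read as two thresholds that must both hold. The sketch below is one possible reading, not the pipeline's own code: the function name, its parameters, and the way the two percentages are computed are assumptions for illustration. The rules for the negative and uncertain labels are not stated here, so the function only answers whether a sample meets the positive criteria.

```python
def meets_positive_criteria(virus_reads, total_reads, covered_bases, reference_length,
                            min_read_pct=0.1, min_coverage_pct=1.0):
    """Return True when both thresholds are met.

    virus_reads / total_reads gives the SARS-CoV-2 read percentage;
    covered_bases is the number of reference positions with depth >= 1X.
    All four inputs are hypothetical names, not fields produced by the pipeline.
    """
    if total_reads <= 0 or reference_length <= 0:
        return False
    read_pct = 100.0 * virus_reads / total_reads
    coverage_pct = 100.0 * covered_bases / reference_length
    return read_pct >= min_read_pct and coverage_pct >= min_coverage_pct


# Example: 0.2% virus reads and about 2% of a 29,903 bp reference covered.
print(meets_positive_criteria(20, 10000, 600, 29903))
```

Both conditions are joined with a logical AND, so a sample with many virus reads piled onto a tiny region, or with broad but extremely sparse coverage, would not pass under this reading.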

May 7, 2021

  1. The HTML report now uses the variant annotation Excel file instead of the VCF file
  2. Optimized the depth distribution SVG in the HTML report
  3. Primer bases are now marked with base quality 0 instead of being removed from the read
  4. Updated primer sequence information
  5. Reduced software running time
  6. Uploaded a Docker version of this software
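Marking primer bases with quality 0 rather than trimming them keeps read length and alignment coordinates intact while letting quality-aware tools ignore those bases. The sketch below illustrates the idea only; it is not the logic of Cut_Multi_Primer.py. It assumes the primer sits at the start of the read, that its length is already known, and that qualities are Phred+33 encoded (where quality 0 is the character `!`).

```python
def mask_primer_quality(quality, primer_length, phred_offset=33):
    """Set the quality of the first `primer_length` bases to Phred 0.

    The sequence string is left untouched; only the quality string changes.
    `primer_length` and `phred_offset` are illustrative parameters.
    """
    n = max(0, min(primer_length, len(quality)))
    return chr(phred_offset) * n + quality[n:]


# A read whose first 3 bases came from a primer.
print(mask_primer_quality("IIIIIIII", 3))
```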

Dec 8, 2021

  1. Fixed a bug in indel calling: overlapping PE reads are now merged to make indel detection more accurate.

Jan 24, 2022

  1. Upgraded SOAPnuke from version 1.5.6 to 2.1.7.
  2. Summaries of QC/Identification/Mutation/ConsensusFasta are now written to the directory $workdir/result/summary.
  3. Optimized memory usage of this software.
  4. Updated the Docker version to v1.3.

Requirements:

Before running this pipeline, you need to make sure that several pieces of software and/or modules are installed on the system:

Perl: >=v5.22.0
Python: >=v3.4.3
R: >=v3.3.2

Libraries for Python3 and R:

Software for data quality control:

Software for alignment and bam file statistics:

Software for variant calling:

Other required software:

Installation

To clone the repository:

git clone https://github.com/MGI-tech-bioinformatics/SARS-CoV-2_Multi-PCR_v1.0.git

To install the required software:

cd SARS-CoV-2_Multi-PCR_v1.0 && sh install.sh

Notes:

Usage

1. Prepare the input.json file

The details of the input.json file are as follows:

2. Run the pipeline

python3 Main_SARS-CoV-2.py -i input.json 
cd path/to/workdir
nohup sh main.sh &

3. Analysis results

1. Quality control result

path/to/workdir/result/*/05.Stat/QC.xlsx

2. Identification result

path/to/workdir/result/*/05.Stat/Identification.xlsx

3. Variant calling result

path/to/workdir/result/*/05.Stat/*.vcf.gz
path/to/workdir/result/*/05.Stat/*.snpEff.anno.xlsx

4. HTML report

path/to/workdir/result/*/05.Stat/*.html
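The result paths above follow one pattern per output type, so they can be gathered with a few glob calls. The following is a minimal sketch, assuming that the `*` directory under `result/` is named after the sample; the function name and the dictionary keys are hypothetical.

```python
from pathlib import Path

# File patterns taken from the result paths listed above.
RESULT_PATTERNS = {
    "qc": "QC.xlsx",
    "identification": "Identification.xlsx",
    "vcf": "*.vcf.gz",
    "annotation": "*.snpEff.anno.xlsx",
    "html": "*.html",
}


def collect_results(workdir):
    """Map each directory under result/ to the files found in its 05.Stat folder."""
    results = {}
    for stat_dir in sorted(Path(workdir).glob("result/*/05.Stat")):
        sample = stat_dir.parent.name  # assumed to be the sample name
        results[sample] = {
            kind: sorted(str(p) for p in stat_dir.glob(pattern))
            for kind, pattern in RESULT_PATTERNS.items()
        }
    return results
```

Calling `collect_results("path/to/workdir")` returns an empty dictionary when no sample has finished, which makes it easy to check whether a run has produced output yet.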

With Docker

To pull the Docker image:

docker pull meizhiying/mgi-sars-cov-2:v1.3

Running

docker run -d \
--name $WORKNAME \
-v $workdir:$workdir \
-v $datadir:$datadir \
meizhiying/mgi-sars-cov-2:v1.3 \
/SARS-CoV-2_pipeline/bin/python3/bin/python3 /SARS-CoV-2_pipeline/bin/Main_SARS-CoV-2.py -i $json
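When many runs are launched from a script, the same command can be assembled as an argument list, which avoids shell quoting problems with unusual paths. This is a sketch only: the function name and the example paths are hypothetical, while the image name, interpreter path, and script path are copied from the command above. Because the container only sees the two mounted directories, the json file presumably needs to live under `$workdir` or `$datadir`.

```python
import shlex

IMAGE = "meizhiying/mgi-sars-cov-2:v1.3"
PYTHON = "/SARS-CoV-2_pipeline/bin/python3/bin/python3"
MAIN = "/SARS-CoV-2_pipeline/bin/Main_SARS-CoV-2.py"


def build_docker_command(name, workdir, datadir, json_path):
    """Return the docker run command as a list suitable for subprocess.run."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-v", f"{workdir}:{workdir}",  # same path inside and outside the container
        "-v", f"{datadir}:{datadir}",
        IMAGE, PYTHON, MAIN, "-i", json_path,
    ]


# Hypothetical example paths; print the command for inspection before running it.
cmd = build_docker_command(
    "sars_cov_2_run",
    "/data/test/analysis",
    "/data/test/fastq",
    "/data/test/analysis/input.json",
)
print(shlex.join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.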

Notes

  1. All requirements and software are installed in the Docker image, so there is no need to configure the software paths in the json file.

    Json Demo:

    {
      "FqType": "PE", 
      "sample_list": "/data/test/sample.list", 
      "workdir": "/data/test/analysis", 
      "SplitData": "1M", 
      "freebayes_param": "-H -p 1 -q 20 -m 60 --min-coverage 20 -F 0.6", 
      "consensus_depth": "10", 
      "primer_version": "2.0" 
    }
  2. $workdir is set by the "workdir" key in the json file; $datadir is defined in sample.list.
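For batch work, the json file from the demo can be generated rather than edited by hand. The sketch below reuses the keys and default values from the Json Demo for the Docker image; the function name and the override mechanism are hypothetical. A non-Docker installation may need additional keys, such as software paths, that are not covered here.

```python
import json


def write_input_json(path, sample_list, workdir, **overrides):
    """Write a pipeline config using the demo values, with optional overrides."""
    config = {
        "FqType": "PE",
        "sample_list": sample_list,
        "workdir": workdir,
        "SplitData": "1M",
        "freebayes_param": "-H -p 1 -q 20 -m 60 --min-coverage 20 -F 0.6",
        "consensus_depth": "10",
        "primer_version": "2.0",
    }
    config.update(overrides)  # e.g. consensus_depth="30"
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
    return config
```

For example, `write_input_json("/data/test/analysis/input.json", "/data/test/sample.list", "/data/test/analysis")` reproduces the demo file. Values are kept as strings, matching the demo.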