USDA-VS / vSNP3

vSNP -- validate SNPs
GNU General Public License v3.0

vSNP3

This is an introduction to vSNP version 3. A comprehensive overview of vSNP version 2 and the version 2 GitHub repository are available separately.

vSNP3 generates BAM and VCF files, annotated SNP tables, and corresponding phylogenetic trees for high-resolution SNP analysis.

Whole genome sequencing for disease tracing and outbreak investigations is routinely required for high-consequence diseases. vSNP is an accreditation-friendly and robust tool, designed for easy error correction and SNP validation. It generates annotated SNP tables and corresponding phylogenetic trees that can be scaled for reporting purposes. It can process large-scale datasets and accommodate multiple references.

Conda installation

Tested with Python 3.8 - 3.9.

Anaconda setup

conda create -c conda-forge -c bioconda -n vsnp3 vsnp3=3.24

Installation test

which vsnp3_step1.py
vsnp3_step1.py -h
vsnp3_step2.py -h
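The installation test can also be scripted. The following is a minimal sketch, assuming the vsnp3 conda environment is already active (`conda activate vsnp3`). The helper name `check_on_path` is hypothetical and not part of vSNP3; it only reports which of the named commands cannot be found on PATH.

```shell
# check_on_path: hypothetical helper, not part of vSNP3.
# Prints each command that cannot be found on PATH and returns
# non-zero if any are missing.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with scripts named in this README:
# check_on_path vsnp3_step1.py vsnp3_step2.py vsnp3_path_adder.py
```

If a script is reported as not found, the most likely cause is that the conda environment is not active in the current shell.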

Test workflow

Download test files

cd ~; git clone https://github.com/USDA-VS/vsnp3_test_dataset.git

Add reference:

cd ~/vsnp3_test_dataset/vsnp_dependencies
vsnp3_path_adder.py -d `pwd`

Run test with AF2122 (Mycobacterium bovis)

Test step 1:

cd ~/vsnp3_test_dataset/AF2122_test_files/step1
vsnp3_step1.py -r1 *_R1*.fastq.gz -r2 *_R2*.fastq.gz -t Mycobacterium_AF2122
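The step 1 command above selects its inputs with the shell globs `*_R1*.fastq.gz` and `*_R2*.fastq.gz`, so which files it receives depends on what is in the working directory. A minimal sketch of a pre-flight check follows. It assumes that exactly one R1 and one R2 file per directory is the intended layout, and `check_read_pair` is a hypothetical helper, not a vSNP3 script.

```shell
# check_read_pair DIR: hypothetical helper, not part of vSNP3.
# Succeeds only if DIR holds exactly one *_R1*.fastq.gz and one
# *_R2*.fastq.gz file (an assumption about the intended layout).
check_read_pair() {
    dir=$1
    for tag in R1 R2; do
        # Expand the glob into the function's positional parameters.
        set -- "$dir"/*_"$tag"*.fastq.gz
        # An unmatched glob stays literal, so also test that the file exists.
        if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
            echo "expected exactly one $tag file in $dir"
            return 1
        fi
    done
}

# Usage: run step 1 only when the pair looks complete.
# check_read_pair . && vsnp3_step1.py -r1 *_R1*.fastq.gz -r2 *_R2*.fastq.gz -t Mycobacterium_AF2122
```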

Test step 2:

cd ~/vsnp3_test_dataset/AF2122_test_files/step2
vsnp3_step2.py -wd . -a -t Mycobacterium_AF2122

Run test with NC_045512 (SARS-CoV-2)

Test step 1:

cd ~/vsnp3_test_dataset/NC_045512_test_files/step1
for i in *.fastq.gz; do n=$(echo "$i" | sed 's/[_.].*//'); echo "$i in directory $n"; mkdir -p "$n"; mv "$i" "$n"/; done
cdir=$(pwd); for f in *; do echo "$f"; cd "./$f"; vsnp3_step1.py -r1 *_R1*.fastq.gz -r2 *_R2*.fastq.gz -t NC_045512_wuhan-hu-1; cd "$cdir"; done
mkdir stats; cp **/*stats.xlsx stats; cd stats; vsnp3_excel_merge_files.py # see combined_excelworksheets-*.xlsx for the stats summary
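The first loop above derives each sample directory name by deleting everything from the first underscore or period in the file name. The sketch below shows that same `sed` expression in isolation. The function name `sample_name` and the file names are hypothetical, chosen only for illustration.

```shell
# sample_name FILE: hypothetical helper wrapping the sed expression
# used in the loop above. It drops everything from the first "_" or ".".
sample_name() {
    printf '%s\n' "$1" | sed 's/[_.].*//'
}

# Both reads of a pair map to the same directory name:
sample_name "sampleA_S1_L001_R1_001.fastq.gz"   # prints sampleA
sample_name "sampleA_S1_L001_R2_001.fastq.gz"   # prints sampleA
# A name with an early "." or "_" is cut at that point:
sample_name "sample.B_R1.fastq.gz"              # prints sample
```

Because the cut happens at the first underscore or period, sample identifiers that themselves contain either character would be shortened, and distinct samples could end up in the same directory.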

Test step 2:

cd ~/vsnp3_test_dataset/NC_045512_test_files/step2
vsnp3_step2.py -a -t NC_045512_wuhan-hu-1 -remove

Description

See each script's -h option for details and usage.

Step 1

Step 2

Step 1 and 2

Utility scripts

Input Summary

vSNP inputs

Map

Script structure

Version Enhancements

Additional Tools