caozhichongchong / QuickVariants

Fast and Accurate Variant Identification Tool for Sequencing-Based Studies

QuickVariants

Code for the paper "Fast and Accurate Variant Identification Tool for Sequencing-Based Studies"

QuickVariants is a fast and accurate variant identification tool, designed to summarize allele information from read alignments without discarding or filtering the data.

Install

Requirement: Java
Please download the latest version of QuickVariants here.

You may install Java with: conda install conda-forge::openjdk
You may install both Java and QuickVariants with: conda install caozhichongchong::quick-variants
QuickVariants can then be found at $Conda_env_location/bin/quick-variants-VERSION.jar
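If QuickVariants was installed through conda, the jar can also be located from a script. The Python sketch below assumes the jar sits in the environment's bin folder as described above, and that the active environment is given by the CONDA_PREFIX variable that conda sets on activation; find_quickvariants_jar and java_available are hypothetical helper names, not part of QuickVariants.

```python
import glob
import os
import shutil


def find_quickvariants_jar(conda_prefix=None):
    """Return a quick-variants-*.jar from <conda env>/bin.

    Assumption: the jar lives at $Conda_env_location/bin/quick-variants-VERSION.jar.
    If several versions are present, the last one in lexicographic order is
    returned; this is not a true version comparison (1.10.0 sorts before 1.9.0).
    """
    # Fall back to the active environment, which `conda activate` exposes as CONDA_PREFIX.
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("no conda environment given and CONDA_PREFIX is not set")
    jars = sorted(glob.glob(os.path.join(prefix, "bin", "quick-variants-*.jar")))
    if not jars:
        raise FileNotFoundError(f"no quick-variants-*.jar found in {prefix}/bin")
    return jars[-1]


def java_available():
    """Return True if a `java` executable is on PATH."""
    return shutil.which("java") is not None
```

For example, java_available() checks the Java requirement, and the path returned by find_quickvariants_jar() can be passed to java -jar.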

Usage

java -Xms10g -Xmx10g -jar quick-variants-VERSION.jar [--out-vcf <out.vcf>] [--out-mutations <out.txt>] --reference <ref.fasta> --in-sam <input.sam> --num-threads num_threads [options]

This command converts a SAM file to other formats, most notably .vcf.
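For scripted or batch runs, the usage line above can be assembled in Python. This is a minimal sketch, not part of QuickVariants: quickvariants_command and run_quickvariants are hypothetical names, only the flags shown in the usage line are supported, and -Xms/-Xmx simply set the JVM's initial and maximum heap size (10g by default here, matching the usage line).

```python
import subprocess


def quickvariants_command(jar, reference, in_sam, out_vcf=None,
                          out_mutations=None, num_threads=None, heap="10g"):
    """Build the argument list for the usage line above.

    Only --out-vcf, --out-mutations, --reference, --in-sam and --num-threads
    are handled; any other options would have to be appended by the caller.
    """
    # Same initial and maximum JVM heap, as in the usage line.
    cmd = ["java", f"-Xms{heap}", f"-Xmx{heap}", "-jar", jar]
    # Output flags are optional and may be combined in one run.
    if out_vcf:
        cmd += ["--out-vcf", out_vcf]
    if out_mutations:
        cmd += ["--out-mutations", out_mutations]
    cmd += ["--reference", reference, "--in-sam", in_sam]
    if num_threads is not None:
        cmd += ["--num-threads", str(num_threads)]
    return cmd


def run_quickvariants(**kwargs):
    """Run QuickVariants; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(quickvariants_command(**kwargs), check=True)
```

For example, run_quickvariants(jar="quick-variants-1.1.0.jar", reference="ref.fasta", in_sam="input.sam", out_vcf="out.vcf", out_mutations="out.txt", num_threads=4) requests both the VCF and the --out-mutations output in a single run.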

Input

Output formats

Summaries by reference position, by mutation, and by genome, as well as raw output, are supported.

Summary by reference position

Summary by mutation

Summary by genome

Raw output

Debugging

Multiple output formats may be specified during a single run, for example by passing both --out-vcf and --out-mutations.

Others

Test

java -jar quick-variants-1.1.0.jar --out-vcf Fig4Example1.vcf --reference examples/Fig4/reference.fasta --in-sam examples/Fig4/Example1/90.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig4Example2.vcf --reference examples/Fig4/reference.fasta --in-sam examples/Fig4/Example2/86.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig5Example1.vcf --reference examples/Fig5/reference.fasta --in-sam examples/Fig5/Example1/99.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig5Example2.vcf --reference examples/Fig5/reference.fasta --in-sam examples/Fig5/Example2/102.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig5Example3.vcf --reference examples/Fig5/reference.fasta --in-sam examples/Fig5/Example3/9.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig5Example4.vcf --reference examples/Fig5/reference.fasta --in-sam examples/Fig5/Example4/47.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig6Example1.vcf --reference examples/Fig6/reference.fasta --in-sam examples/Fig6/Example1/88.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig6Example2.vcf --reference examples/Fig6/reference.fasta --in-sam examples/Fig6/Example2/90.sam
java -jar quick-variants-1.1.0.jar --out-vcf Fig6Example3.vcf --reference examples/Fig6/reference.fasta --in-sam examples/Fig6/Example3/9.sam
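The nine test commands above differ only in figure, example, and SAM file name, so they can also be driven from a small table. A sketch, assuming it is run from the repository root (where examples/ lives) with the 1.1.0 jar in the same directory; example_commands and run_examples are hypothetical names.

```python
import os
import subprocess

# (figure, example, SAM file name) for each test command listed above.
EXAMPLES = [
    ("Fig4", "Example1", "90.sam"),
    ("Fig4", "Example2", "86.sam"),
    ("Fig5", "Example1", "99.sam"),
    ("Fig5", "Example2", "102.sam"),
    ("Fig5", "Example3", "9.sam"),
    ("Fig5", "Example4", "47.sam"),
    ("Fig6", "Example1", "88.sam"),
    ("Fig6", "Example2", "90.sam"),
    ("Fig6", "Example3", "9.sam"),
]


def example_commands(jar="quick-variants-1.1.0.jar", examples_dir="examples"):
    """Yield the test commands above as argument lists."""
    for fig, example, sam in EXAMPLES:
        yield [
            "java", "-jar", jar,
            "--out-vcf", f"{fig}{example}.vcf",
            "--reference", os.path.join(examples_dir, fig, "reference.fasta"),
            "--in-sam", os.path.join(examples_dir, fig, example, sam),
        ]


def run_examples():
    """Run every test command in order, stopping at the first failure."""
    for cmd in example_commands():
        subprocess.run(cmd, check=True)
```

Calling run_examples() writes Fig4Example1.vcf through Fig6Example3.vcf into the current directory, like the commands above.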

Additional scripts and models

The benchmark_scripts folder contains code used to construct the benchmark datasets, filter SNPs and indels in VCF files, and analyze VCF files.

Analyzing benchmark datasets used in this study

Please download benchmark datasets here.
Requirements: bowtie2, bwa, minimap2, samtools, bcftools, python3, jupyter notebook

-Gut microbiome WGS data with in silico mutations

python SNP_model_covid.py -i Gut_microbiome_benchmark/original_data -o Gut_microbiome_benchmark/
python SNPfilter.py -i Gut_microbiome_benchmark/
python Indelfilter.py -i Gut_microbiome_benchmark/
python SNP_model_compare.py -i Gut_microbiome_benchmark/

Point mutations detected: Gut_microbiome_benchmark/SNP_model/merge/final.txt and Gut_microbiome_benchmark/SNP_model/merge/model.sum.txt
Indels detected: Gut_microbiome_benchmark/SNP_model/merge/indel.vcf.filtered and Gut_microbiome_benchmark/SNP_model/merge/modelindelsum.txt
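Before moving on to the next dataset, it can help to confirm that these result files were actually produced. A minimal sketch that covers only the gut microbiome paths listed above; the helper name missing_outputs is hypothetical.

```python
from pathlib import Path

# Result files listed above, relative to the benchmark folder.
EXPECTED_OUTPUTS = {
    "point mutations": ["SNP_model/merge/final.txt",
                        "SNP_model/merge/model.sum.txt"],
    "indels": ["SNP_model/merge/indel.vcf.filtered",
               "SNP_model/merge/modelindelsum.txt"],
}


def missing_outputs(benchmark_dir="Gut_microbiome_benchmark"):
    """Return the expected result files that do not exist (yet)."""
    root = Path(benchmark_dir)
    return [str(root / rel)
            for files in EXPECTED_OUTPUTS.values()
            for rel in files
            if not (root / rel).is_file()]
```

An empty list from missing_outputs() means all four files are present.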

-SARS-CoV-2 WGS data with in silico mutations

python SNP_model_covid.py -i COVID_benchmark/original_data -fa .fasta -fq _1.fastq -o COVID_benchmark/
python SNPfilter.py -i COVID_benchmark/
python Indelfilter.py -i COVID_benchmark/
python SNP_model_compare.py -i COVID_benchmark/

-WGS data simulated with sequencing errors

python SNP_model_covid.py -i WGS_simulation_sequencingerror/original_data -fa .fasta -fq _1.fq -o WGS_simulation_sequencingerror/
python SNPfilter.py -i WGS_simulation_sequencingerror/
python Indelfilter.py -i WGS_simulation_sequencingerror/
python SNP_model_compare.py -i WGS_simulation_sequencingerror/

-MG data simulated with sequencing errors (20X)

python SNP_model_covid.py -i MG_simulation_sequencingerror/original_data -fa .fasta -fq _1.fq -o MG_simulation_sequencingerror/
python SNPfilter.py -i MG_simulation_sequencingerror/
python Indelfilter.py -i MG_simulation_sequencingerror/
python SNP_model_compare.py -i MG_simulation_sequencingerror/

-MG data simulated with sequencing errors (100X)

python SNP_model_covid.py -i MGBIG_simulation_sequencingerror/original_data -fa .fasta -fq _1.fq -o MGBIG_simulation_sequencingerror/
python SNPfilter.py -i MGBIG_simulation_sequencingerror/
python Indelfilter.py -i MGBIG_simulation_sequencingerror/
python SNP_model_compare.py -i MGBIG_simulation_sequencingerror/

-Real SARS-CoV-2 sewage MG data

python SNP_model_covid.py -i COVID_MGSW/original_data -fa .fasta -fq _1.fq -o COVID_MGSW/
python SNPfilter.py -i COVID_MGSW/
python Indelfilter.py -i COVID_MGSW/
python SNP_model_compare.py -i COVID_MGSW/
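Because every dataset above follows the same four steps, the whole benchmark can be scripted. The sketch below assumes it is run from the folder containing the benchmark scripts, with the dataset folders alongside as in the commands above. The per-dataset -fa/-fq arguments are copied from those commands, the tool list comes from the Requirements line (with jupyter standing in for jupyter notebook), and pipeline_commands, missing_tools, and run_benchmark are hypothetical names.

```python
import shutil
import subprocess

# Executables from the Requirements line above.
REQUIRED_TOOLS = ["bowtie2", "bwa", "minimap2", "samtools",
                  "bcftools", "python3", "jupyter"]

# Dataset folder -> extra arguments for SNP_model_covid.py, as in the commands above.
DATASETS = {
    "Gut_microbiome_benchmark": [],
    "COVID_benchmark": ["-fa", ".fasta", "-fq", "_1.fastq"],
    "WGS_simulation_sequencingerror": ["-fa", ".fasta", "-fq", "_1.fq"],
    "MG_simulation_sequencingerror": ["-fa", ".fasta", "-fq", "_1.fq"],
    "MGBIG_simulation_sequencingerror": ["-fa", ".fasta", "-fq", "_1.fq"],
    "COVID_MGSW": ["-fa", ".fasta", "-fq", "_1.fq"],
}


def missing_tools():
    """Return the required tools that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]


def pipeline_commands(dataset):
    """Return the four benchmark steps for one dataset, in order."""
    out = f"{dataset}/"
    return [
        ["python", "SNP_model_covid.py", "-i", f"{dataset}/original_data",
         *DATASETS[dataset], "-o", out],
        ["python", "SNPfilter.py", "-i", out],
        ["python", "Indelfilter.py", "-i", out],
        ["python", "SNP_model_compare.py", "-i", out],
    ]


def run_benchmark(dataset):
    """Run all steps for one dataset, stopping at the first failing step."""
    missing = missing_tools()
    if missing:
        raise RuntimeError(f"missing tools on PATH: {', '.join(missing)}")
    for cmd in pipeline_commands(dataset):
        subprocess.run(cmd, check=True)
```

For example, run_benchmark("COVID_benchmark") reproduces the four SARS-CoV-2 commands above, and looping over DATASETS runs all six datasets in turn.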