
mVarScan

This project was written for CSE 185. It implements a subset of VarScan's mpileup2snp command and finds SNPs in an aligned genome supplied as an mpileup file. See VarScan for more details.

REQUIREMENTS | INSTALLATION | BASIC USAGE | OPTIONAL | File formats

REQUIREMENTS:

These packages can be installed with pip and brew:

brew install mpich
pip install scipy

Note: If you do not have root access, you can run the pip command above with the --user option to install locally:

pip install --user scipy

INSTALLATION:

pip install mVarScan

Note: If pip reports the error externally-managed-environment while installing mVarScan, you can create a Python virtual environment, then install and use mVarScan there with the following commands:

python -m venv ~/myenv # create a new python env
source ~/myenv/bin/activate # activate myenv
pip install mVarScan 
python -m mVarScan -h # test mVarScan installation

and when you're done using it:

deactivate
rm -rf ~/myenv # delete myenv

BASIC USAGE:

The basic usage of mVarScan is:

python mVarScan [options] [mpileup]

OPTIONAL:

-o --out FILENAME (file to output contents to)
-t --tab INT (1 to output using TAB formatting, default: 0)
-m --min-var-frequency FREQUENCY (minimum frequency to call a non-reference mutation, default: 0.2)
-h --min-freq-for-hom FREQUENCY (minimum frequency to call a non-reference homozygous mutation, default: 0.8)
-p --pvalue FLOAT (p-value threshold to output SNP, default: 0.99)
-r2 --min-reads2 INT (minimum supporting reads at a position to call variants, default: 2)
-c --min-coverage INT (minimum read depth at a position to make a call, default: 8)
-q --min-avg-qual INT (minimum average base quality at a position to count a read, default: 15)
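
To illustrate how the coverage, read-count, and frequency thresholds above could interact, here is a minimal Python sketch. The function name classify_genotype, the use of >= comparisons, and the "0/1" label for heterozygous calls are assumptions made for illustration; they are not taken from mVarScan's source. The p-value and base-quality filters are left out.

```python
def classify_genotype(var_reads, coverage,
                      min_var_freq=0.2, min_freq_for_hom=0.8,
                      min_reads2=2, min_coverage=8):
    """Return "1/1", "0/1", or None for one position (hypothetical helper).

    Defaults mirror the option defaults listed above; the comparison
    operators (>=) are an assumption.
    """
    # Skip positions with too little depth or too few supporting reads.
    if coverage < min_coverage or var_reads < min_reads2:
        return None
    freq = var_reads / coverage
    if freq >= min_freq_for_hom:
        return "1/1"   # homozygous non-reference
    if freq >= min_var_freq:
        return "0/1"   # heterozygous (label assumed)
    return None        # variant frequency too low to call


if __name__ == "__main__":
    print(classify_genotype(44, 44))
    print(classify_genotype(5, 20))
    print(classify_genotype(1, 20))
```

With the defaults, a position where all 44 of 44 reads support the variant is classified as homozygous, while 5 of 20 reads (frequency 0.25) falls between the two frequency thresholds.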

File Formats:

Input Files

mpileup

An mpileup file is a tab-delimited text file with no header, typically generated by samtools mpileup. It contains at least 6 columns:

chromosome    position    reference_base    coverage    read_bases    read_qualities    [optional extra columns]

Example:

chr6    128405804   T   22  ......................  DE:EFFImEJIJJIJ>JJIJHF
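
As a rough illustration of the column layout, the sketch below splits one mpileup line into its first six fields and averages the base qualities. The names PileupRecord, parse_mpileup_line, and mean_base_quality are hypothetical, and the quality decoding assumes the Phred+33 ASCII encoding that samtools writes; this is not mVarScan's own parser.

```python
from typing import NamedTuple


class PileupRecord(NamedTuple):
    """First six columns of one mpileup line (hypothetical container)."""
    chrom: str
    pos: int
    ref: str
    coverage: int
    bases: str
    quals: str


def parse_mpileup_line(line):
    """Split a tab-delimited mpileup line; extra columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, pos, ref, cov, bases, quals = fields[:6]
    return PileupRecord(chrom, int(pos), ref, int(cov), bases, quals)


def mean_base_quality(quals, offset=33):
    """Average Phred score of a quality string (Phred+33 assumed)."""
    if not quals:
        return 0.0
    return sum(ord(c) - offset for c in quals) / len(quals)


if __name__ == "__main__":
    line = "chr6\t128405804\tT\t22\t" + "." * 22 + "\tDE:EFFImEJIJJIJ>JJIJHF"
    rec = parse_mpileup_line(line)
    # In read_bases, "." marks a read that matches the reference base.
    print(rec.chrom, rec.pos, rec.ref, rec.coverage, rec.bases.count("."))
    print(mean_base_quality(rec.quals))
```

In the example line every one of the 22 reads matches the reference, so this position would not yield a SNP.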

Output Files

tab

A tab file is a tab-delimited text file in a modified VCF format. It has similar columns to VCF but differs in what each sample column displays. Below is the header used:

#CHROM    POS    REF    ALT    SAMPLE    [other samples]

Example:

chr6    128414945   c   T   1/1:44,44:38.63636363636363:1.0:7.619481455868034e-26
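
The sample column packs several colon-separated values. Comparing the example above with the regular output for the same position suggests the order is genotype, supporting reads and coverage, average base quality, frequency, and p-value. The sketch below parses the field under that assumption; the function name parse_sample_field and the dictionary keys are hypothetical, and the field order is inferred rather than documented.

```python
def parse_sample_field(field):
    """Parse one sample column of the tab output (field order assumed).

    Assumed layout: genotype:reads,coverage:avg_qual:frequency:p_value
    """
    genotype, counts, avg_qual, freq, pvalue = field.split(":")
    var_reads, coverage = (int(x) for x in counts.split(","))
    return {
        "genotype": genotype,
        "var_reads": var_reads,
        "coverage": coverage,
        "avg_base_quality": float(avg_qual),
        "frequency": float(freq),
        "p_value": float(pvalue),
    }


if __name__ == "__main__":
    sample = "1/1:44,44:38.63636363636363:1.0:7.619481455868034e-26"
    for key, value in parse_sample_field(sample).items():
        print(key, value)
```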

regular output

By default, each SNP is printed to the terminal as one line of clearly labeled fields, in the following layout:

Chromosome:position | Sample # | homozygous_status | ref_base -> variant_base | frequency | p-value | reads, coverage | average base quality |

Example:

chr6:128414945 | Sample 1 | 1/1 | c -> T | frequency 1.00 | p-value 7.619481455868034e-26 | reads 44,44 | avg base quality 38.63636363636363|

NOTES:

Contributors:

This repository was generated by Andrew Bigelow, Aditya Parmar, and Numaan Formoli.