dpchris / MLVA

2 stars 0 forks source link

MLVA_finder.py README

Dependencies

python3 python-dev (sudo apt-get install python-dev), regex (https://launchpad.net/ubuntu/+archive/primary/+files/python-regex_0.1.20170117.orig.tar.gz)

goal

MLVA_finder.py is a python script designed to do Multi loci VNTR analysis (VNTR stands for Variable Number of Tandem Repeats ). MLVA_finder.py performs an in silico PCR to extract sequences of tandem repeat from submitted fasta file(s)
and call VNTR alleles. This script use a list of primers to recover sequences from the VNTRs. The number of allowed mismatches can be set in the command line.

input

-i or --input takes the path of the directory containing all fasta file which will be used by the script -o or --output takes the path of the results directory where results will be saved -p or --primer takes the path of the primers list in a csv file (delimited by '\t', ";" or ",") [options] -m or --mismatch takes the number of mismatches allowed for each primer (default = 2) -c or --contig : necessary if fasta files contain contigs -r or --round : rounds MLVA allele values, take float, default is 0.25 (meaning that an allele value of an integer +- 0.25 will be rounded to this integer; larger deviations from this value will be called as "intermediate size alleles" i.e. all alleles between 1.26 and 1.74 will be called 1.5) -b or --binning : corrects MLVA allele calls for primers present in binning_file.csv (default binning_file contains Brucella MLVA assay corrections as indicated in published allele numbering system version 3.6 (http://mlva.u-psud.fr/brucella/spip.php?article93) Rules are defined by adding primers with insert_size and number of pattern repetitions in binning_file.csv. Binning file must be located in the same directory as MLVA_finder.py) --mixte : a fasta file with a single sequence will be considered as "chromosome" and fasta files with multiple sequences as contigs (insert size for "chromosome" is calculated for circular chromosome) --full-locus-name : header will be full locus name instead of reduced locus name --predicted-PCR-size-table : output a supplementary table with all predicted PCR sizes --flanking-seq : add flanking column in , flanking are the sequences before and after the insert (primers inculded), you can chose the size of flanking sequences

primers format : primers name must be written as shown below in primers_file :

_bp_bp_U forward_primer reverse_primer example : Lp03_96bp_941bp_8U CAACCAATGAAGCAAAAGCA AGGGGTTGATGGTCTCAATG (mind that indicated insert size contains both primers) ### output ### MLVA_analysis_xxx.csv : csv file containing all MLVA values for each fasta file and all loci from the MLVA analysis this csv file is designed to be uploaded on http://microbesgenotyping.i2bc.paris-saclay.fr output file : csv file containing All informations from the MLVA analysis such as primers positions of match(s), size of insert, number of mismatches etc mismatch file : txt file containing all different mismatches for each locus (only locus with mismatches) found during MLVA analysis on input fasta sequence predicted-pcr-size-table.csv : optional csv file containing predicted PCR size ### command line examples ### python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Legionella_pneumophila/ -o . -p data_test/primers/Legionella_pneumophila_primers.txt python3.6 [/path/to]/MLVA/MLVA_finder.py --input data_test/assemblies/Legionella_pneumophila/ --output [/path/to]/results/ --primer data_test/primers/Legionella_pneumophila_primers.txt (equivalent to the previous one) #with different number of mismatch allowed : python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Legionella_pneumophila/ -o . -p data_test/primers/Legionella_pneumophila_primers.txt -m 1 #1 mismatch allowed python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Legionella_pneumophila/ -o . -p data_test/primers/Legionella_pneumophila_primers.txt -m 4 #4 mismatches allowed #If fasta files contains contigs (and only contigs) : python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Legionella_pneumophila/ -o . -p data_test/primers/Legionella_pneumophila_primers.txt -c python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Legionella_pneumophila/ -o . -p data_test/primers/Legionella_pneumophila_primers.txt -c -m 0 #0 mismatches allowed #Using binning option : python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Brucella -p data_test/primers/Brucella_primers.txt -o . -b data_test/Brucella_binning_file.csv python3.6 [/path/to]/MLVA/MLVA_finder.py -i data_test/assemblies/Brucella -p data_test/primers/Brucella_primers.txt -o . -b data_test/Brucella_binning_file.csv --predicted-PCR-size-table --flanking-seq 200 #written by D.Christiany