butzist / ProGraphMSA

Multiple sequence alignment of aa/codon sequences with tandem repeats
GNU General Public License v3.0

ProGraphMSA
===========

Fast and Robust Phylogeny-Aware Multiple Sequence Alignment

PLEASE NOTE: this repository is ARCHIVED and UNMAINTAINED. All further development is done at:

https://github.com/acg-team/ProGraphMSA


Running ProGraphMSA
===================

The easiest way to run ProGraphMSA with the recommended command-line parameters is to use the wrapper script "ProGraphMSA+TR.sh" included in this package. Just run

./ProGraphMSA+TR.sh

or

./ProGraphMSA+TR.sh -o

ProGraphMSA+TR runs with ML distances and the WAG model, and writes the alignment in FASTA format. It also uses T-REKS (from the file T-Reks.jar) to detect tandem repeats. To use TRUST instead, adjust the installation path in trust2treks.py and run

./ProGraphMSA+TR.sh --custom_tr_cmd trust2treks.py
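The wrapper invocations above can also be assembled from a script. The following is a minimal Python sketch, not part of the ProGraphMSA package: the helper names `wrapper_command`, `read_fasta`, and `is_aligned` are hypothetical, and it assumes that the wrapper takes the input sequence file as its last argument and that `-o` names the output file, as in the usage text of ProGraphMSA itself.

```python
def wrapper_command(input_path, output_path=None, custom_tr_cmd=None,
                    script="./ProGraphMSA+TR.sh"):
    """Build the argument list for the wrapper script (hypothetical helper).

    Assumption: the input sequence file is passed as the last argument.
    """
    cmd = [script]
    if output_path is not None:
        cmd += ["-o", output_path]                  # output file name
    if custom_tr_cmd is not None:
        cmd += ["--custom_tr_cmd", custom_tr_cmd]   # e.g. trust2treks.py
    cmd.append(input_path)
    return cmd


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def is_aligned(records):
    """Return True if all sequences (gaps included) have the same length."""
    return len({len(seq) for seq in records.values()}) <= 1
```

To actually run the wrapper, the list can be handed to `subprocess.run(wrapper_command("input.fasta", "output.fasta"), check=True)`; the file names here are placeholders. `is_aligned(read_fasta(...))` is only a quick sanity check that every row of the FASTA output has the same number of columns.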


Command line parameters
=======================

USAGE:

  ./ProGraphMSA  [--ancestral_seqs] [--all_trees] [-i <arg>] [-T] [-I] [-M]
                 [-m] [-a] [-C <arg>] [-F] [--custom_model <arg>] [-w]
                 [-c <arg>] [-r] [--custom_tr_cmd <arg>] [--trd_output <arg>]
                 [--read_repeats <arg>] [-R] ... [--repalign]
                 [--repeat_indel_ext <arg>] [--repeat_indel_rate <arg>] [-A]
                 [-P <arg>] [-p <arg>] [-D <arg>] [-d <arg>] [-x <arg>]
                 [-s <arg>] [-l <arg>] [-E <arg>] [-e <arg>] [-g <arg>] [-f]
                 [--dna] [--codon] [--topology <arg>] [-t <arg>] [-o <arg>]
                 [--] [--version] [-h] <input sequences>

Tandem-repeat related parameters:
=================================

  -R, --repeats
      use T-REKS to identify tandem repeats
  --custom_tr_cmd <arg>
      custom command for detecting tandem-repeats
  --trd_output <arg>
      write TR detector output to file
  --read_repeats <arg>
      read TR detector output from file
  --repalign
      re-align detected tandem repeat units
  --repeat_indel_ext <arg>
      repeat indel extension probability
  --repeat_indel_rate <arg>
      insertion/deletion rate for repeat units (per site)

Guide tree, distances, and substitution model:
==============================================

  -i <arg>, --iterations <arg>
      number of iterations re-estimating guide tree [default: 2]
  -m, --mldist
      use distances estimated by a Maximum-Likelihood method
  -a, --nwdist
      estimate initial distance tree from Needleman-Wunsch alignments
  -D <arg>, --max_dist <arg>
      maximum distance for alignment
  -F, --estimate_aafreqs
      estimate equilibrium amino acid frequencies from input data
  -w, --darwin
      use model of evolution from Darwin (GONNET matrix and different
      indel model parameters, otherwise WAG will be used)
  --custom_model <arg>
      custom substitution model in qmat format
  -c <arg>, --cs_profile <arg>
      path to library of context-sensitive profiles (we distribute a copy
      in the 3rd_party folder)
  -A, --no_force_align_m
      do not force alignment of initial Methionine

Parameters for adjusting the indel model:
=========================================

  -l <arg>, --edge_halflife <arg>
      edge half-life (evolutionary distance at which the probability of
      re-using an unused graph is halved)
  -E <arg>, --end_indel_prob <arg>
      probability of mismatching sequence ends (set to -1 to disable this
      feature)
  -e <arg>, --gap_ext <arg>
      gap extension probability
  -g <arg>, --indel_rate <arg>
      insertion/deletion rate

Input/Output:
=============

  -f, --fasta
      output fasta format (instead of stockholm)
  -t <arg>, --tree <arg>
      initial guide tree
  -o <arg>, --output <arg>
      Output file name
  -I, --input_order
      output sequences in input order (default: tree order)
  --dna
      align DNA sequence
  --codon
      align DNA sequence based on a codon model
  --ancestral_seqs
      output all ancestral sequences
  <input sequences>
      (required) input sequences

Building from source
====================

Building ProGraphMSA requires the following programs and libraries:

  cmake >=2.8 (http://www.cmake.org)
  tclap >=1.1.0 (http://tclap.sourceforge.net)
  Eigen 2.0.x or 3.0.x (http://eigen.tuxfamily.org)

On Debian/Ubuntu you can install these programs/libraries with:

  sudo apt-get install cmake libtclap-dev libeigen2-dev

Then run the following commands to configure, build, and install ProGraphMSA:

  cd BUILD
  ccmake ..
    (press "c" to configure and "g" to generate the Makefile, see below
    for additional configuration options)
  make ProGraphMSA
  make install

Additional CMake configuration options (in "ccmake .."):

  EIGEN2_INCLUDE_DIR
      set this to the path where Eigen is installed, if you use Eigen
      3.0.x or if Eigen has been installed at a non-default location
      (default location: /usr/include/eigen2)
  WITH_EIGEN3
      set this to ON if you want to compile ProGraphMSA with Eigen 3.0.x
  CMAKE_CXX_FLAGS
      add options for the C++ compiler, like optimization flags or
      additional search paths for include files (-I )
  WITH_SSE
      disable this option if you build ProGraphMSA for a machine that does
      not support SSE2
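
For scripted builds, where the interactive ccmake interface is inconvenient, the configuration options listed above can usually be passed to plain cmake as -D cache entries. The following Python sketch only assembles such a command line. It is not part of ProGraphMSA: the helper name `cmake_configure_command` is hypothetical, the example paths are placeholders, and it assumes that the four options behave as ordinary CMake cache variables.

```python
def cmake_configure_command(source_dir="..", with_eigen3=False,
                            eigen_include_dir=None, with_sse=True,
                            cxx_flags=None):
    """Build a non-interactive cmake command line (hypothetical helper).

    Assumption: EIGEN2_INCLUDE_DIR, WITH_EIGEN3, WITH_SSE and
    CMAKE_CXX_FLAGS can be set as -D cache entries.
    """
    cmd = ["cmake"]
    if with_eigen3:
        cmd.append("-DWITH_EIGEN3=ON")            # compile against Eigen 3.0.x
    if eigen_include_dir is not None:
        # Needed for Eigen 3.0.x or a non-default install location.
        cmd.append("-DEIGEN2_INCLUDE_DIR=" + eigen_include_dir)
    if not with_sse:
        cmd.append("-DWITH_SSE=OFF")              # target machine lacks SSE2
    if cxx_flags:
        cmd.append("-DCMAKE_CXX_FLAGS=" + cxx_flags)
    cmd.append(source_dir)                        # ".." when run from BUILD
    return cmd
```

Run from the BUILD directory, `subprocess.run(cmake_configure_command(with_eigen3=True, eigen_include_dir="/path/to/eigen3"), check=True)` would take the place of the interactive `ccmake ..` step; `make ProGraphMSA` and `make install` follow as before.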