cristianzambrano / MO-Phylogenetics

MO-Phylogenetics is a optimization software tool that infers bi-objective phylogenetic trees optimized under two reconstruction criteria the Maximun Parsimony and the Maximun Likelihood simultaneously
http://khaos.uma.es/mophylogenetics/
0 stars 3 forks source link

MO-Phylogenetics - README

======================================================================================= TABLE OF CONTENTS

  1. Requirements
  2. Installing
  3. Executing
  4. Parameters
  5. Structure
  6. Results

=======================================================================================

  1. Requirements

MO-Phylogenetics has been developed in Unix machines (Ubuntu and MacOS X) as well as in Windows (MinGW) using the G++ (4.4.7) compiler. The make utility has been used to compile the software package.

MO-Phylogenetics is based on jMetalCpp and requires install two frameworks:

1) Bio++: a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Home Page: http://biopp.univ-montp2.fr/

2) PLL - Phylogenetic Likelihood Library: a highly optimized, parallized software library to ease the development of new software tools dealing with phylogenetic inference. Home Page: Phylogenetic Likelihood Library http://www.libpll.org/

Bio++ also requeries the CMake utility and Pll requeries Autoreconf utility.

=======================================================================================

=======================================================================================

  1. Installing MO-Phylogenetics

Copy the compressed file to the location where you want to install MO-Phylogenetics and unzip it.

Then, run the install.sh with the following command: %sh install.sh

This install all the libraries needed: Bio++ (Bpp-Core, Bpp-Seq and Bpp-Phyl) and Pll. Add the path of the new libraries installed in your .bashrc file

For Bio++:

export CPATH=$CPATH:MOPhylogenetics_path/lib/Bpp/include
export LIBRARY_PATH=$LIBRARY_PATH:MOPhylogenetics_path/lib/Bpp/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:MOPhylogenetics_path/lib/Bpp/lib

For Pll:

export CPATH=$CPATH:MOPhylogenetics_path/lib/Pll/src/include
export LIBRARY_PATH=$LIBRARY_PATH:MOPhylogenetics_path/lib/Pll/src/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:MOPhylogenetics_path/lib/Pll/src/lib

=======================================================================================

=======================================================================================

  1. Executing MO-Phylogenetics

The main binary is in the subfolder 'bin'. Enter this folder to execute MO-Phylogenetics.

% cd bin % ./MOPhylogenetics param= parametersfile

The parameterfile is the filename wich has all the parameters requeried to execute the program. For example, to execute the program using one of our test problem 500 sequences dataset, type the following command:

%./MOPhylogenetics param=parameters/params500.txt

The parameters to solve the state-of-the-art problems are defined in some files saved into the paramaters folder, and also are included the datasets (nucleotide sequences), partition model file and the initial user trees perfomanced by a bootstrap techniques using PhyML and DNAPars software.

=======================================================================================

=======================================================================================

  1. Structure

** src: Source Code of MO-Phylogenetics based in the jMetalCpp framework. Some changes were added to adjut to the Phylogenetic problem.

** bin: Main Binary of the software

data ** sequences: Sequences file model: Partition Model file used by Pll *** inputusertrees: Input Phylogenetic Trees used as repository to generate initial population

** parameters: Parameters file wich contains all the parameters needed to customize the software.

=======================================================================================

  1. Parameters

Parameters of the Metaheuristic:

experimentid = Experiment ID, the results file are renamed using this ID DATAPATH = data folder name

algorithm = Name, now available: NSGAII,SMSEMOA,MOEAD,PAES populationsize = Population Size o maxevaluations = Number of the evaluations

offset = SMSEMOA parameter only

datadirectory = MOAED/D parameter only

bisections = Number of bisections, PAES parameter only archivesize = Size of the archive, PAES parameter only

intervalupdateparameters = Interval of evaluations to Optimize the Branch-length and parameters of the evolutionary model

printtrace = Prints trace of the time elapsed and score value performed during optimizing strategies
printbestscores = Prints best scores found during the algorithm execution

Genetic Operators

selection.method = binarytournament and randomselection (for SMSEMOA) are Available.

crossover.method = Only the PDG (Prune-Delete-Graph) recombination operator is available. crossover.probability = Probability to execute. crossover.offspringsize = Number of descendents

mutation.method = NNI, SPR and TRB Topological mutations are available mutation.probability = Probability to execute mutation.distributionindex = Distribution Index

----------------------------------------------------------------------------------------

Phylogenetic Parameters

----------------------------------------------------------------------------------------

alphabet = DNA, RNA or Protein are available. input.sequence.file = The sequence file to use (sequences must be aligned!). For example $(DATAPATH)/sequences/55.txt

input.sequence.format = The alignment format, for exampĺe Phylip(order=sequential, type=extended, split=spaces)

input.sequence.sites_to_use = Sites to use all, nogap or complete (=only resolved chars) input.sequence.max_gap_allowed = Specify a maximum amount of gaps: may be an absolute number or a percentage. 0%

partitionmodelfilepll = Model specifications file. For example: $(DATAPATH)/model/55.model

init.population = Method to generate or create the Initial phylogenetic trees: user, random or stepwise init.population.tree.file = FileName of Input User Trees, for example: $(DATAPATH)/inputusertrees/bootstrap1000_55

init.population.tree.format = Phylogenetic Trees Format: Newick

** Parameters of the Evolutionary Model ***

model = Evolutionary Model Name, for example GTR+G

model.kappa = Kappa value, HKY+G Parameter Only

Frequences model.frequences = user or empirical model.piA = PiA model.piC = PiC model.piG = PiG model.piT = PiT

GTR relative rate parameters model.AC = AC model.AG = AG model.AT = AT model.CG = CG model.CT = CT model.GT = GT, always set 1

** Distribution Gamma ***

rate_distribution = Gamma technique only rate_distribution.ncat = Number of categories, 4 by default rate_distribution.alpha = Alpha value rate_distribution.beta = Beta value

** Phylogenetic Optimization ***

High-level Strategies for searching the tree space: h2 y h1

H1: This strategy is based on the theoretical perspective of the strong relation between parsimony and likelihood (minimizing the parsimony score is equivalent to maximizing the likelihood under some assumptions), and searchs bi-objective phylogenetic trees applying topological moves using the Parametric Progressive Neighbourhood (PPN) technique (Go¨effon et al.,2008) to minimizing the parsimony and with this maximizing the likelihood simultaneously; furthermore, with the aim of improve the likelihood, optimize all branch lengths affected for the topological moves.

H2: This strategy is a parametric combination of two techniques that optimize both criteria separately, for parsimony uses PPN technique applying topological moves and adjusting the branch lengths affected, and for the likelihood uses a topological rearrange method provided by the Phylogenetic Likelihood Library (PLL).

Parameters:

optimization.method = h2 or h2 optimization.method.perc = Percentage to apply pllRearrangeSearch or PPN technique

optimization.ppn.numiterations = Number of iterations of PPN technique optimization.ppn.maxsprbestmoves = Limit of best moves to be applied in PPN Technique

Parameters of the pllRearrangeSearch function:

optimization.pll.percnodes = % of nodes to be used has root node in pllRearrangeSearch optimization.pll.mintranv = Define the annulus in which the topological moves should take place. optimization.pll.maxtranv = Define the annulus in which the topological moves should take place. optimization.pll.newton3sprbranch = Optimization of branch-length affected by topological moves

** Branch Length Optimization ***

Optimization of the Branch-Length of the initial population of phylogenetic trees bl_optimization.starting = true or false bl_optimization.starting.method = Method: newton and gradient are available bl_optimization.starting.maxitevaluation = 1000 bl_optimization.starting.tolerance = Precision 0.001

Optimization of the Branch-Length during one of the techniques of tree-space exploration or during each Update Interval. bl_optimization = true or false bl_optimization.method = Method: newton and gradient are available bl_optimization.maxitevaluation = 1000 bl_optimization.tolerance = 0.001

Optimization of the Branch-Length of the final population of phylogenetic trees perfomanced by algorithm bl_optimization.final = true or false bl_optimization.final.method = Method: newton and gradient are available bl_optimization.final.maxitevaluation = 1000 bl_optimization.final.tolerance = 0.001

** Evolutionary Model Optimization ***

Optimization of the Parameters of the Evolutionary Model in model_optimization = true or false model_optimization.method = Method. brent or bfgs are available model_optimization.maxitevaluation = 1000 model_optimization.tolerance = 0.001

** Distribution Rate Optimization ***

ratedistribution_optimization = false ratedistribution_optimization.method = bfgs #brent or bfgs

=======================================================================================

  1. Results

The results of MO-Phylogenetics are based in the jMetalCpp output format, creates two files called FUN+ExperimentID and VAR+ExperimentID. The FUN file contains the data of the pareto front approximation and the VAR file contains the optimized phylogenetic trees in newick format.

=======================================================================================

7.- News

26-01-2016 Copyright 2017 is updated

26-01-2016 MOCHC Algorithm has been added Paramater bootstrapSize,initialconvergencecount,preservedPopulation and convergencevalue has been added

=======================================================================================