anastasiiakim / PRANC

PRANC computes probabilities of ranked phylogenetic gene trees given a species tree under the coalescent process.
MIT License

PRANC

PRANC can be used to compute the probabilities of ranked or unranked phylogenetic gene tree topologies given a species tree under the coalescent process. A ranked tree depicts not only the topological relationships among gene lineages, as an unranked tree does, but also the order in which the lineages coalesce. PRANC can also output the "democratic vote" (most frequent) ranked or unranked topologies, and it can estimate the maximum likelihood species tree with branch lengths from a sample of ranked or unranked gene tree topologies. A greedy consensus tree can be used as the starting tree, as can trees selected by the minimization of ancient coalescence (MAC) criterion.

Installation

After downloading the source code, go to the SRC directory and type

make

This will create an executable called pranc, which can be run from the BIN directory with the input options listed below.

Usage

Program options:

Each option is listed with its description, input files, and output files.

-rprob: calculates probabilities of ranked gene tree topologies
  Input files:
  • species tree file
  • file containing ranked gene trees (with branch lengths)
  • file containing gene tree topologies (optional)
  Output files:
  • outRankGT.txt

-uprob: calculates probabilities of unranked gene tree topologies
  Input files:
  • species tree file
  • file containing unranked gene trees (without branch lengths; branch lengths will be ignored if given)
  Output files:
  • outEachRankTopo.txt
  • outUnrGT.txt

-sym: outputs symbolic probabilities of ranked gene tree topologies
  Input files:
  • species tree file
  • file containing ranked gene trees (with branch lengths)
  Output files:
  • outSymbolic.txt
  • outHistProbs.txt

-rtopo: outputs ranked tree topologies and frequencies of the topologies
  Input files:
  • file containing ranked trees (with branch lengths specified)
  Output files:
  • outRankTopos.txt
  • outRankFreqs.txt

-utopo: outputs unranked tree topologies and frequencies of the topologies
  Input files:
  • file containing unranked trees (without branch lengths; branch lengths will be ignored if given)
  Output files:
  • outUnrTopos.txt
  • outUnrFreqs.txt

-write: outputs a tree with ranks instead of branch lengths
  Input files:
  • file containing one ranked gene tree (with branch lengths)
  Output files:
  • outRankTree.txt

-rank_trees: outputs all ranked topologies that share the same unranked topology
  Input files:
  • file containing unranked gene trees (without branch lengths; branch lengths will be ignored if given)
  Output files:
  • outRankTopos.txt

-mac: outputs the species tree MAC score
  Input files:
  • species tree file
  • file containing ranked gene trees (with branch lengths)
  Output files:
  • outMacScore.txt

-cons: outputs the greedy consensus tree without branch lengths
  Input files:
  • file containing unranked gene trees (ranked trees will be treated as unranked trees)
  Output files:
  • outGreedyCons.txt

-like_nonni -rgt: calculates ML interval lengths of a given species tree topology
  Input files:
  • species tree file
  • file containing ranked gene trees (with branch lengths)
  Output files:
  • outNoNniMLTopo.txt

-like_nonni -ugt: calculates ML interval lengths of a given species tree topology
  Input files:
  • species tree file
  • file containing unranked gene trees (without branch lengths; branch lengths will be ignored if given)
  Output files:
  • outNoNniMLTopo.txt

-like_nni -rgt: estimates the ML species tree given one or more starting trees. See the other options below.
  Input files:
  • starting species tree file
  • file containing ranked gene trees (with branch lengths)
  Output files:
  • outWithNniMLTopo.txt

-like_nni -ugt: (current version has a bug) estimates the ML species tree given one or more starting trees. See the other options below.
  Input files:
  • starting species tree file
  • file containing unranked gene trees (without branch lengths; branch lengths will be ignored if given)
  Output files:
  • outWithNniMLTopo.txt

All input files should be in Newick format. All trees are treated as rooted binary trees. The species tree is assumed to be ultrametric (all leaves are equidistant from the root). The taxon names of the gene trees should match the taxon names of the corresponding species tree. PRANC can be run as shown below.

./pranc -rprob <species-tree-file-name> <ranked-gene-tree-file-name> <gene-tree-topology-file-name>
./pranc -uprob <species-tree-file-name> <unranked-gene-tree-file-name>
./pranc -sym <species-tree-file-name> <ranked-gene-tree-file-name>
./pranc -rtopo <ranked-tree-file-name>
./pranc -write <ranked-tree-file-name>
./pranc -like_nonni <species-tree-file-name> -rgt <ranked-tree-file-name>
./pranc -like_nonni <species-tree-file-name> -rgt <ranked-tree-file-name> -lb 0.001 -ub 6 -tol 1e-10 -tiplen 0.1
./pranc -like_nonni <species-tree-file-name> -ugt <unranked-tree-file-name>
./pranc -like_nni <starting-species-tree-file-name> -rgt <ranked-tree-file-name>
./pranc -like_nni <starting-species-tree-file-name> -rgt <ranked-tree-file-name> -nni 5 -diff 0.1 -startsubset 3 -initsubset 3 -maxsubset 1  -lb 0.001 -ub 6 -tol 1e-10 -tiplen 0.1
./pranc -like_nni <starting-species-tree-file-name> -ugt <unranked-tree-file-name>
./pranc -like_nni_brent <species-tree-file-name> -rgt <ranked-tree-file-name>
./pranc -like_nni_brent <species-tree-file-name> -rgt <ranked-tree-file-name> -nni 5 -diff 0.1 -startsubset 3 -initsubset 3 -maxsubset 10 -rounds 5  -lb 0.001 -ub 6 -tol 1e-06 -eps 1e-06 -tiplen 0.1
./pranc -like_nni_brent <species-tree-file-name> -ugt <unranked-tree-file-name>
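
The input requirements stated above (rooted binary trees, an ultrametric species tree, matching taxon names) can be checked before running PRANC. The following is a minimal Python sketch, not part of PRANC: the helper names parse_newick, is_binary, leaf_depths, and is_ultrametric are hypothetical, the parser handles only plain Newick strings with names and optional branch lengths, and the species tree string is the starting tree printed in Example 13.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = "".join(text.split())  # drop whitespace
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0  # trees without branch lengths get zero-length branches
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def is_binary(node):
    """True if every internal node has exactly two children."""
    kids = node["children"]
    return not kids or (len(kids) == 2 and all(is_binary(k) for k in kids))


def leaf_depths(node, depth=0.0):
    """Map each leaf name to its distance from the root."""
    if not node["children"]:
        return {node["name"]: depth}
    depths = {}
    for child in node["children"]:
        depths.update(leaf_depths(child, depth + child["length"]))
    return depths


def is_ultrametric(node, tol=1e-6):
    """True if all leaves are equidistant from the root, up to tol."""
    depths = leaf_depths(node).values()
    return max(depths) - min(depths) <= tol


species = parse_newick(
    "((B:1.488875,E:1.488875):0.300988,"
    "(A:1.489880,(C:0.881394,D:0.881394):0.608486):0.299983);"
)
gene = parse_newick("((B,E),(A,(C,D)));")

print(is_binary(species), is_ultrametric(species))
print(set(leaf_depths(gene)) == set(leaf_depths(species)))
```

The tolerance in is_ultrametric is an arbitrary choice for this sketch; PRANC's own acceptance criteria are not documented here.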

Examples

All input files used below can be found in the BIN folder.

Example 1 (-rprob)

./pranc -rprob st_5taxon.txt rgt_5taxon.txt gtopos_5taxon.txt

output:

Total: 0.146615

outRankGT.txt (probabilities and ranked topologies):

0.0687959   BE-2-ACD-3-CD-4-
0.0685643   ACD-2-BE-3-CD-4-
0.00925435  ACD-2-CD-3-BE-4-

Example 2 (-rprob)

./pranc -rprob st_5taxon.txt rgt_5taxon.txt

output:

Total: 0.146615

outRankGT.txt (probabilities):

0.0687959   
0.0685643   
0.00925435  

Example 3 (-uprob)

./pranc -uprob st_5taxon.txt unrgt_5taxon.txt 

output:

Total: 0.146615

outEachRankTopo.txt (probabilities and ranked topologies):

0.0687959   BE-2-ACD-3-CD-4-
0.0685643   ACD-2-BE-3-CD-4-
0.00925435  ACD-2-CD-3-BE-4-

outUnrGT.txt (unranked tree and probability):

((B,E),(A,(C,D)));  0.146615
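
In this example the probability of the unranked topology equals the sum of the probabilities of the ranked topologies listed in outEachRankTopo.txt. The short Python sketch below, which is not part of PRANC, checks this using the three lines shown above; the variable names are illustrative.

```python
# Lines copied from outEachRankTopo.txt in this example.
ranked_lines = [
    "0.0687959   BE-2-ACD-3-CD-4-",
    "0.0685643   ACD-2-BE-3-CD-4-",
    "0.00925435  ACD-2-CD-3-BE-4-",
]

# Map each ranked topology to its probability.
probs = {}
for line in ranked_lines:
    prob, topology = line.split()
    probs[topology] = float(prob)

# The sum should agree with the value in outUnrGT.txt up to rounding.
total = sum(probs.values())
print(f"{total:.6f}")
```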

Example 4 (-sym)

./pranc -sym st_5taxon.txt gt_5taxon.txt 

output:

Total: 0.0687959

outHistProbs.txt (ranked histories and probabilities):

1234    0.000118525
1233    7.12235e-08
...
1112    0.00373918
1111    0.000909714

outSymbolic.txt (first block shows the probability of the ranked history 1234, second block shows the probability of the ranked history 1233, etc.)

 + (exp(-0*(s1-s2))*1/(1) + exp(-1*(s1-s2))*1/(-1))  * 
(exp(-0*(s2-s3))*1/(1) + exp(-1*(s2-s3))*1/(-1))  * 
(exp(-0*(s3-s4))*1/(1) + exp(-1*(s3-s4))*1/(-1))  * 
2/2

 + (exp(-0*(s1-s2))*1/(1) + exp(-1*(s1-s2))*1/(-1))  * 
(exp(-0*(s2-s3))*1/(2) + exp(-1*(s2-s3))*1/(-1) + exp(-2*(s2-s3))*1/(2))  * 
(exp(-1*(s3-s4))*1/(1))  * 
2/2
...
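
The symbolic blocks can be evaluated numerically. The Python sketch below is not part of PRANC. It assumes that s1 > s2 > s3 > s4 are the speciation times (node heights) of the species tree and takes their values from the starting tree printed in Example 13; under that assumption the first two blocks reproduce the first two values in outHistProbs.txt.

```python
from math import exp

# Assumed speciation times, read off the species tree shown in Example 13:
# root, (A,(C,D)), (B,E), (C,D).
s1, s2, s3, s4 = 1.789863, 1.489880, 1.488875, 0.881394

# First block of outSymbolic.txt (ranked history 1234), copied term by term.
history_1234 = (
    (exp(-0 * (s1 - s2)) * 1 / (1) + exp(-1 * (s1 - s2)) * 1 / (-1))
    * (exp(-0 * (s2 - s3)) * 1 / (1) + exp(-1 * (s2 - s3)) * 1 / (-1))
    * (exp(-0 * (s3 - s4)) * 1 / (1) + exp(-1 * (s3 - s4)) * 1 / (-1))
    * 2 / 2
)

# Second block (ranked history 1233), copied term by term.
history_1233 = (
    (exp(-0 * (s1 - s2)) * 1 / (1) + exp(-1 * (s1 - s2)) * 1 / (-1))
    * (
        exp(-0 * (s2 - s3)) * 1 / (2)
        + exp(-1 * (s2 - s3)) * 1 / (-1)
        + exp(-2 * (s2 - s3)) * 1 / (2)
    )
    * (exp(-1 * (s3 - s4)) * 1 / (1))
    * 2 / 2
)

print(history_1234, history_1233)
```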

Example 5 (-rtopo)

./pranc -rtopo 5taxa_trees.txt

output:

outRankTopos.txt:

t1|t2|t3|t4|-2-t1|t3|t4|-3-t1|t4|-4-
t1|t2|t5|-2-t3|t4|-3-t1|t2|-4-
t1|t2|t5|-2-t3|t4|-3-t1|t2|-4-
t2|t3|t4|t5|-2-t2|t5|-3-t3|t4|-4-
t2|t3|t4|t5|-2-t3|t4|-3-t2|t5|-4-

outRankFreqs.txt:

2   t1|t2|t5|-2-t3|t4|-3-t1|t2|-4-
1   t2|t3|t4|t5|-2-t3|t4|-3-t2|t5|-4-
1   t2|t3|t4|t5|-2-t2|t5|-3-t3|t4|-4-
1   t1|t2|t3|t4|-2-t1|t3|t4|-3-t1|t4|-4-
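
The frequency file is a tally of identical lines in outRankTopos.txt, and the "democratic vote" topology is the one with the highest count. A minimal Python sketch of that tally, not PRANC's own code, using the five lines shown above:

```python
from collections import Counter

# Lines copied from outRankTopos.txt in this example.
ranked_topologies = [
    "t1|t2|t3|t4|-2-t1|t3|t4|-3-t1|t4|-4-",
    "t1|t2|t5|-2-t3|t4|-3-t1|t2|-4-",
    "t1|t2|t5|-2-t3|t4|-3-t1|t2|-4-",
    "t2|t3|t4|t5|-2-t2|t5|-3-t3|t4|-4-",
    "t2|t3|t4|t5|-2-t3|t4|-3-t2|t5|-4-",
]

# Count identical ranked topologies and pick the most frequent one.
freqs = Counter(ranked_topologies)
top_topology, top_count = freqs.most_common(1)[0]

for topology, count in freqs.most_common():
    print(count, topology)
```

The order of topologies with equal counts may differ from the order in outRankFreqs.txt.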

Example 6 (-utopo)

./pranc -utopo 5taxa_trees.txt

output:

outUnrTopos.txt:

t1|t4|-t1|t3|t4|-t1|t2|t3|t4|-t1|t2|t3|t4|t5|-
t1|t2|-t3|t4|-t1|t2|t5|-t1|t2|t3|t4|t5|-
t1|t2|-t3|t4|-t1|t2|t5|-t1|t2|t3|t4|t5|-
t2|t5|-t3|t4|-t2|t3|t4|t5|-t1|t2|t3|t4|t5|-
t2|t5|-t3|t4|-t2|t3|t4|t5|-t1|t2|t3|t4|t5|-

outUnrFreqs.txt:

2   t2|t5|-t3|t4|-t2|t3|t4|t5|-t1|t2|t3|t4|t5|-
2   t1|t2|-t3|t4|-t1|t2|t5|-t1|t2|t3|t4|t5|-
1   t1|t4|-t1|t3|t4|-t1|t2|t3|t4|-t1|t2|t3|t4|t5|-

Example 7 (-utopo)

./pranc -utopo unrgts.txt

output:

outUnrTopos.txt:

t3|t4|-t6|t7|-t1|t6|t7|-t3|t4|t5|-t1|t6|t7|t8|-t1|t2|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
t1|t2|-t3|t4|-t7|t8|-t1|t2|t6|-t5|t7|t8|-t3|t4|t5|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
t1|t2|-t3|t4|-t7|t8|-t1|t2|t6|-t5|t7|t8|-t3|t4|t5|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
t1|t2|-t3|t4|-t7|t8|-t1|t2|t6|-t5|t7|t8|-t3|t4|t5|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
t1|t2|-t7|t8|-t1|t2|t3|-t6|t7|t8|-t1|t2|t3|t4|-t5|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
t1|t2|-t3|t4|-t5|t6|-t7|t8|-t5|t6|t7|t8|-t3|t4|t5|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
t3|t4|-t6|t7|-t1|t6|t7|-t3|t4|t5|-t1|t6|t7|t8|-t1|t2|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-

outUnrFreqs.txt:

3   t1|t2|-t3|t4|-t7|t8|-t1|t2|t6|-t5|t7|t8|-t3|t4|t5|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
2   t3|t4|-t6|t7|-t1|t6|t7|-t3|t4|t5|-t1|t6|t7|t8|-t1|t2|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
1   t1|t2|-t7|t8|-t1|t2|t3|-t6|t7|t8|-t1|t2|t3|t4|-t5|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-
1   t1|t2|-t3|t4|-t5|t6|-t7|t8|-t5|t6|t7|t8|-t3|t4|t5|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-

Example 8 (-write)

./pranc -write st_5taxon.txt 

output:

outRankTree.txt:

((B:2,E:2):2,(A:3,(C:1,D:1):2):1);
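
The ranks in this output can be read as follows: internal nodes are numbered from the root (rank 1) toward the tips in order of decreasing age, leaves sit at rank 5 (the number of taxa), and each printed branch length is the child's rank minus the parent's rank. The Python sketch below illustrates this reading; it is an interpretation of the output, not PRANC's code. The node heights are an assumption taken from the species tree printed in Example 13, and the clade labels and the helper parent_of are hypothetical.

```python
# Assumed node heights (time before present) of the species tree, keyed by clade.
node_heights = {"ABCDE": 1.789863, "ACD": 1.489880, "BE": 1.488875, "CD": 0.881394}

# Rank internal nodes from oldest (rank 1, the root) to youngest.
by_age = sorted(node_heights, key=node_heights.get, reverse=True)
rank = {clade: i + 1 for i, clade in enumerate(by_age)}
n_taxa = len(by_age) + 1  # a binary tree with n leaves has n - 1 internal nodes


def parent_of(item):
    """Smallest clade that strictly contains the given leaf or clade."""
    candidates = [c for c in node_heights if set(item) < set(c)]
    return min(candidates, key=len)


# Branch lengths in rank units: child rank minus parent rank (leaves have rank n).
leaf_length = {leaf: n_taxa - rank[parent_of(leaf)] for leaf in "ABCDE"}
clade_length = {c: rank[c] - rank[parent_of(c)] for c in by_age[1:]}

# The same ranking in the string format used by outRankGT.txt.
encoding = "".join(f"{c}-{rank[c]}-" for c in by_age[1:])

print(leaf_length, clade_length, encoding)
```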

Example 9 (-rank_trees)

./pranc -rank_trees unrgt_5taxon.txt

output:

outRankTopos.txt:

((B:3,E:3):1,(A:2,(C:1,D:1):1):2);
((B:2,E:2):2,(A:3,(C:1,D:1):2):1);
((B:1,E:1):3,(A:3,(C:2,D:2):1):1);
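
The number of trees listed here can be cross-checked with a standard combinatorial formula: a rooted binary tree with n leaves has (n-1)! divided by the product, over its internal nodes, of the number of internal nodes in each node's subtree. The Python sketch below applies that formula to the unranked topology of this example. It is an independent check, not PRANC's enumeration code, and the nested-tuple tree encoding and function names are choices made for this sketch.

```python
from math import factorial, prod


def internal_nodes(tree):
    """Number of internal nodes in a tree given as nested 2-tuples of leaf names."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + internal_nodes(left) + internal_nodes(right)


def subtree_sizes(tree):
    """Internal-node count of the subtree rooted at every internal node."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return [internal_nodes(tree)] + subtree_sizes(left) + subtree_sizes(right)


def count_rankings(tree):
    """Number of ranked topologies that share the given unranked topology."""
    return factorial(internal_nodes(tree)) // prod(subtree_sizes(tree))


# ((B,E),(A,(C,D))) written as nested tuples.
unranked = (("B", "E"), ("A", ("C", "D")))
print(count_rankings(unranked))
```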

Example 10 (-mac)

./pranc -mac st_5taxon.txt rgt_5taxon.txt

output:

outMacScore.txt:

2

Example 11 (-cons)

./pranc -cons unrgts.txt

output:

outGreedyCons.txt:

((t6,(t1,t2)),((t3,t4),(t5,(t7,t8))));
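
A generic greedy consensus procedure counts how often each clade occurs across the input trees, then adds clades in order of decreasing frequency, skipping any clade that conflicts with one already accepted. The Python sketch below applies that procedure to the clade lists and counts shown in outUnrFreqs.txt of Example 7. It is a textbook version written for illustration and is not claimed to match PRANC's implementation, in particular in how ties are broken.

```python
from collections import Counter

# Lines copied from the outUnrFreqs.txt listing in Example 7.
freq_lines = [
    "3   t1|t2|-t3|t4|-t7|t8|-t1|t2|t6|-t5|t7|t8|-t3|t4|t5|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-",
    "2   t3|t4|-t6|t7|-t1|t6|t7|-t3|t4|t5|-t1|t6|t7|t8|-t1|t2|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-",
    "1   t1|t2|-t7|t8|-t1|t2|t3|-t6|t7|t8|-t1|t2|t3|t4|-t5|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-",
    "1   t1|t2|-t3|t4|-t5|t6|-t7|t8|-t5|t6|t7|t8|-t3|t4|t5|t6|t7|t8|-t1|t2|t3|t4|t5|t6|t7|t8|-",
]


def parse_clades(topology):
    """Split 't1|t2|-t3|t4|-' into a list of clades (frozensets of taxon names)."""
    return [frozenset(c.strip("|").split("|")) for c in topology.split("-") if c]


def compatible(a, b):
    """Two clades can share a tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


# Weight each clade by the number of trees it appears in.
clade_counts = Counter()
for line in freq_lines:
    count, topology = line.split(None, 1)
    for clade in parse_clades(topology):
        clade_counts[clade] += int(count)

# Greedy step: most frequent clades first, ties broken alphabetically.
accepted = []
ordered = sorted(clade_counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
for clade, _ in ordered:
    if all(compatible(clade, other) for other in accepted):
        accepted.append(clade)

consensus_clades = {"|".join(sorted(c)) for c in accepted}
print(sorted(consensus_clades, key=len))
```

The accepted clades are exactly those of the tree in outGreedyCons.txt above.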

Example 12 (-like_nonni -rgt)

./pranc -like_nonni st_5taxon.txt -rgt rgt_5taxon.txt

output:

Negative log-likelihood  = 10.0393
The time of the most recent clade is set to 0.1
Optimize branch lengths using L-BFGS method with tolerance 1e-10
Allow the branch length to be in the interval [0.001, 6]
Negative log-likelihood = 4.23458
mse: 5.82943
initial interval lengths
0.299983 0.001005 0.607481 
estimated interval lengths
6 1.1754 0.271628 
abs difference in interval lengths
5.70002 1.1744 0.335853 

outNoNniMLTopo.txt (your estimated branch lengths will be slightly different):

((B:0.375797,E:0.375797):7.173825,(A:1.549622,(C:0.100000,D:0.100000):1.449622):6.000000);
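
The summary lines of the console output can be reproduced from the printed interval lengths. In the Python sketch below, which is not part of PRANC, the absolute differences match the printed ones, and the value labelled mse agrees to about four decimal places with the Euclidean norm of the difference vector rather than with a mean of squares. This is an observation from the printed numbers only, not a statement about the PRANC source code.

```python
from math import sqrt

# Interval lengths copied from the console output above.
initial = [0.299983, 0.001005, 0.607481]
estimated = [6, 1.1754, 0.271628]

# Element-wise absolute differences, as in "abs difference in interval lengths".
abs_diff = [abs(e - i) for e, i in zip(estimated, initial)]

# Euclidean norm of the differences; compare with the printed "mse" value.
l2_norm = sqrt(sum(d * d for d in abs_diff))

print([round(d, 6) for d in abs_diff], round(l2_norm, 4))
```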

The user may change the default settings:

./pranc -like_nonni st_5taxon.txt -rgt rgt_5taxon.txt -lb 0.01 -ub 5 -tol 1e-08 -tiplen 0.1

Example 12, continued (-like_nonni -ugt)

./pranc -like_nonni st_5taxon.txt -ugt ugt_5taxon.txt

output:

outNoNniMLTopo.txt (your estimated branch lengths will be slightly different):

((B:0.100100,E:0.100100):0.000200,(A:0.100200,(C:0.100000,D:0.100000):0.000200):0.000100);

Example 13 (-like_nni -rgt)

./pranc -like_nni st_5taxon.txt -rgt rgt_5taxon.txt

output:

The time of the most recent clade is set to 0.1
Optimize branch lengths using L-BFGS method with tolerance 1e-10
Allow the branch length to be in the interval [0.001, 6]
Maximum number of NNI moves: 5
Stop if the difference between log-likelihoods is greater than 0.1
The number of maximum rankings considered of each unranked species tree candidate (default): 2*(Number of Taxa)
The number of initial rankings considered of each unranked species tree candidate (default): Number of Taxa
Starting ranked species tree: ((B:1.488875,E:1.488875):0.300988,(A:1.489880,(C:0.881394,D:0.881394):0.608486):0.299983);
Negative log-likelihood  = 4.23458

outWithNniMLTopo.txt (your estimated branch lengths and topology might be slightly different):

((B:0.376472,E:0.376472):7.173572,(A:1.550044,(C:0.100000,D:0.100000):1.450044):6.000000);

The user may change the default settings:

./pranc -like_nni st_5taxon.txt -rgt rgt_5taxon.txt -nni 3 -diff 0.1 -startsubset 1 -initsubset 2 -maxsubset 3  -lb 0.0001 -ub 3 -tol 1e-10 -tiplen 0.1

Example 14 (-like_nni -ugt)

./pranc -like_nni st_5taxon.txt -ugt ugt_5taxon.txt

output:

outWithNniMLTopo.txt (your estimated branch lengths and topology might be slightly different):

(((E:0.100000,D:0.100000):0.592201,A:0.692201):0.255117,(B:0.246954,C:0.246954):0.700365);

Example 15 (-like_nni_brent -rgt)

./pranc -like_nni_brent st_5taxon.txt -rgt rgt_5taxon.txt

output:

The time of the most recent clade is set to 0.1
Optimize branch lengths using Brent's method with epsilon 1e-06 and tolerance 1e-06
Allow the branch length to be in the interval [0.001, 6]
Maximum number of NNI moves: 5
Stop if the difference between log-likelihoods is greater than 0.1
The number of initial rankings considered of each unranked species tree candidate (default): Number of Taxa
The number of round optimizations (default): Number of Taxa
Negative log-likelihood: 4.23459

outWithNniMLTopo.txt (your estimated branch lengths and topology might be slightly different):

((B:0.375799,E:0.375799):7.173815,(A:1.549621,(C:0.100000,D:0.100000):1.449621):5.999993);

The user may change the default settings:

./pranc -like_nni_brent st_5taxon.txt -rgt rgt_5taxon.txt -nni 3 -diff 0.1 -startsubset 1 -maxsubset 3 -rounds 3 -lb 0.001 -ub 10 -tol 1e-10 -eps 1e-10 -tiplen 1