etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

ete evol: Inconsistent marking between --mark and --internal options #747

Open lab83bio opened 2 months ago

lab83bio commented 2 months ago

Hi,

I am using ete3 evol (conda install, version 3.1.2) to analyse pseudogenes in primate genomes. According to the documentation, when --mark is given two species separated by three commas:

the tree is marked from the common ancestor of the surrounding species

Consistent with this, running the tree ECP_EDN_15.nw from the documentation page with --mark Chimp_EDN,,,Human_EDN marks the tree as follows:

$ ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta --model b_free --mark Chimp_EDN,,,Human_EDN -o outdir/ 
...
       marking branches 3, 7, 21

          (((Hylobates_EDN,(Orang_EDN,(Gorilla_EDN,(Chimp_EDN #1,Human_EDN #1) #1))),(Macaq_EDN,(Cercopith_EDN,(Macaq2_EDN,Papio_EDN)))),(Orang_ECP,((Macaq_ECP,Macaq2_ECP),(Goril_ECP,Chimp_ECP,Human_ECP))));

One would expect this marked tree to also appear among the trees generated by the --internal option. However, with this option the program generates a tree marked at the descendants but not at their common ancestor.

$ ete3 evol -t ECP_EDN_15.nw --alg ECP_EDN_15.fasta --model b_free --internal -o outdir2/
...
       marking branches 3, 7

          (((Hylobates_EDN,(Orang_EDN,(Gorilla_EDN,(Chimp_EDN #1,Human_EDN #1)))),(Macaq_EDN,(Cercopith_EDN,(Macaq2_EDN,Papio_EDN)))),(Orang_ECP,((Macaq_ECP,Macaq2_ECP),(Goril_ECP,Chimp_ECP,Human_ECP))));

I think the marking produced by --mark is correct and the one produced by --internal is not: if the descendant branches (3, 7) are marked, the ancestral branch (21) should be marked as well.
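To make the expectation concrete, here is a minimal standalone sketch that checks it against the two marked trees printed above. It does not use ete3 or reproduce how ete3 evol assigns marks. The helper names (`parse_marked_newick`, `leaf_names`, `inconsistent_clades`) are hypothetical, and the small parser only handles topology-only Newick strings with `#N` marks, as in the output above (no branch lengths). The rule it tests, "an internal node whose children all carry the same mark should carry it too", is my expectation, not documented ete3 behaviour.

```python
import re

# Marked trees copied from the two runs above.
MARK_TREE = "(((Hylobates_EDN,(Orang_EDN,(Gorilla_EDN,(Chimp_EDN #1,Human_EDN #1) #1))),(Macaq_EDN,(Cercopith_EDN,(Macaq2_EDN,Papio_EDN)))),(Orang_ECP,((Macaq_ECP,Macaq2_ECP),(Goril_ECP,Chimp_ECP,Human_ECP))));"
INTERNAL_TREE = "(((Hylobates_EDN,(Orang_EDN,(Gorilla_EDN,(Chimp_EDN #1,Human_EDN #1)))),(Macaq_EDN,(Cercopith_EDN,(Macaq2_EDN,Papio_EDN)))),(Orang_ECP,((Macaq_ECP,Macaq2_ECP),(Goril_ECP,Chimp_ECP,Human_ECP))));"

STRUCTURAL = {"(", ")", ",", ";"}


def parse_marked_newick(text):
    """Parse a topology-only Newick string with optional '#N' marks.

    Returns nested dicts: {"name": str, "mark": str or None, "children": list}.
    """
    tokens = re.findall(r"[(),;]|#\d+|[^(),;#\s]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": "", "mark": None, "children": []}
        if tokens[pos] == "(":
            pos += 1  # consume "("
            node["children"].append(parse_node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                node["children"].append(parse_node())
            pos += 1  # consume ")"
        # Optional node name.
        if (pos < len(tokens) and tokens[pos] not in STRUCTURAL
                and not tokens[pos].startswith("#")):
            node["name"] = tokens[pos]
            pos += 1
        # Optional mark such as "#1".
        if pos < len(tokens) and tokens[pos].startswith("#"):
            node["mark"] = tokens[pos]
            pos += 1
        return node

    return parse_node()


def leaf_names(node):
    """Collect the names of all leaves below (or at) this node."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


def inconsistent_clades(node, found=None):
    """List unmarked internal nodes whose children all share one mark."""
    if found is None:
        found = []
    children = node["children"]
    if children:
        marks = {child["mark"] for child in children}
        if len(marks) == 1 and None not in marks and node["mark"] is None:
            found.append(sorted(leaf_names(node)))
        for child in children:
            inconsistent_clades(child, found)
    return found


if __name__ == "__main__":
    print("--mark:    ", inconsistent_clades(parse_marked_newick(MARK_TREE)))
    print("--internal:", inconsistent_clades(parse_marked_newick(INTERNAL_TREE)))
```

On these two inputs the check reports nothing for the --mark tree and flags the Chimp_EDN/Human_EDN clade for the --internal tree, which is the discrepancy described above.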