TurakhiaLab / ROADIES

Tool for fully automated inference of species trees from raw genome assemblies
https://turakhia.ucsd.edu/ROADIES/
MIT License

Final roadies.nwk tree output format is out of order #9

Closed: NullModel closed this issue 11 months ago

NullModel commented 12 months ago

Ran this on 4 primate genomes: GCA_011078405.1, GCA_016695395.2, GCA_016700455.2, GCF_020740605.2

The resulting roadies.nwk tree is:

((GCF_020740605.2,(GCA_016695395.2,GCA_016700455.2)1.000000:2.302585),GCA_011078405.1);

The distance numbers should immediately follow their branch, as they do in the genetrees/gene_tree_merged.nwk file:

((GCA_011078405.1_1:0.00340,GCA_011078405.1_0:0.01810):0.05127,(GCA_016700455.2_0:0.00000,GCA_016695395.2_0:0.00000):0.02681,GCF_020740605.2_0:0.14377);

Also, this did not work when the input files were in .fa.gz format; they had to be plain-text .fa files.

ang037 commented 12 months ago

In ROADIES, we use ASTRAL-Pro as the final stage for species tree estimation from gene trees. ASTRAL-Pro does not estimate terminal branch lengths; it only estimates internal branch lengths and the terminal branch lengths of species with more than one individual sampled (reference: https://github.com/chaoszhang/A-pro/blob/master/ASTRAL-MP/astral-tutorial-template.md#branch-length-and-support). Hence, the species names in roadies.nwk are not immediately followed by distance numbers as you expected.
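To see what the roadies.nwk output does and does not contain, here is a minimal Python sketch that parses a Newick string and lists, for every non-root node, whether the branch above it carries a length. It assumes simple unquoted Newick without comments; parse_newick and describe are hypothetical helpers written for illustration, not part of ROADIES or ASTRAL-Pro. Applied to the tree in this issue, the number between ")" and ":" (1.000000) comes out as an internal node label, which ASTRAL-family tools typically use for branch support, and 2.302585 is the length of that internal branch. None of the terminal branches has a length, whereas every branch in gene_tree_merged.nwk does.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str = ""                  # leaf name, or internal label (e.g. support)
    length: Optional[float] = None   # length of the branch above this node, if given
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse unquoted Newick without comments; the string must end with ';'."""
    s = text.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":
            pos += 1                      # consume '('
            node.children.append(parse_node())
            while s[pos] == ",":
                pos += 1                  # consume ','
                node.children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ')'
        start = pos                       # label runs up to ':' or a delimiter
        while s[pos] not in ":,();":
            pos += 1
        node.label = s[start:pos]
        if s[pos] == ":":                 # optional ':length'
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            node.length = float(s[start:pos])
        return node

    root = parse_node()
    if pos != len(s) - 1:                 # only the final ';' may remain
        raise ValueError(f"unexpected text at position {pos}")
    return root


def describe(root: Node) -> List[Tuple[str, str, Optional[float]]]:
    """List (kind, label, length) for every non-root node in preorder."""
    rows = []

    def walk(node: Node, is_root: bool) -> None:
        if not is_root:
            kind = "internal" if node.children else "terminal"
            rows.append((kind, node.label, node.length))
        for child in node.children:
            walk(child, False)

    walk(root, True)
    return rows


if __name__ == "__main__":
    roadies = "((GCF_020740605.2,(GCA_016695395.2,GCA_016700455.2)1.000000:2.302585),GCA_011078405.1);"
    for kind, label, length in describe(parse_newick(roadies)):
        print(f"{kind:9} {label or '-':17} {length}")
```

A hand-rolled parser keeps the example dependency-free; for real work, a full Newick library that handles quoting and comments is the safer choice.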

You may want to use the script provided with ASTRAL to assign arbitrary branch lengths to each branch (link to the script).
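The linked ASTRAL script is not reproduced here. As a rough illustration of the same idea, the hypothetical helper below (fill_missing_lengths, not an ASTRAL or ROADIES function) appends a fixed placeholder length to every non-root node that lacks one and leaves existing lengths and support labels untouched. It assumes unquoted Newick without comments. The added lengths are arbitrary and only make the tree acceptable to tools that require a length on every branch; they carry no evolutionary meaning.

```python
def fill_missing_lengths(newick: str, default: float = 1.0) -> str:
    """Append ':<default>' to every non-root node that has no branch length.

    Assumes unquoted Newick without comments, ending in ';'.
    """
    out = []
    has_length = False  # did the node that is about to close already have ':len'?
    for ch in newick.strip():
        if ch == ":":
            has_length = True
        elif ch in ",)":
            # ',' or ')' closes the preceding (non-root) node.
            if not has_length:
                out.append(f":{default}")
            has_length = False
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    tree = "((GCF_020740605.2,(GCA_016695395.2,GCA_016700455.2)1.000000:2.302585),GCA_011078405.1);"
    print(fill_missing_lengths(tree))
```

For the roadies.nwk tree in this issue it prints ((GCF_020740605.2:1.0,(GCA_016695395.2:1.0,GCA_016700455.2:1.0)1.000000:2.302585):1.0,GCA_011078405.1:1.0); and running it again on its own output changes nothing.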

Also, the issue with the .fa.gz format has been fixed now.
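The thread does not say how the .fa.gz fix was implemented. One common approach, sketched below under that caveat, is to sniff the two gzip magic bytes (0x1f 0x8b) and open the file through Python's gzip module only when they are present, so compressed and plain FASTA inputs go through the same reader. The names open_fasta and read_fasta are hypothetical and not part of ROADIES.

```python
import gzip
from typing import Iterator, Tuple


def open_fasta(path: str):
    """Open a FASTA file as text, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":          # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    header, chunks = None, []
    with open_fasta(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        for name, seq in read_fasta(path):
            print(path, name, len(seq))
```

Checking the content rather than the file extension also copes with compressed files that were renamed to .fa.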