FePhyFoFum / phyx

phylogenetics tools for linux (and other mostly posix compliant) computers
blackrim.org
GNU General Public License v3.0
111 stars 17 forks source link

pxlstr reports binary:false but I can't find the polytomy(/ies)! #139

Closed HeatherKates closed 3 years ago

HeatherKates commented 3 years ago

Hello,

I've run into this problem many times that trees that I have modified (with phyx and manually) and am trying to read back into raxml-ng trip up the tree reader in that program. I've been using pxlstr to try to identify the problem (the tree needs to be fully bifurcating and I'm guessing that may be the issue) on a very large tree. My tree is attached. Pxlstr reports that this tree is not binary (output below), but I can't find the polytomy in the tree file. I have looked for it just using egrep \w+:\d\.\d,\w+:\d\.\d,\w+ to identify more than two tips together like tip:BL,tip:BL,tip:BL and find none. (I also realize a rooted tree can be a problem and I do know how to unroot - but I am sure a polytomy would cause an issue so want to resolve that first.)

Is there any other explanation for this behavior (like any chance pxlstr would return false for binary without a polytomy)? Or any tips for identifying the polytomies (without reading the tree into R and using an R package)? Of course I have viewed the tree and although I don't see a polytomy, it's very possible I missed it by eye.

Backbone_scaffolded.pxrmt.noBL.txt

tree #: 0 rooted: true binary: false nterminal: 13580 ninternal: 13575 branch lengths: true rttipvar: 0.00793982 treelength: 187.779 ultrametric: false rootheight: NA

Thank you!

josephwb commented 3 years ago

Hi @HeatherKates I will look at this. But as a first quick check APE says that it is not binary either.

HeatherKates commented 3 years ago

Thanks @josephwb. I uploaded the wrong tree - the tree has branch lengths and some are "0". This is the tree I meant to share (cladogram). But in both cases I'm now checking in ape and also find that the tree is not binary so this is not a phyx issue. Feel free to disregard and close unless you happen to have an idea! Backbone_scaffolded.cladogram.tree.txt

josephwb commented 3 years ago

Thanks.

Seems that egrep (grep -E) does not support \d? Try grep -P instead. (That said, I am still looking for the polytomy...)

josephwb commented 3 years ago

R tells me that there are 4 polytomies. Tracking down where they are now.

HeatherKates commented 3 years ago

What tool/code are you using to do that? I can randomly resolve the polytomies in ape and move on but obviously I want to know where they were so I'm very curious. Again...not a phyx caused issue in all likelihood so I doubly appreciate your help. Maybe could lead to a feature request though....!

josephwb commented 3 years ago

Here is the smallest clade. poly

The way I did this is looking at the edge matrix. If you know how APE (crazily) stores trees, the edges are stored in a 2 column matrix (1st column = ancestor ID, 2nd column = descendant ID). For a binary tree, each ancestor node ID should appear in column 1 exactly twice (it is the ancestor to 2 children). Any node ID that appears more than twice in that column is a polytomy (more generally, if it appears N times, it has N children).

HeatherKates commented 3 years ago

Wow, that is so helpful (obviously not a polytomy I want randomly resolved, an error in pasting in subtrees!). That is a clever straightforward way to ID these - surprised there's no a function written in ape to do it. I will save it for future use! Thanks again.

josephwb commented 3 years ago

Gah there probably is a function. I did it in a very dumb way:

# required packages
require(ape); require(phangorn);

# read in tree
phy <- read.tree("~/Downloads/Backbone_scaffolded.cladogram.tree.tre");

# find which nodes are ancestors to >2 children

# count
xx <- table(phy$edge[,1]);

# find max
max(xx); # it is 3

# find which nodes appear 3 times. a bit convoluted, going from characters to integers
pols <- xx[which(xx == 3)]; # this has the info, but the IDs are stored as names. ugh
pols <- as.numeric(names(pols));

# print out node IDS
pols

# use Phangorn to get the tip descendants of each of these nodes. this will
# help us locate the polytomies
clades <- Descendants(phy, pols); # there are 4 clades in a list

# clades stores the node ids of the tips. to get the names for a clade
# do (e.g. for the 4th one):
phy$tip.label[clades[[4]]];
josephwb commented 3 years ago

BTW a polytomy-finder for big trees for phyx seems useful (and should be next to trivial to set up; I believe we store the number of children as a property of each node, so we need only traverse the tree).

josephwb commented 3 years ago

And I am at a loss on how one would detect a polytomy using grep if it is deep in the tree (i.e., does not involve terminals alone). But my grep skills are limited, so if you find a way please share it.

HeatherKates commented 3 years ago

No, I think grep is a lost cause. In the past when this happened it seemed to be a result of dropping tips and the polytomies happened to be at the tips only, so that was just a low hanging fruit to try first. I look forward to seeing it as a feature someday! Closing this now. Thanks for taking the time to help with my problem.