CompEvol / beast2

Bayesian Evolutionary Analysis by Sampling Trees
www.beast2.org
GNU Lesser General Public License v2.1
237 stars 83 forks source link

TreeAnnotator fails with user-defined target trees #624

Closed tgvaughan closed 7 years ago

tgvaughan commented 8 years ago

For example:

$ treeannotator -burnin 10 -target 
<snip>
0              25             50             75            100
|--------------|--------------|--------------|--------------|
*************************************************************
Reading user specified target tree, clonal_frame.newick bacter.clonalframe.trees 
Collecting node information...
0              25             50             75            100
|--------------|--------------|--------------|--------------|
Error Parsing Input Tree: / by zero

The target tree file is a valid newick file without any funny business (eg single child nodes, zero branch lengths, multifurcations, ...) and the input tree log file is similarly boring. The log file is summarized without any problems when no target tree is specified.

tgvaughan commented 8 years ago

While the above commit addresses the particular exception thrown, the resulting summary tree doesn't contain any summary information - the nodes are not annotated at all and their placement is peculiar.

Using the -heights mean option instead of the default -heights ca yields a slightly better result, with a more reasonable node placement and some annotation, but almost all clades are marked as having a zero posterior, even though they absolutely do occur in the tree set corresponding to the input log file.

tgvaughan commented 8 years ago

This looks to be an issue with the assignment of node numbers by the parsers. The target tree is read in using TreeParser, while the tree set to summarize is loaded in using TreeAnnotator's own parser. For some reason this leads to different numbering schemes used to identify nodes and hence clades between the target tree and the tree set.

tgvaughan commented 8 years ago

These problems vanish when the -lowMem option is used. This is because that option causes TreeAnnotator to use the MemoryFriendlyTreeSet class which employs TreeParser to parse the trees, rather than the TreeAnnotator parser.

tgvaughan commented 8 years ago

The issue here is that TreeParser assigns leaf node numbers according to the lexographical order of the leaf labels. The TreeAnnotator parser doesn't appear to do this, but instead assigns leaf numbers according to the result of Integer.fromString() applied to the leaf labels.

tgvaughan commented 8 years ago

While trawling through this, I've noticed the TreeSetParser.parseFile() method has a bunch of stuff related to extracting geographical information from labels. Since this method is only used by FastTreeSet and not MemoryFriendlyTreeSet, I wonder if the -lowMem switch breaks anything relating to continuous phylogeography?

tgvaughan commented 8 years ago

Okay, so the above commit (made on a separate branch) now means that TA seems to work as expected without the -lowMem switch and with -heights mean. The default -heights ca still causes problems though, resulting in a (now much more sensible looking) tree without node annotations.