Closed tgvaughan closed 7 years ago
While the above commit addresses the particular exception thrown, the resulting summary tree doesn't contain any summary information - the nodes are not annotated at all and their placement is peculiar.
Using the -heights mean option instead of the default -heights ca yields a slightly better result, with more reasonable node placement and some annotation, but almost all clades are marked as having a zero posterior, even though they absolutely do occur in the tree set corresponding to the input log file.
This looks to be an issue with the assignment of node numbers by the parsers. The target tree is read in using TreeParser, while the tree set to summarize is loaded in using TreeAnnotator's own parser. For some reason this leads to different numbering schemes used to identify nodes and hence clades between the target tree and the tree set.
These problems vanish when the -lowMem option is used. This is because that option causes TreeAnnotator to use the MemoryFriendlyTreeSet class, which employs TreeParser to parse the trees rather than the TreeAnnotator parser.
The issue here is that TreeParser assigns leaf node numbers according to the lexicographical order of the leaf labels. The TreeAnnotator parser doesn't appear to do this, but instead assigns leaf numbers according to the result of Integer.fromString() applied to the leaf labels.
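To make the mismatch concrete, here is a minimal sketch of the two numbering schemes described above. It is not the BEAST code: the class and method names are hypothetical, Integer.parseInt() stands in for the integer conversion, and it assumes (for illustration only) that a clade is identified by the set of leaf numbers it contains. With leaf labels "1", "2" and "10", the same clade ends up with a different key under each scheme, which would be enough to make every lookup against the tree set miss.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in BEAST.
public class LeafNumberingDemo {

    // Scheme described for TreeParser: leaf number = rank of the label
    // in lexicographical (string) order.
    public static Map<String, Integer> numberLexicographically(List<String> labels) {
        List<String> sorted = new ArrayList<>(labels);
        Collections.sort(sorted);
        return rank(sorted);
    }

    // Assumed stand-in for the other parser: leaf number = rank of the
    // label when labels are compared by their integer value.
    public static Map<String, Integer> numberByIntegerValue(List<String> labels) {
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(Comparator.comparingInt(Integer::parseInt));
        return rank(sorted);
    }

    // Map each label to its position in the already-sorted list.
    private static Map<String, Integer> rank(List<String> sorted) {
        Map<String, Integer> numbers = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            numbers.put(sorted.get(i), i);
        }
        return numbers;
    }

    // Assumption: a clade is keyed by the set of leaf numbers below it.
    public static BitSet cladeKey(Map<String, Integer> numbering, List<String> cladeLabels) {
        BitSet bits = new BitSet();
        for (String label : cladeLabels) {
            bits.set(numbering.get(label));
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("1", "2", "10");
        List<String> clade = Arrays.asList("1", "2");

        BitSet targetKey = cladeKey(numberLexicographically(labels), clade);
        BitSet treeSetKey = cladeKey(numberByIntegerValue(labels), clade);

        // The same clade gets two different keys, so it is never matched.
        System.out.println("target tree key: " + targetKey);
        System.out.println("tree set key:    " + treeSetKey);
        System.out.println("keys match:      " + targetKey.equals(treeSetKey));
    }
}
```

Labels that sort the same way under both orderings (for example "1" to "9" only) would not show the problem, which may be why it only appears with some data sets.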
While trawling through this, I've noticed that the TreeSetParser.parseFile() method has a bunch of stuff related to extracting geographical information from labels. Since this method is only used by FastTreeSet and not MemoryFriendlyTreeSet, I wonder if the -lowMem switch breaks anything relating to continuous phylogeography?
Okay, so the above commit (made on a separate branch) now means that TA seems to work as expected without the -lowMem switch and with -heights mean. The default -heights ca still causes problems though, resulting in a (now much more sensible looking) tree without node annotations.
For example:
The target tree file is a valid Newick file without any funny business (e.g. single-child nodes, zero branch lengths, multifurcations, ...) and the input tree log file is similarly boring. The log file is summarized without any problems when no target tree is specified.