Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

Tree node labels #148

Closed cmfield closed 5 years ago

cmfield commented 5 years ago

The output tree produced by the classify workflow has node labels, which I assume are bootstrap values, sometimes accompanied by taxonomic labels. Most, but not all, of the nodes that lead to a placed input sequence (i.e., non-reference) have no label. That would make sense if the tree wasn't rebuilt but only modified, in which case all of these new nodes would be unlabeled. What I don't understand is why some of them do have labels.

Can someone explain these labels?

donovan-h-parks commented 5 years ago

Hello. User (non-reference) genomes are placed into a reference tree that has non-parametric bootstrap support values and taxonomic labels. The placement of the user genomes does not modify these labels. Bootstrap values, for instance, are NOT recalculated to account for the presence of the user genomes. In general, we do not recommend using this tree for most downstream phylogenetic analyses, as the user genomes are inserted via maximum likelihood (ML) placement instead of the tree being inferred de novo.
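
For readers who want to separate the support values from the taxonomic labels when inspecting the tree, here is a minimal Python sketch. It assumes that an internal node label is either empty, a bare number, a taxon string, or a number and a taxon string joined by a colon (for example `100.0:g__Escherichia`). That layout is an assumption, not something confirmed in this thread, and `parse_node_label` is a hypothetical helper, not part of GTDB-Tk. Check the labels in your own tree file before relying on it.

```python
from typing import Optional, Tuple


def parse_node_label(label: str) -> Tuple[Optional[float], Optional[str]]:
    """Split an internal node label into (support, taxon).

    Assumed label shapes (hypothetical, verify against your own tree):
      ""                      -> no label
      "100.0"                 -> support only
      "100.0:g__Escherichia"  -> support and taxon
      "g__Escherichia"        -> taxon only
    """
    # Newick labels containing spaces or punctuation are often quoted.
    label = label.strip().strip("'\"")
    if not label:
        return None, None

    head, sep, tail = label.partition(":")
    try:
        support = float(head)
    except ValueError:
        # The first field is not numeric, so treat the whole label as taxonomy.
        return None, label

    taxon = tail if sep and tail else None
    return support, taxon


if __name__ == "__main__":
    for raw in ["", "100.0", "'87.0:g__Escherichia'", "g__Escherichia"]:
        print(repr(raw), "->", parse_node_label(raw))
```

Under these assumptions, a node that returns `(None, None)` carries no label at all, and any support value that is returned comes from the reference tree and does not account for the placed genomes.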

cmfield commented 5 years ago

Thanks for the clarification; it's what I suspected.