Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

RED value and classification results #564

Closed rusanovaA closed 11 months ago

rusanovaA commented 11 months ago

Dear GTDB-Tk team, thank you for your wonderful tool! I have a question regarding the criteria for naming a new family. Following the taxonomic analysis of my genome (completeness 91.72%, contamination 6.5%) with classify_wf, the resulting classification is "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Arenicellales;f__;g__;s__". However, the RED value is 0.71042. If I understand this parameter correctly, it designates my genome only as a novel genus. Can I still propose a name for a new family based on these results? Thank you for your assistance!

donovan-h-parks commented 11 months ago

Hi,

GTDB-Tk is indicating that your genome belongs to a novel family in the Arenicellales. RED is a relative measure, so the 0.71 only makes sense in comparison to other taxa in the same tree. If this value is from GTDB-Tk, it is relative to the GTDB reference tree. A value of 0.7 would agree with the assessment of your genome being at family-level depth in the tree; see: https://gtdb.ecogenomic.org/stats/r214#relative-evolutionary-divergence
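
As a rough illustration of how such a classification string can be read: GTDB lineages use the rank prefixes d__ through s__, and a field with nothing after the prefix has no taxon assigned at that rank. The sketch below reports the highest unassigned rank. The function name `first_unassigned_rank` is hypothetical and not part of GTDB-Tk; it only parses the string and does not use RED.

```python
# Hypothetical helper, not part of GTDB-Tk: find the highest rank left unassigned.
RANK_NAMES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}

def first_unassigned_rank(classification: str):
    """Return the name of the highest rank with an empty taxon, or None."""
    for field in classification.split(";"):
        # "f__" splits into prefix "f" and an empty name.
        prefix, _, name = field.strip().partition("__")
        if not name:
            return RANK_NAMES[prefix]
    return None

lineage = ("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
           "o__Arenicellales;f__;g__;s__")
print(first_unassigned_rank(lineage))  # family
```

For the lineage in this thread the first empty field is f__, which matches the reading that the genome sits in an unnamed family within the Arenicellales.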

Cheers, Donovan

rusanovaA commented 11 months ago

Thank you for your response! May I ask another small question? Am I correct in understanding that bootstrapping is now involved when constructing a tree with de_novo_wf?

donovan-h-parks commented 11 months ago

By default, the de novo workflow will infer a tree using FastTree and calculate local support values using the Shimodaira-Hasegawa test: https://ecogenomics.github.io/GTDBTk/commands/de_novo_wf.html
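
A minimal sketch of inspecting such support values, assuming they appear as numeric internal-node labels on a 0 to 1 scale in the Newick output (as FastTree writes them). The tree string, the function name `low_support_nodes`, and the 0.7 cutoff are made up for illustration; a real GTDB-Tk tree may carry additional labels that this simple pattern does not handle.

```python
import re

def low_support_nodes(newick: str, threshold: float = 0.7):
    """Return numeric internal-node support values below the threshold."""
    # An internal-node label directly follows a closing parenthesis;
    # branch lengths follow a colon and are therefore not matched.
    labels = re.findall(r"\)([0-9]*\.?[0-9]+)", newick)
    return [float(v) for v in labels if float(v) < threshold]

# Toy tree with two internal nodes (supports 0.95 and 0.42); not real data.
toy_tree = "((A:0.1,B:0.2)0.95:0.05,(C:0.3,D:0.1)0.42:0.07);"
print(low_support_nodes(toy_tree))  # [0.42]
```

These local support values are not bootstrap proportions, so any cutoff borrowed from bootstrap practice should be treated with caution.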

rusanovaA commented 11 months ago

Thank you very much for your help!