Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

Significant differences between gtdbtk de_novo_wf (GTDB-Tk v2.3.2) and gtdbtk classify_wf (GTDB-Tk v2.1.0) #613

Open 473021677 opened 1 week ago

473021677 commented 1 week ago

Hi, I used the "gtdbtk de_novo_wf" command (GTDB-Tk v2.3.2) to produce GTDB-Tk result files for 160 MAGs. Of these, 6 MAGs were assigned to g__32-111 and 22 to g__Caldipriscus; the remaining 132 MAGs could not be assigned to a genus. Previously, I had classified the same 160 MAGs with GTDB-Tk v2.1.0 (gtdbtk classify_wf) against the GTDB r207 database, and 100 of the 160 MAGs were assigned to 16 genera, including g__32-111 and g__Caldipriscus. I am not sure whether I can ignore these large differences between gtdbtk de_novo_wf (GTDB-Tk v2.3.2, GTDB r214.0 database) and gtdbtk classify_wf (GTDB-Tk v2.1.0, GTDB r207 database). I would not expect differences this large to be caused by the different GTDB-Tk versions or GTDB database releases. I need your help. I have attached the result files. Thanks.

Best regards,

Yang Yuan

Attachments: GTDB-Tk_v2.1.0_classify_wf_for_our_genomes.txt, GTDB-Tk v2.3.2_de_novo_wf_for_our_genomes.txt
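To see exactly where the two runs disagree, it may help to compare the genus assigned to each MAG rather than only the overall counts. The Python sketch below assumes each result has been reduced to a tab-separated file with a genome-ID column and a GTDB taxonomy-string column. The default column names `user_genome` and `classification` match GTDB-Tk's classify_wf summary files, but the de_novo_wf taxonomy output (and the attached .txt files) may use a different layout, so treat the column names as assumptions to adjust. All function names here are hypothetical helpers, not part of GTDB-Tk.

```python
import csv
from typing import Dict, Optional


def genus_of(taxonomy: str) -> Optional[str]:
    """Return the genus from a GTDB-style string such as
    'd__Bacteria;...;g__Caldipriscus;s__', or None if the genus rank is empty or absent."""
    for rank in taxonomy.split(";"):
        rank = rank.strip()
        if rank.startswith("g__"):
            return rank[3:] or None
    return None


def load_taxonomy(path: str, id_col: str = "user_genome",
                  tax_col: str = "classification") -> Dict[str, str]:
    """Read a tab-separated file into {genome_id: taxonomy_string}.
    The column names are assumptions; change them to match your files."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_col]: row[tax_col] for row in reader}


def compare_genera(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, int]:
    """Count, for genomes present in both runs, how their genus assignments relate."""
    counts = {"same_genus": 0, "different_genus": 0, "genus_lost": 0,
              "genus_gained": 0, "both_unassigned": 0}
    for genome in sorted(set(old) & set(new)):
        g_old, g_new = genus_of(old[genome]), genus_of(new[genome])
        if g_old and g_new:
            counts["same_genus" if g_old == g_new else "different_genus"] += 1
        elif g_old:
            counts["genus_lost"] += 1      # genus in the old run only
        elif g_new:
            counts["genus_gained"] += 1    # genus in the new run only
        else:
            counts["both_unassigned"] += 1
    return counts
```

For example, `compare_genera(load_taxonomy(old_path), load_taxonomy(new_path))` returns one count per category. Many `genus_lost` cases with few `different_genus` cases would be consistent with one run stopping at a higher rank rather than the two runs giving conflicting genera.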

pchaumeil commented 12 hours ago

Hello, these workflows are currently fundamentally different. The classify workflow makes a best-effort attempt to determine the correct taxonomic assignment, taking into account both the placement of a genome and RED values. The de novo workflow infers a de novo tree and then decorates it using essentially the approach implemented in PhyloRank. For now, you should treat the de novo workflow's taxonomic assignments as a guide rather than as final classifications. In particular, it does not use RED to determine whether a new user genome is the most basal lineage of a group, so it will systematically under-classify genomes compared with the classify workflow.
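The reply refers to RED (relative evolutionary divergence). As a rough illustration of what that value measures, here is a minimal Python sketch of RED as described in the paper that introduced GTDB (Parks et al., 2018): the root has RED 0, every leaf has RED 1, and each other node n gets RED_n = p + (d/u)(1 - p), where p is the parent's RED, d is the branch length from the parent to n, and u is the average branch length from the parent to the leaves below n. This is not the GTDB-Tk or PhyloRank implementation; the `Node` class and function names are illustrative assumptions, node names are assumed unique, and branch lengths are assumed positive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str                     # assumed unique within the tree
    length: float = 0.0           # branch length to the parent (assumed > 0 except at the root)
    children: List["Node"] = field(default_factory=list)


def mean_leaf_distance(node: Node) -> float:
    """Mean path length from `node` down to each leaf in its subtree (0.0 for a leaf)."""
    dists = []
    stack = [(node, 0.0)]
    while stack:
        cur, acc = stack.pop()
        if not cur.children:
            dists.append(acc)
        for child in cur.children:
            stack.append((child, acc + child.length))
    return sum(dists) / len(dists)


def relative_evolutionary_divergence(root: Node) -> Dict[str, float]:
    """RED for every node: root = 0, then RED_n = p + (d / u) * (1 - p) top-down."""
    red = {root.name: 0.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        p = red[parent.name]
        for child in parent.children:
            d = child.length
            u = d + mean_leaf_distance(child)   # avg. distance from parent to leaves under child
            red[child.name] = p + (d / u) * (1.0 - p)
            stack.append(child)
    return red


if __name__ == "__main__":
    # Toy tree: root -> A (1.0) -> {a1 (1.0), a2 (1.0)}; root -> b (2.0)
    tree = Node("root", 0.0, [Node("A", 1.0, [Node("a1", 1.0), Node("a2", 1.0)]),
                              Node("b", 2.0)])
    print(relative_evolutionary_divergence(tree))
    # {'root': 0.0, 'A': 0.5, 'b': 1.0, 'a1': 1.0, 'a2': 1.0}
```

Roughly speaking, the classify workflow can weigh values like these for the branch where a user genome is placed when judging which rank a novel lineage represents; per the reply, the de novo workflow does not, which is why it tends to stop at a higher rank.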