bhattlab / phanta

Workflow to rapidly quantify taxa from all domains of life, directly from short-read human gut metagenomes
MIT License

Clarification on HumGut taxonomy #4

Closed snayfach closed 2 years ago

snayfach commented 2 years ago

Could you please clarify the taxonomy used for the Bacteria and Archaea in your database? I understand the mapping is done to the HumGut genomes, but how are those annotated: NCBI or GTDB? If GTDB, which version? If not GTDB, could a GTDB-based version be provided? Also, how are unannotated species clusters handled? Are they represented in the taxonomy, or only counted towards the lowest annotated rank? Thanks for answering all my questions.

meenachakra commented 2 years ago

Hi Stephen, thank you for the question! Kraken2 database construction requires two taxonomy files, names.dmp and nodes.dmp. For the bacterial and archaeal portions of the taxonomy, we used the ncbi_names.dmp and ncbi_nodes.dmp files that accompany the HumGut paper, available at https://arken.nmbu.no/~larssn/humgut/. Based on the HumGut paper and the ncbi_tax_id column of the HumGut.tsv file on that website, all 30,691 HumGut genomes were given an NCBI taxonomy assignment; in the taxonomy files, each genome is placed as a "direct descendant" of its assigned taxon.
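
To illustrate what "direct descendant" means in a nodes.dmp-style file, here is a minimal Python sketch. It assumes the standard NCBI dump layout (fields delimited by tab, pipe, tab, with tax_id, parent tax_id, and rank as the first three columns). The tax IDs, the rank given to the genome node, and the helper names `parse_nodes` and `lineage` are all made up for illustration; they are not taken from the HumGut files or from Phanta.

```python
import io


def parse_nodes(handle):
    """Return {tax_id: (parent_tax_id, rank)} from a nodes.dmp-style stream."""
    nodes = {}
    for line in handle:
        # Splitting on "|" and stripping tabs handles the "\t|\t" delimiter.
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(tax_id, nodes):
    """Walk parent links from tax_id up to the root (whose parent is itself)."""
    path = [tax_id]
    while nodes[path[-1]][0] != path[-1]:
        path.append(nodes[path[-1]][0])
    return path


# Toy data with made-up IDs. Node 9000001 stands in for one genome that has
# been attached directly beneath its assigned species node (1000).
TOY_NODES = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tsuperkingdom\t|\n"
    "100\t|\t10\t|\tgenus\t|\n"
    "1000\t|\t100\t|\tspecies\t|\n"
    "9000001\t|\t1000\t|\tno rank\t|\n"  # rank of the genome node is an assumption
)

toy = parse_nodes(io.StringIO(TOY_NODES))
print(lineage(9000001, toy))
```

In this toy tree the genome's parent is the species node, so reads classified to the genome roll up to that species and its ancestors.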

snayfach commented 2 years ago

Do you think it would be possible to provide a version that uses the latest GTDB taxonomy? I suspect other users would use this as well. Thanks!

yipinto commented 2 years ago

HumGut was also provided with GTDB taxonomy (GTDB-Tk, release 05-RS95), so it should be possible to make a Phanta db with the latest GTDB taxonomy. Will update.

snayfach commented 2 years ago

RS95 is two releases and two years behind (July 2020) with the newer releases having much better species-level coverage of gut bacteria. But I understand using that one is easier. You could likely obtain the latest version for all the HumGut genomes by downloading taxonomy files from the UHGG and GTDB websites. Thanks!
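
As a rough sketch of what joining genomes to a GTDB taxonomy table could look like, the Python below parses GTDB-style lineage strings (rank prefixes `d__` through `s__`, separated by semicolons) from a two-column, tab-separated table. The accession `GENOME_0001`, the second row, and the function names are hypothetical; how real HumGut genome IDs map onto UHGG or GTDB accessions is not covered here. The `lowest_annotated_rank` helper only illustrates the "lowest annotated rank" idea from my first question and says nothing about how Phanta handles it.

```python
import io

# GTDB rank prefixes in order from domain to species.
RANKS = [
    ("d", "domain"), ("p", "phylum"), ("c", "class"), ("o", "order"),
    ("f", "family"), ("g", "genus"), ("s", "species"),
]
PREFIX_TO_RANK = dict(RANKS)


def parse_gtdb_lineage(lineage):
    """Turn 'd__X;p__Y;...' into {'domain': 'X', 'phylum': 'Y', ...}."""
    ranks = {}
    for part in lineage.split(";"):
        prefix, _, name = part.strip().partition("__")
        ranks[PREFIX_TO_RANK[prefix]] = name
    return ranks


def load_gtdb_taxonomy(handle):
    """Read 'accession<TAB>lineage' rows into {accession: rank dict}."""
    table = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, lineage = line.split("\t", 1)
        table[accession] = parse_gtdb_lineage(lineage)
    return table


def lowest_annotated_rank(ranks):
    """Return (rank, name) for the deepest rank with a non-empty name."""
    lowest = None
    for _, rank in RANKS:
        if ranks.get(rank):
            lowest = (rank, ranks[rank])
    return lowest


# Hypothetical rows; the second has an empty species field (unannotated).
TOY_TABLE = (
    "GENOME_0001\td__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis\n"
    "GENOME_0002\td__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Bacteroides;s__\n"
)

taxonomy = load_gtdb_taxonomy(io.StringIO(TOY_TABLE))
for accession, ranks in taxonomy.items():
    print(accession, lowest_annotated_rank(ranks))
```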

snayfach commented 1 year ago

No urgency on a solution, but would you mind leaving the issue open as it has not been solved? Thanks!