FOI-Bioinformatics / flextaxd

FlexTaxD (Flexible Taxonomy Databases) - Create, add, merge different taxonomy sources (QIIME, GTDB, NCBI and more) and create metagenomic databases (kraken2, ganon and more )
GNU General Public License v3.0

16s GTDB database: Kingdom taxonomy issue #41

Closed HitMonk closed 3 years ago

HitMonk commented 3 years ago

Hello again! I was working with 16S datasets and I think I found a bug with the kingdom-level taxonomy in the kraken2 16S GTDB database. The kingdom level appears to be empty for me: there are no entries for the bacterial and archaeal kingdoms, and the lineages start at the phylum level instead.

For example:
kraken2-inspect --db GTDB_orig_krakendb/ --use-mpa-style --threads 5 | head -50
# Database options: nucleotide db, k = 35, l = 31
# Spaced mask = 11111111111111111111111111111111110011001100110011001100110011
# Toggle mask = 1110001101111110001010001100010000100111000110110101101000101101
# Total taxonomy nodes: 31893
# Table size: 1789033
# Table capacity: 2603885
# Min clear hash value = 0
p__Proteobacteria   317702
p__Proteobacteria|c__Gammaproteobacteria    196884
p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales    56049
p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae  24124
p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera  4361
p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_V 180
p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_T 151
p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_AB    145

This was more evident when I imported the biom files into R, but you can still see it here. Do you see a similar situation? It's not too difficult to correct the taxonomy in R, though.
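A quick way to confirm the symptom without going through R is to scan the `--use-mpa-style` output for lineages that do not start at the kingdom level. The following is only a minimal sketch: the function name `lineages_missing_kingdom` is hypothetical, and it assumes each non-comment line is a lineage followed by whitespace and a count, as in the output above.

```python
def lineages_missing_kingdom(lines, kingdom_prefix="k__"):
    """Return lineages whose first level does not carry the kingdom prefix.

    Assumes MPA-style lines of the form '<lineage><whitespace><count>';
    lines starting with '#' are treated as header comments.
    """
    missing = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the database header
        # Species names contain spaces, so split off only the trailing count.
        lineage = line.rsplit(None, 1)[0]
        first_level = lineage.split("|", 1)[0]
        if not first_level.startswith(kingdom_prefix):
            missing.append(lineage)
    return missing


sample = [
    "# Total taxonomy nodes: 31893",
    "p__Proteobacteria   317702",
    "p__Proteobacteria|c__Gammaproteobacteria    196884",
    "k__Bacteria|p__Proteobacteria   317692",
]
print(lineages_missing_kingdom(sample))
```

In practice the lines could come from the saved output of the `kraken2-inspect` command shown above; an empty result would suggest the kingdom level is present throughout.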

davve2 commented 3 years ago

Hej,

It does look strange indeed. However, I do not have the same problem in the databases that I built. Perhaps it could be a problem with the base tree input file? This is where I specified the ranks for the two nodes Archaea and Bacteria before the merge. If the rank is not specified in this file, Archaea and Bacteria will default to "no rank".

parent  child   rank
root    cellular_organism       no rank
cellular_organism       Archaea kingdom
cellular_organism       Bacteria        kingdom
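To see which rank a node would end up with, the base tree file can be read into a simple lookup. This is a sketch under assumptions, not FlexTaxD's own code: it assumes the file is tab-separated with the `parent`, `child`, `rank` header shown above, the helper names `load_ranks` and `rank_of` are hypothetical, and the "no rank" fallback simply mirrors the default behaviour described in this comment.

```python
import csv
import io


def load_ranks(handle):
    """Map each child node to its rank from a tab-separated base tree file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["child"]: row["rank"] for row in reader}


def rank_of(ranks, node):
    """Look up a node's rank, falling back to 'no rank' when unspecified."""
    return ranks.get(node, "no rank")


# Example input in which Bacteria has no row, to show the fallback.
example = io.StringIO(
    "parent\tchild\trank\n"
    "root\tcellular_organism\tno rank\n"
    "cellular_organism\tArchaea\tkingdom\n"
)
ranks = load_ranks(example)
print(rank_of(ranks, "Archaea"))
print(rank_of(ranks, "Bacteria"))
```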

This is the output from both databases I built during the bug tests:

k__Bacteria|p__Proteobacteria   317692
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria    196875
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales        56047
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae  24122
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera      4361
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_V     180
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_T     151
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_AB    145

Did you use GTDB

HitMonk commented 3 years ago

Yeah, I used the same files that we shared with each other. The one difference that I see now is that my file has domain instead of kingdom. This is also mentioned in section 3.11, Database root. So I guess it'll be fixed if I just change that. I'll make a new database and get back with a confirmation.
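The change itself could be scripted rather than edited by hand. A minimal sketch, assuming the base tree file is tab-separated with three columns (parent, child, rank); the function name `domain_to_kingdom` is hypothetical and not part of FlexTaxD:

```python
def domain_to_kingdom(text, old_rank="domain", new_rank="kingdom"):
    """Rewrite the rank column of a tab-separated parent/child/rank table."""
    out = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Only touch well-formed rows whose rank matches exactly.
        if len(fields) == 3 and fields[2] == old_rank:
            fields[2] = new_rank
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


base_tree = (
    "parent\tchild\trank\n"
    "root\tcellular_organism\tno rank\n"
    "cellular_organism\tArchaea\tdomain\n"
    "cellular_organism\tBacteria\tdomain\n"
)
fixed = domain_to_kingdom(base_tree)
print(fixed, end="")
```

Only the rank column is compared, so a node that happened to be named "domain" would be left alone.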

davve2 commented 3 years ago

Oh, good point! I remember we had a lot of discussions about the naming of the different ranks, and that the official rank for Bacteria and Archaea is supposed to be domain. When I changed it, I did not consider that kingdom is what tools such as kraken and qiime use and require.

Thanks for spotting this. I will change the example file to kingdom!