HitMonk closed 3 years ago
Hi,
It does look strange indeed. However, I do not have the same problem in the databases that I built. Perhaps it could be a problem with the base tree input file? This is where I specified the ranks for the two nodes Archaea and Bacteria before the merge. If the rank is not specified in this file, it will default to "no rank" for Archaea and Bacteria.
parent child rank
root cellular_organism no rank
cellular_organism Archaea kingdom
cellular_organism Bacteria kingdom
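A quick way to check a base tree file for this is sketched below. It is only a minimal sketch: the function name `check_root_ranks` is hypothetical, and it assumes the file is tab-separated with the `parent`, `child`, `rank` header shown above. It reports Archaea and Bacteria whenever their rank is anything other than `kingdom`, treating a missing entry as "no rank".

```python
import csv
import io

REQUIRED_RANK = "kingdom"            # rank discussed in this thread
ROOT_NODES = ("Archaea", "Bacteria")  # the two nodes merged under the root


def check_root_ranks(base_tree_text, delimiter="\t"):
    """Return {node: rank} for root nodes whose rank is not 'kingdom'.

    Assumes a header line 'parent<TAB>child<TAB>rank' (an assumption,
    based on the example file in this thread).
    """
    reader = csv.DictReader(io.StringIO(base_tree_text), delimiter=delimiter)
    ranks = {row["child"]: row["rank"] for row in reader}
    problems = {}
    for node in ROOT_NODES:
        # A node absent from the file is treated as "no rank".
        rank = ranks.get(node, "no rank")
        if rank != REQUIRED_RANK:
            problems[node] = rank
    return problems
```

An empty result means both nodes carry the `kingdom` rank; anything else lists the nodes to fix before rebuilding the database.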
This is the output from both databases I built during the bug tests:
k__Bacteria|p__Proteobacteria 317692
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria 196875
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales 56047
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae 24122
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera 4361
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_V 180
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_T 151
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Buchnera|s__Buchnera aphidicola_AB 145
Did you use GTDB?
Yeah, I used the same files that we shared with each other. The one difference I see now is that my file has domain instead of kingdom. This is also mentioned in section 3.11, Database root. So I guess it will be fixed if I just change that. I'll make a new database and get back with a confirmation.
Oh, good point! I remember we had a lot of discussions about the naming of the different ranks, and that the official rank for Bacteria and Archaea is supposed to be domain. When I changed it, I did not consider that kingdom is what is used and required by kraken and qiime, for example.
Thanks for spotting this, I will change the example file to kingdom!
Hello again! I was working with 16S datasets and I think I found a bug with the kingdom-level taxonomy in the kraken2 16S GTDB database. It looks like the kingdom taxonomy is empty for me. There is no description of the bacterial and archaeal kingdoms, and instead the taxonomies start at the phylum level.
This was more evident when I imported the biom files into R, but you can still see it here. Do you see a similar situation? It's not too difficult to correct the taxonomy in R, though.
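To confirm the symptom without going through R, a report can be scanned for lineages that do not start at the kingdom level. The following is a minimal sketch in Python, not part of any tool mentioned here: the function name `lineages_missing_kingdom` is hypothetical, and it assumes each line holds a `|`-separated lineage with rank prefixes such as `k__` and `p__`, followed by a read count as the last whitespace-separated field.

```python
def lineages_missing_kingdom(report_lines):
    """Return the lineages whose first level is not a k__ entry.

    Assumes lines of the form '<lineage> <count>', where the lineage is
    '|'-separated (an assumption about the report layout).
    """
    missing = []
    for line in report_lines:
        line = line.strip()
        if not line:
            continue
        # Split off the trailing count only; species names may contain spaces.
        lineage = line.rsplit(None, 1)[0]
        first_level = lineage.split("|", 1)[0]
        if not first_level.startswith("k__"):
            missing.append(lineage)
    return missing
```

If this returns any lineages for a report, the database was most likely built with a rank other than `kingdom` for Archaea and Bacteria.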