Does including nucleotide sequences in a custom database increase the mapping power significantly?

DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system

MIT License


Hi, I am using a custom database of assembled genomes to analyse my shotgun eDNA data. The aquatic microbiome I am sampling is largely uncharacterised, so relatively few genomes are available for the organisms that I know should be present. I have a list of these organisms, so I was wondering: would adding all available nucleotide sequences for the organisms that lack assembled genomes significantly increase classification power?

I realise this is a very specific question and is probably hard to answer without running a test, but I was wondering whether anyone has done something similar, or has used the nt database to improve classification power.
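For context, here is a minimal sketch of how I would prepare such sequences, assuming the usual Kraken 2 convention of putting `kraken:taxid|<taxid>` in the FASTA sequence ID so that a record maps to a taxon without an accession lookup. The function name, the example accession and the taxid are all hypothetical, chosen only for illustration. It does not answer the question of whether classification improves; it only shows the kind of input I have in mind.

```python
def tag_fasta_with_taxid(fasta_lines, taxid):
    """Yield FASTA lines with '|kraken:taxid|<taxid>' appended to each sequence ID.

    Assumption: every record in the input belongs to the same taxon.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split the header into the sequence ID and an optional description.
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else "unnamed"
            header = f">{seq_id}|kraken:taxid|{taxid}"
            if len(parts) == 2:
                header += f" {parts[1]}"
            yield header
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    # Hypothetical record and taxid, for illustration only.
    records = [">XX000001.1 example organism 16S rRNA gene\n", "ACGTACGTAC\n"]
    for out_line in tag_fasta_with_taxid(records, 12345):
        print(out_line)
```

The tagged file could then be added with `kraken2-build --add-to-library tagged.fa --db <my_db>` before rebuilding with `kraken2-build --build --db <my_db>`.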


Does including nucleotide sequences in a custom database increase the mapping power significantly? #752