DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License
714 stars, 271 forks

Does including nucleotide sequences in custom database increase the mapping power significantly? #752

Open nataliaglazman opened 1 year ago

nataliaglazman commented 1 year ago

Hi, I am using a custom database of assembled genomes to analyse my shotgun eDNA data. The aquatic microbiome I am sampling is largely uncharacterised, so relatively few genomes are available for the organisms that I know should be present. I have a list of these organisms, so I was wondering: would adding all available nucleotide sequences for the organisms that lack assembled genomes significantly increase classification power?

I realise this is a very specific question and is probably hard to answer without running a test, but I was wondering whether anyone has done something similar or has used the nt database to improve classification power.

jenniferlu717 commented 9 months ago

I would not add all available nucleotide sequences. Because those sequences are not assembled or filtered, including them may increase false positives.

To analyze this sample set, I would first classify against the available standard database and then perhaps try assembly afterwards.
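For anyone who wants to run the test mentioned in the question, one rough way to compare two databases is to classify the same reads against each and compare the unclassified percentage in the two Kraken 2 reports. The sketch below is only an illustration: it assumes the standard six-column, tab-separated report format (percentage, clade reads, direct reads, rank code, taxid, name), the function name `unclassified_percent` is hypothetical, and the two report snippets are made-up toy data, not real results.

```python
def unclassified_percent(report_text: str) -> float:
    """Return the percentage of reads reported as unclassified (rank U, taxid 0).

    Assumes the standard six-column Kraken 2 report layout. Returns 0.0 when
    the report has no unclassified line.
    """
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        pct, rank, taxid = fields[0], fields[3], fields[4]
        if rank.strip() == "U" and taxid.strip() == "0":
            return float(pct)
    return 0.0


# Toy report snippets (invented numbers, for illustration only).
baseline_report = (
    " 62.50\t6250\t6250\tU\t0\tunclassified\n"
    " 37.50\t3750\t10\tR\t1\troot\n"
)
custom_report = (
    " 48.00\t4800\t4800\tU\t0\tunclassified\n"
    " 52.00\t5200\t12\tR\t1\troot\n"
)

# Percentage points of reads that became classified with the second database.
gain = unclassified_percent(baseline_report) - unclassified_percent(custom_report)
print(f"Additional reads classified: {gain:.2f} percentage points")
```

A lower unclassified percentage only shows that more reads received a label, not that the labels are correct, so the false positive concern above still applies to any gain measured this way.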