Open nataliaglazman opened 1 year ago
I would not include all available nucleotide sequences, as this may increase the number of false positives given that the sequences are not assembled or filtered.
If you are trying to analyze this sample set, I would classify against the standard database first and then perhaps try assembly afterwards.
Hi, I am using a custom database of assembled genomes to analyse my shotgun eDNA data. The microbiome I am sampling (aquatic) is largely uncharacterised, so there are relatively few genomes for the organisms that I know should exist there. I have a list of these organisms, so I was wondering: would including all available nucleotide sequences for the organisms that don't have assembled genomes significantly increase the classification power?
I realise this is a very specific question and is probably hard to answer without running a test, but I was wondering whether anyone has done anything similar or has used the nt database to improve classification power.