ramiroricardo closed 10 months ago
No, you might not get the same results if you use 2 separate, non-overlapping DBs. For example, if a k-mer is present in both DBs but in just 1 species in each one, then in the full DB that k-mer will be assigned to the lowest common ancestor of the 2 species. In the 2 separate DBs, the k-mer will be assigned at the species level to 2 different species.
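To make the difference concrete, here is a minimal toy sketch of the LCA rule described above. It is not Kraken's actual implementation: the taxonomy, the species names, the sequences, and the helper functions (`lineage`, `lca`, `build_db`) are all made up for illustration, and it ignores details of the real database build.

```python
# Toy taxonomy (hypothetical): child -> parent; the root is its own parent.
PARENT = {
    "root": "root",
    "genus_G": "root",
    "species_A": "genus_G",
    "species_B": "genus_G",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa share no ancestor")


def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(genomes, k):
    """Map each k-mer to the LCA of every taxon whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


K = 4
genome_a = {"species_A": "ACGTAC"}  # made-up sequences
genome_b = {"species_B": "TTACGT"}

db_a = build_db(genome_a, K)                       # separate DB 1
db_b = build_db(genome_b, K)                       # separate DB 2
db_full = build_db({**genome_a, **genome_b}, K)    # single combined DB

shared = "ACGT"  # occurs in both toy genomes
print(db_a[shared], db_b[shared], db_full[shared])
```

In this toy setup the shared k-mer gets a species-level label in each separate DB (a different species in each) but a genus-level label in the combined DB, so per-k-mer assignments from the two approaches cannot simply be treated as interchangeable. K-mers unique to one genome are labeled the same way in both cases.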
Hi @salzberg, thanks a lot for the quick reply and for clearing up my misunderstanding.
Hi all,
We are working on a problem in which we would like to extract the taxonomic assignment of each k-mer and just ignore the full sequence. We had thought of building a single database that includes all the genomes we care about, but we are not sure we will be able to do this because of the computational requirements. However, given that Kraken uses exact k-mer matches, I am wondering whether this matters. For example, if I classify some sequences against two separate DBs, or against a single DB that contains all the genomes present in the other two, I think I should get the same results. Is this expectation correct?
Thanks