benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Incertae_sedis VS NA Taxonomic Assignment #2004

Open mya-darsan opened 3 weeks ago

mya-darsan commented 3 weeks ago

Hello!

I used UNITE to assign taxonomy. When looking at the assignments some received:

I am wondering what the difference is between an OTU being assigned NA and Incertae_sedis?

Does NA mean the sequence was not found in the database at all (and so is not a fungus, even though it gets kingdom Fungi?), while Incertae_sedis means it is a fungus but the relationship is unknown?

benjjneb commented 3 weeks ago

NA is assigned when the taxonomic classification method finds different assignments at a given taxonomic level from subsets of the sequence than it did from the full sequence. The specifics of how this is done are described in the original paper on the naive Bayesian classifier: https://doi.org/10.1128/AEM.00062-07
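To make the subset comparison concrete, here is a toy sketch in Python (not dada2's R code, and not the classifier itself). It assumes the per-subset calls have already been computed and only shows the final decision: keep the full-sequence call if enough subset calls agree with it, otherwise report NA. The function name `call_with_bootstrap`, the genus names, and the cutoff of 50 are illustrative assumptions.

```python
from collections import Counter


def call_with_bootstrap(full_call, subset_calls, min_boot=50):
    """Return full_call if enough subset calls agree with it, else None (NA).

    full_call    -- label assigned from the full sequence at one rank
    subset_calls -- labels assigned from random subsets of the sequence
    min_boot     -- minimum percentage of agreeing subsets (illustrative cutoff)
    """
    if not subset_calls:
        return None
    agree = Counter(subset_calls)[full_call]
    support = 100 * agree / len(subset_calls)
    return full_call if support >= min_boot else None


# Hypothetical genus-level calls from 10 subsets of one query sequence.
stable = ["Fusarium"] * 9 + ["Gibberella"]
unstable = ["Fusarium"] * 3 + ["Gibberella"] * 4 + ["Nectria"] * 3

print(call_with_bootstrap("Fusarium", stable))    # Fusarium (90% support)
print(call_with_bootstrap("Fusarium", unstable))  # None, i.e. NA (30% support)
```

In this sketch an NA says nothing about whether the query is a fungus; it only says the subsets did not agree often enough at that rank.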

An Incertae_sedis assignment means that the taxonomic classification method found reference sequences with Incertae_sedis at that taxonomic level from the full-length sequence, and from most of the subsets of the sequence. So, in some sense this "classification" is confident. However, Incertae_sedis means that the taxonomic placement at that level is uncertain. Thus, I would generally interpret this as the same as an NA -- we don't know the classification.

In short: NA comes from uncertainty in comparing the query sequence to the reference database, while Incertae_sedis comes from uncertainty in the taxonomic assignments of the reference database entries themselves.
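If you want to treat the two cases the same way downstream, one option is to fold Incertae_sedis labels into NA after assignment. The Python sketch below assumes the taxonomy has been exported as plain strings, with NA represented as `None`; the helper name `normalise_rank` and the example labels are hypothetical, so check how Incertae_sedis actually appears in your own table before matching on it.

```python
def normalise_rank(label):
    """Map NA (None) and any Incertae_sedis label to None; keep other labels."""
    if label is None or "incertae_sedis" in label.lower():
        return None
    return label


# Hypothetical assignments for three sequences, kingdom through family.
rows = {
    "seq1": ["k__Fungi", "p__Ascomycota", "c__Sordariomycetes", "o__Hypocreales", "f__Nectriaceae"],
    "seq2": ["k__Fungi", "p__Ascomycota", "c__Sordariomycetes", "o__Hypocreales", "f__Hypocreales_fam_Incertae_sedis"],
    "seq3": ["k__Fungi", "p__Ascomycota", None, None, None],
}

cleaned = {name: [normalise_rank(x) for x in ranks] for name, ranks in rows.items()}

for name, ranks in cleaned.items():
    print(name, ranks)
```

After this step, seq2 and seq3 both read as unknown at the family level. If you later need to tell the two kinds of uncertainty apart, keep the original table alongside the cleaned one.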