Closed Varun9599 closed 1 year ago
Can you provide an example sequence that is getting this assignment?
Thanks for the reply! This is one of such sequences in my ASV table: CCTGTTCGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTTTGACAGTTTCCAAAGCGAACTATGGTTGAGCCACAGCCTTTAACTTCAGACTTATCAAACCGCCTGCGCTCGCTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGT
I tried and can't reproduce your results:
dada2::assignTaxonomy("CCTGTTCGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTTTGACAGTTTCCAAAGCGAACTATGGTTGAGCCACAGCCTTTAACTTCAGACTTATCAAACCGCCTGCGCTCGCTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGT",
"~/SILVA138.1/silva_nr99_v138.1_train_set.fa.gz",
tryRC = TRUE)
Kingdom Phylum Class Order Family Genus
"Bacteria" "Firmicutes" "Bacilli" "Lactobacillales" "Streptococcaceae" "Streptococcus"
and for species
dada2::assignSpecies("CCTGTTCGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTTTGACAGTTTCCAAAGCGAACTATGGTTGAGCCACAGCCTTTAACTTCAGACTTATCAAACCGCCTGCGCTCGCTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGT",
"~/SILVA138.1/silva_species_assignment_v138.1.fa.gz",
tryRC = TRUE) Genus Species
NA NA
@salix-d Initially I was using the SILVA train set titled "silva_nr99_v138.1_wSpecies_train_set.fa". This SILVA train set is annotating a lot of species, such as the one I mentioned. When I repeated the taxa assignment with the regular train set, I got the same taxa assignment as you!
@Varun9599 Aaah!
Still a bit weird to get such a difference 🤔, but if I remember correctly, using the train set without species and then assigning species is what's suggested.
Another thing you could look at is the boot results. I suspect the boot result for the genera using the train set without species is stronger than the one using the train set with species.
Since the set with species does assign some species that sometimes have a stronger boot result, I made myself a function to merge the two result matrices while keeping the best assignment.
I tried using the same train set and it gives me
dada2::assignTaxonomy("CCTGTTCGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTTTGACAGTTTCCAAAGCGAACTATGGTTGAGCCACAGCCTTTAACTTCAGACTTATCAAACCGCCTGCGCTCGCTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGT",
"~/SILVA138.1/silva_nr99_v138.1_wSpecies_train_set.fa.gz",
tryRC = TRUE)
Kingdom Phylum Class Order Family Genus Species
1: Bacteria Firmicutes Bacilli Lactobacillales Streptococcaceae Streptococcus salivarius
Where you using tryRC = TRUE
?
Confirming that I see the same results as reported by both of you.
Assignment to Lactobacillus hominis when tryRC=FALSE
(the default), and assignment to Streptococcus salivarius when tryRC=TRUE
.
So the issue here is that the sequence is in the reverse complement orientation to the reference database, the default operation of assignTaxonomy
is to not check the reverse copmlement orientation (that is, tryRC=FALSE
by default) and the weakness of the naive Bayesian classifier method to over-classification when suitable outgroup sequences aren't present.
So, for @Varun9599 the fix is easy, just add tryRC=TRUE
to your assignTaxonomy
calls. The question for me is whether that should just be the default, at the cost of 2x the computation time.
Thank you @salix-d and @benjjneb!
I was using the default parameters. tryRC=TRUE
is generating better results!
Just for clarification, is it preferable to use the train set without species and add species subsequently to avoid the inaccurate annotations?
Just for clarification, is it preferable to use the train set without species and add species subsequently to avoid the inaccurate annotations?
Our recommendations for short-read 16S is assignTaxonomy
without species, then addSpecies
to augment to the species level. For long-read 16S we recommend assignTaxonomy
with species.
This is because the naive Bayesian classifier (which is the method implemented in assignTaxonomy
) can be overly permissive in assigning to the species-level with short-read data, thus the different method (exact matching) for that taxonomic level as implemented in assignSpecies
/addSpecies
.
@benjjneb Thank you!
Hello! I have been using SILVA v. 138 for taxonomic assignment. Since past few runs, I have been observing ASVs named "Lactobacillus hominis" prominently in the data. Although when I BLAST the corresponding sequences, it shows totally different bacteria. Is this a normal phenomenon? What steps should I take to ensure accurate taxonomic assignment?