benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
GNU Lesser General Public License v3.0

Function assignTaxonomy is not classifying sequences based on a custom database and generating many NA #1626

Closed edersonjesus closed 1 year ago

edersonjesus commented 1 year ago

Hello everyone! Hello @benjjneb !

I am analyzing Arbuscular Mycorrhizal Fungi (AMF) 18S rRNA sequences using the Maarjam database. I downloaded it, created my own dada2-compatible database (you can see it in the attached file if you wish), and ran assignTaxonomy. The output object contains many NAs, including the most abundant ASVs. I blasted some of these NAs. Here is one:


I got matches to AMF, including one with 100% coverage and maximum identity to a sequence within the Maarjam database. That is, the sequences should have been classified, but they were not. Somehow, dada2 is not identifying them. I wonder what is happening and whether there is something I can do to solve this issue.

Here are the commands I am using. Previous commands are similar to the dada2 tutorial:

maarjam <- "maarjam_dada2.fasta"
taxa <- assignTaxonomy(seqtab.nochim, maarjam, tryRC = TRUE, taxLevels = c("Class", "Order", "Family", "Genus", "Species"), multithread = FALSE)

I've got this warning message, which may explain what is going on:

Warning message:
In matrix(unlist(strsplit(genus.unq, ";")), ncol = td, byrow = TRUE) :
  data length [1569] is not a sub-multiple or multiple of the number of rows [314]

I saw a previous call about this type of message, but I still don't understand what is going on.
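
For readers hitting the same warning, here is a minimal base-R sketch of how it can arise. It reuses the `matrix(unlist(strsplit(...)), ncol = td, byrow = TRUE)` call quoted in the warning. The taxonomy strings are made up for illustration, and how dada2 derives `td` internally is an assumption here (taken as the largest number of ranks found). When one string splits into fewer fields than the others, the flattened vector no longer fills the matrix evenly, and every rank after that point shifts out of its column.

```r
# Toy taxonomy strings (made up for illustration); the second has one rank fewer.
genus.unq <- c("ClassA;OrderA;FamilyA;GenusA;SpeciesA",
               "ClassB;OrderB;FamilyB;GenusB")

fields <- strsplit(genus.unq, ";")
depth <- lengths(fields)   # number of ranks found in each string
td <- max(depth)           # assumed here: the deepest taxonomy sets the column count

# Same call as in the warning message: 9 values cannot fill a 5-column matrix evenly.
warn <- tryCatch(
  {
    matrix(unlist(fields), ncol = td, byrow = TRUE)
    NULL
  },
  warning = function(w) conditionMessage(w)
)

print(depth)
print(warn)
```

In the warning reported above, 314 rows of 5 ranks would need 1570 values, while the data length is 1569. That is consistent with a single reference entry being one field short.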




edersonjesus commented 1 year ago

I found out that I didn't add a semicolon at the end of each taxonomy string, such as in "Archaeosporomycetes;Archaeosporales;Archaeosporaceae;Archaeospora;Wirsel OTU21;".

After adding it to the names of all sequences, the warning message doesn't appear anymore and the hitherto unclassified ASVs are properly classified.
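
A hedged sketch of that fix in base R, for anyone who prefers not to edit the headers by hand. The helper name `add_trailing_semicolon` is hypothetical, and the toy sequences and second header are made up. Only the first header is taken from this thread. It appends ";" to every FASTA header line that does not already end with one and leaves sequence lines untouched.

```r
# Hypothetical helper: append ";" to every FASTA header that lacks one.
add_trailing_semicolon <- function(lines) {
  is_header <- startsWith(lines, ">")
  needs_fix <- is_header & !endsWith(lines, ";")
  lines[needs_fix] <- paste0(lines[needs_fix], ";")
  lines
}

# Toy FASTA written to a temporary file (sequences are placeholders).
fasta_in <- tempfile(fileext = ".fasta")
writeLines(c(
  ">Archaeosporomycetes;Archaeosporales;Archaeosporaceae;Archaeospora;Wirsel OTU21",
  "ACGTACGT",
  ">ClassB;OrderB;FamilyB;GenusB;SpeciesB;",
  "TTGACCA"
), fasta_in)

fixed <- add_trailing_semicolon(readLines(fasta_in))

# Write the corrected database to a new file instead of overwriting the input.
fasta_out <- tempfile(fileext = ".fasta")
writeLines(fixed, fasta_out)

print(fixed[startsWith(fixed, ">")])
```

For a real database, the two `tempfile()` paths would be replaced by the actual input file (for example "maarjam_dada2.fasta") and an output name of your choice. Headers with trailing whitespace would need trimming first, which this sketch does not do.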


abu85 commented 1 year ago

Hi @edersonjesus, nice to hear that you were able to figure that out. Could you please share the code you used to make your own dada2-compatible database? I also intend to make a dada2-compatible Maarjam database to assign taxonomy.

edersonjesus commented 1 year ago

Hi! It has been a while since I did that, but here is the code. See if it works for you; I tried it again quickly, and it worked for me. You will also find a FASTA file with the sequences, a VT types file with the taxonomic information, and a file with the composite names that I created manually based on the VT types file.

I found useful information here: and

Hope that helps!



nomes_compostos.csv vt_types_from_05-06-2019.xls vt_types_fasta_from_05-06-2019.txt

edersonjesus commented 1 year ago

Don't forget to add the final ; to the name of each sequence. Cheers!
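
As a rough illustration of the header format described in this thread (and not the attached script), the sketch below builds dada2-style headers from a taxonomy table in base R. The data frame, its column names, the second row, and the sequences are all placeholders. Only the first taxonomy string comes from the thread. Each header joins the ranks with ";" and ends with a final ";".

```r
# Hypothetical taxonomy table; the second row and both sequences are placeholders.
tax <- data.frame(
  Class   = c("Archaeosporomycetes", "ClassB"),
  Order   = c("Archaeosporales", "OrderB"),
  Family  = c("Archaeosporaceae", "FamilyB"),
  Genus   = c("Archaeospora", "GenusB"),
  Species = c("Wirsel OTU21", "SpeciesB"),
  stringsAsFactors = FALSE
)
seqs <- c("ACGTACGT", "TTGACCA")

ranks <- c("Class", "Order", "Family", "Genus", "Species")

# Join the ranks of each row with ";" and close every header with a final ";".
tax_strings <- do.call(paste, c(unname(as.list(tax[ranks])), sep = ";"))
headers <- paste0(">", tax_strings, ";")

# Interleave headers and sequences: header 1, sequence 1, header 2, sequence 2.
fasta <- as.vector(rbind(headers, seqs))

print(fasta)
```

The resulting character vector can be written out with `writeLines()` and the file path passed to assignTaxonomy with the matching `taxLevels`, as in the original call.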