Closed edersonjesus closed 1 year ago
I found out that I didn't add a semicolon at the end of each taxonomy string, such as in "Archaeosporomycetes;Archaeosporales;Archaeosporaceae;Archaeospora;Wirsel OTU21;".
After adding it to the names of all sequences, the warning message doesn't appear anymore and the hitherto unclassified ASVs are properly classified.
Thanks!
Hi @edersonjesus, nice to hear that you were able to figure that out. Could you please share the code you used to make your own dada2-compatible database? I also intend to make a dada2-compatible Maarjam database to assign taxonomy.
Hi! It has been a while since I did that, but here is the code. See if it works for you; I quickly tried it again and it worked for me. You will also find a fasta file with the sequences, a VT type file with the taxonomic information, and a file with the composite names that I created manually based on the VT type file.
I found useful information here: https://github.com/benjjneb/dada2/issues/581 and https://benjjneb.github.io/dada2/training.html
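For anyone building a similar reference file, here is a minimal sketch of how the header lines could be assembled in base R. The data frame `tax_table`, its column names, and the helper `make_header` are hypothetical stand-ins, not the contents of the attached files. The only thing taken from this thread is the target format, in which every rank is followed by a semicolon, including the last one.

```r
# Hypothetical taxonomy table; a real MaarjAM VT export may use
# different column names and will have many rows.
tax_table <- data.frame(
  Class   = "Archaeosporomycetes",
  Order   = "Archaeosporales",
  Family  = "Archaeosporaceae",
  Genus   = "Archaeospora",
  Species = "Wirsel OTU21",
  stringsAsFactors = FALSE
)

# Join the ranks of each row with ";" and close the string with a final ";".
make_header <- function(tax) {
  ranks <- do.call(paste, c(unname(as.list(tax)), sep = ";"))
  paste0(">", ranks, ";")
}

headers <- make_header(tax_table)
print(headers)
```

Each element of `headers` would then be written on the line directly above its sequence in the fasta file.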
Hope that helps!
Cheers,
Ed
nomes_compostos.csv vt_types_from_05-06-2019.xls vt_types_fasta_from_05-06-2019.txt
Don't forget to add the final ; to the name of each sequence. Cheers!
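If the reference file already exists, the missing semicolons can be added without editing it by hand. The following is a rough base-R sketch, not the code used in this thread: the helper `add_trailing_semicolon` and the output file name in the final comment are made up for illustration, and the toy `fasta` vector stands in for the real file.

```r
# Append ";" to every FASTA header line that does not already end with one.
add_trailing_semicolon <- function(lines) {
  lines <- sub("\\s+$", "", lines)            # drop trailing whitespace first
  needs_fix <- startsWith(lines, ">") & !endsWith(lines, ";")
  lines[needs_fix] <- paste0(lines[needs_fix], ";")
  lines
}

# Toy input: the first header lacks the final ";", the second has it.
fasta <- c(
  ">Archaeosporomycetes;Archaeosporales;Archaeosporaceae;Archaeospora;Wirsel OTU21",
  "ACGTACGT",
  ">ClassA;OrderA;FamilyA;GenusA;SpeciesA;",
  "TTGACCAA"
)

fixed <- add_trailing_semicolon(fasta)
print(fixed[startsWith(fixed, ">")])

# For a real file (the output name is an assumption):
# writeLines(add_trailing_semicolon(readLines("maarjam_dada2.fasta")),
#            "maarjam_dada2_fixed.fasta")
```

Headers that already end in a semicolon and all sequence lines are left unchanged, so running the function twice is harmless.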
Hello everyone! Hello @benjjneb !
I am analyzing Arbuscular Mycorrhizal Fungi (AMF) 18S rRNA sequences. For that, I am using the Maarjam database (https://maarjam.ut.ee). I downloaded it, created my own dada2-compatible database (you can see it in the attached file if you wish), and ran assignTaxonomy. I got lots of NAs in the output object, including for the most abundant ASVs. I ran BLAST on some of these unassigned sequences. Here is one:
AGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAATTTCGGGGTCAGCAGGTTGGTCGTGCCAATGGTATGCACTGGCCTTGCTGATTCCTCCCTCTTGTAGAACCGTAATGCCATTAAGTTGGTGTTGCGGGGAAACAGGACTGTTACTTTGAAAAAATTAGAGTGTTTAAAGCAGGCTAACGCCTGAATACATTAGCATGGAATAATGAAATAGGACGATCGATCCTATTTTGTTGGTTTCTA
I got matches to AMF, including 100% coverage and maximum identity with a sequence in the Maarjam database. In other words, these sequences should have been classified, but they were not. Somehow dada2 is not identifying them; I wonder what is happening and whether there is something I can do to solve this issue.
Here are the commands I am using. Previous commands are similar to the dada2 tutorial:
maarjam <- "maarjam_dada2.fasta"
taxa <- assignTaxonomy(seqtab.nochim, maarjam, tryRC = TRUE, taxLevels = c("Class", "Order", "Family", "Genus", "Species"), multithread = FALSE)
I've got this warning message, which may explain what is going on:
Warning message:
In matrix(unlist(strsplit(genus.unq, ";")), ncol = td, byrow = TRUE) :
  data length [1569] is not a sub-multiple or multiple of the number of rows [314]
I saw a previous issue about this type of message, but I still don't understand what is going on.
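The call quoted in the warning splits each taxonomy string on `;` and fills a matrix with `td` columns, so every string has to contribute the same number of fields. The base-R sketch below reproduces the same kind of warning with made-up taxonomy strings. It does not use dada2 itself, and treating `td` as 5 (the number of `taxLevels` above) is an assumption. Under that assumption, 1569 is not a multiple of 5, which would be consistent with at least one string coming up short. How a missing final semicolon produces a short string inside dada2 is not shown here.

```r
td <- 5  # assumed: one column per entry in taxLevels

# Made-up strings: the second one has only four fields.
tax <- c("C1;O1;F1;G1;S1;", "C2;O2;F2;G2")

# strsplit drops the empty piece after a trailing ";", so a well-formed
# five-rank string yields exactly five fields.
n_fields <- lengths(strsplit(tax, ";", fixed = TRUE))
print(n_fields)

# Capture the warning text instead of letting it scroll past.
msg <- tryCatch(
  {
    matrix(unlist(strsplit(tax, ";", fixed = TRUE)), ncol = td, byrow = TRUE)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# Flag strings whose field count differs from td.
bad <- which(n_fields != td)
print(tax[bad])
```

This check only catches strings with too few or too many ranks. A header that has all five ranks but no final semicolon still splits into five fields, so it would not be flagged here.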
Thanks!
Ed
maarjam_dada2.txt