Closed bweiler89 closed 4 months ago
The first thing I would check, perhaps with a subset of sequences you know are being misassigned, is whether this is a sequence orientation issue.
assignTaxonomy(..., tryRC=TRUE) will also check the reverse-complement orientation of all query sequences. I know that in other cases bacterial sequences that are reverse-complemented (relative to the reference orientation) sometimes get assigned to Eukaryota.
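The subset check suggested above could look roughly like the sketch below. It assumes `stnochimera` and `ref_fasta` already exist as in the original post. The choice of the first 1000 sequences is only a placeholder; ideally you would substitute ASVs you already know are misassigned.

```r
library(dada2)

# Assumed to exist already, as in the original post:
#   stnochimera : the chimera-free sequence table
#   ref_fasta   : path to the custom reference FASTA
seqs <- getSequences(stnochimera)

# Placeholder subset (an assumption): replace with known misassigned ASVs
subset_seqs <- head(seqs, 1000)

tax_levels <- c("Kingdom", "Phylum", "Sub-Phylum", "Class",
                "Order", "Family", "Genus", "Species")

# Default behaviour: queries are classified in the given orientation only
taxa_fwd <- assignTaxonomy(subset_seqs, refFasta = ref_fasta,
                           taxLevels = tax_levels, multithread = TRUE)

# tryRC = TRUE: the reverse-complement orientation is also checked
taxa_rc <- assignTaxonomy(subset_seqs, refFasta = ref_fasta,
                          taxLevels = tax_levels, tryRC = TRUE,
                          multithread = TRUE)

# Compare Kingdom-level results between the two runs
sum(is.na(taxa_fwd[, "Kingdom"]))
sum(is.na(taxa_rc[, "Kingdom"]))
table(fwd = taxa_fwd[, "Kingdom"], rc = taxa_rc[, "Kingdom"], useNA = "ifany")
```

If the number of NAs drops, or the Kingdom calls change noticeably with tryRC=TRUE, orientation is a likely contributor, and the full run can be repeated with that option.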
I am trying to assign taxonomy using a custom reference database with 8 taxonomic levels, but the assignments are frequently incorrect. I am running this on a fairly large dataset on our supercomputer, where assignment takes roughly 2-3 days.
The code:
library(dada2)
ref_fasta <- "/customDBs/customreferenceDB.fa"
taxa <- assignTaxonomy(stnochimera, taxLevels = c("Kingdom", "Phylum", "Sub-Phylum", "Class", "Order", "Family", "Genus", "Species"), refFasta = ref_fasta, multithread = TRUE)
However, the exported ASVs.fa and ASVs_taxonomy.tsv show very poor taxonomic assignment (255k NAs out of ~700k ASVs). In some cases eukaryotes are being assigned prokaryotic taxonomy, and some ASVs assigned NA are easily matched by BLAST to Endozoicomonas sp.
Here's the format of the reference database (where all sequences are one line after header, headers include a subphylum for eukaryotic formatting):
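Since the database uses 8 levels, one quick sanity check is to count the semicolon-separated levels in every header. The sketch below uses base R only. `count_header_levels` is a hypothetical helper, and the example lines are made-up placeholders, not entries from the actual database. It also assumes the usual dada2 reference layout of `>Level1;Level2;...;` headers.

```r
# Hypothetical helper: count taxonomic levels in each FASTA header line.
count_header_levels <- function(fasta_lines) {
  headers <- fasta_lines[startsWith(fasta_lines, ">")]
  # Drop the leading ">" and any trailing ";" before splitting on ";"
  tax_strings <- sub(";$", "", sub("^>", "", headers))
  lengths(strsplit(tax_strings, ";", fixed = TRUE))
}

# Made-up placeholder lines; in practice use readLines(ref_fasta)
example_lines <- c(
  ">K1;P1;SP1;C1;O1;F1;G1;S1;",
  "ACGTACGTACGT",
  ">K2;P2;C2;O2;",
  "TTGACCATTGAC"
)

level_counts <- count_header_levels(example_lines)

# Distribution of level counts across all headers
table(level_counts)

# Positions (among headers) whose level count differs from the expected 8
which(level_counts != 8)
```

Headers whose level count differs from the length of taxLevels are worth inspecting by hand, for example prokaryotic entries that lack a subphylum placeholder.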
I am looking for any suggestions as to why I cannot get solid assignments, especially for the eukaryotic sequences that are being assigned to bacteria.