Closed KitHubb closed 4 months ago
First of all, thank you for your tremendous effort in creating the EUKARYOME database. I am sure it required a lot of hard work and dedication.
I am a DADA2 user and run it directly in R. A Google search turns up a link to a DADA2-formatted EUKARYOME database, but the link is inactive.
When I use the “General FASTA” file for taxonomy annotation instead, most of the assignments come back as NA. Here are the steps I followed and the issues I encountered:
Steps Taken:
- Modified FASTA file for DADA2 using the following commands:
    # Remove the FASTA ID (everything from '>' up to and including the first ';')
    sed -e 's/^>[^;]*;/>/' General_EUK_ITS_v1.8.fasta > General_EUK_ITS_v1.8_modi4.fasta
    # Append ';' to the end of each FASTA header line
    sed -e '/^>/ s/$/;/' General_EUK_ITS_v1.8_modi4.fasta > General_EUK_ITS_v1.8_modi5.fasta
- Assigned taxonomy using the modified FASTA file:
    library(magrittr)  # provides %>%

    # UNITE 8.2
    taxa.82.QC30.its2 <- dada2::assignTaxonomy(ASVs, unite.ref.8.2, multithread = TRUE, tryRC = TRUE)
    # Check output
    taxa.82.QC30.its2[,1] %>% table              # k__Fungi: 1607
    taxa.82.QC30.its2[,7] %>% is.na() %>% table  # FALSE: 417, TRUE: 1190

    # EUK 1.8
    EUK <- "/data/Reference/ITS/DADA2/EUKAYOME/ver1.8/General_EUK_ITS_v1.8_modi5.fasta"
    set.seed(42)
    taxa.EUK.QC30.its2 <- dada2::assignTaxonomy(ASVs, EUK, multithread = TRUE, tryRC = TRUE)
    # Check output
    taxa.EUK.QC30.its2[,1] %>% table              # k__Fungi: 223, k__Mitochondrion: 3, k__Viridiplantae: 18
    taxa.EUK.QC30.its2[,1] %>% is.na() %>% table  # FALSE: 407, TRUE: 1200
    taxa.EUK.QC30.its2[,7] %>% is.na() %>% table  # FALSE: 38, TRUE: 1569
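The header rewrite from the first step can be tried on a single record before running it on the whole database. The sketch below is only an illustration: the header `>EUK0000001;k__Fungi;...` and the file names `sample.fasta` and `sample_dada2.fasta` are made up for this example, and it assumes the accession is the first semicolon-delimited field of each header. DADA2's `assignTaxonomy` expects reference headers of the form `>Level1;Level2;...;`, which is what the two sed steps aim for.

```shell
#!/bin/sh
# Hypothetical two-line FASTA record; the real EUKARYOME header layout may differ.
printf '%s\n' \
  '>EUK0000001;k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Fusarium;s__oxysporum' \
  'ACGTACGTACGT' > sample.fasta

# Step 1: drop everything from '>' up to and including the first ';' (the accession).
# Step 2: append ';' to every header line; sequence lines are left untouched.
sed -e 's/^>[^;]*;/>/' sample.fasta | sed -e '/^>/ s/$/;/' > sample_dada2.fasta

cat sample_dada2.fasta
```

If the header of the output starts directly with the kingdom and ends with `;`, the rewrite did what was intended for this record.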
Issues Encountered:
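- Few sequences were assigned with EUKARYOME: only 407 of 1,607 ASVs received a Kingdom-level assignment and only 38 a Species-level one, whereas UNITE 8.2 assigned all 1,607 at the Kingdom level.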
- To investigate further, I extracted the top 1 million lines from the database and performed the taxonomy assignment again:
    EUK_1m <- "/data/Reference/ITS/DADA2/EUKAYOME/ver1.8/General_EUK_ITS_v1.8_modi5_1m.fasta"
    set.seed(42)
    taxa.EUK_1m.QC30.its2 <- dada2::assignTaxonomy(ASVs, EUK_1m, multithread = TRUE, tryRC = TRUE)

    taxa.EUK_1m.QC30.its2[,7] %>% is.na() %>% table
    # FALSE  TRUE
    #   441  1166

    taxa.EUK_1m.QC30.its2[,1] %>% table
    # k__cf.Choanoflagellozoa k__cf.Corallochytriozoa            k__Cryptista                k__Fungi
    #                       1                       1                      13                     585
    #              k__Metazoa         k__Rhodoplantae         k__Straminipila        k__Viridiplantae
    #                       1                       4                      10                     214
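For reference, a subset like the 1-million-line file can be cut with `head`. This is a minimal sketch, not the exact command used above: it assumes a two-line FASTA layout (one header line followed by one sequence line per record), and `demo_db.fasta` and `demo_subset.fasta` are hypothetical stand-ins for the real files. With wrapped sequences, a fixed line count could cut a record in half, so the sketch also checks how the subset ends.

```shell
#!/bin/sh
# Demo input standing in for the real database (hypothetical records).
printf '>k__A;\nACGT\n>k__B;\nGGCC\n>k__C;\nTTAA\n' > demo_db.fasta

n_lines=4   # the issue used 1000000; keep this even for two-line FASTA

head -n "$n_lines" demo_db.fasta > demo_subset.fasta

# Sanity checks: count the headers and make sure the file does not end on one.
n_headers=$(grep -c '^>' demo_subset.fasta)
last_line=$(tail -n 1 demo_subset.fasta)
echo "records in subset: $n_headers"
case "$last_line" in
  '>'*) echo "warning: subset ends with a header and no sequence" ;;
  *)    echo "subset ends with a sequence line" ;;
esac
```

Note that the first N lines are not a random sample, so the subset reflects whichever taxa happen to come first in the file.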
Despite these efforts, the taxonomy assignment results at both the Kingdom and Species levels are not satisfactory. Many sequences remain unassigned, even at higher taxonomic levels.
Request for Assistance:
Could you please provide guidance on the following:
- Are there specific modifications required for the EUKARYOME database FASTA file to work correctly with DADA2?
- Are there any additional steps or tips to improve the taxonomy assignment results with this database?
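One thing that may be worth ruling out on the modified file is uneven rank depth across headers. The sketch below tallies how many semicolon-separated ranks each header carries. It is only a diagnostic under stated assumptions: `demo_ranks.fasta` and its headers are hypothetical, and the tally by itself does not explain why assignments come back as NA.

```shell
#!/bin/sh
# Hypothetical headers with uneven rank depth, to show what the tally looks like.
printf '>k__Fungi;p__Ascomycota;c__X;o__Y;f__Z;g__G;s__S;\nACGT\n>k__Viridiplantae;p__P;\nGGCC\n' > demo_ranks.fasta

# Count ranks per header: strip '>' and the trailing ';', then split on ';'.
grep '^>' demo_ranks.fasta \
  | sed -e 's/^>//' -e 's/;$//' \
  | awk -F';' '{ n[NF]++ } END { for (k in n) print k " ranks: " n[k] " headers" }' \
  | sort -n > rank_tally.txt

cat rank_tally.txt
```

Running the same pipeline on `General_EUK_ITS_v1.8_modi5.fasta` would show whether all headers share one rank depth or whether several depths are mixed.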
Thank you very much for your support and for providing such valuable resources.
Best regards,
So-Yeon Kim
Hello, I posted an issue in the FunFun repository, but I realized that I made a mistake with the content. I would like to delete the issue, but I do not have the necessary permissions to do so.
The issue in question is #4. I apologize for any inconvenience this may cause. Could you please delete the issue for me?
Thank you very much for your assistance.
Best regards, So-Yeon Kim