DanilKrivonos / FunFun

ITS based fungi annotator
MIT License
Inadequate Taxonomy Assignment Results with EUKARYOME Database in DADA2 #4

Closed KitHubb closed 4 months ago

KitHubb commented 4 months ago

Hello, I posted an issue in the FunFun repository, but I realized that I made a mistake with the content. I would like to delete the issue, but I do not have the necessary permissions to do so.

The issue in question is #4. I apologize for any inconvenience this may cause. Could you please delete the issue for me?

Thank you very much for your assistance.

Best regards, So-Yeon Kim

KitHubb commented 4 months ago

First of all, thank you for your tremendous effort in creating the EUKARYOME database. I am sure it required a lot of hard work and dedication.

I am a DADA2 user and run it directly in R. There is a link to the EUKARYOME database for DADA2 on Google, but it is inactive.

I receive all NA values when I use the “General FASTA” file for taxonomy annotation. Here are the steps I followed and the issues I encountered:

Steps Taken:

  1. Modified FASTA file for DADA2 using the following commands:
sed -e 's/^>[^;]*;/>/' General_EUK_ITS_v1.8.fasta > General_EUK_ITS_v1.8_modi4.fasta # Remove the leading sequence ID from each header
sed -e '/^>/ s/$/;/' General_EUK_ITS_v1.8_modi4.fasta > General_EUK_ITS_v1.8_modi5.fasta # Append ';' to the end of each header
  2. Assigned taxonomy with UNITE 8.2 as a baseline, then with the modified FASTA file:
  library(magrittr)  # provides the %>% pipe used below

  # UNITE 8.2
  taxa.82.QC30.its2 <- dada2::assignTaxonomy(ASVs, unite.ref.8.2, multithread = TRUE, tryRC = TRUE)

  # Check output
  taxa.82.QC30.its2[,1] %>% table
  # k__Fungi: 1607

  taxa.82.QC30.its2[,7] %>% is.na() %>% table
  # FALSE: 417, TRUE: 1190

  # EUK 1.8
  EUK <- "/data/Reference/ITS/DADA2/EUKAYOME/ver1.8/General_EUK_ITS_v1.8_modi5.fasta"
  set.seed(42)
  taxa.EUK.QC30.its2 <- dada2::assignTaxonomy(ASVs, EUK, multithread = TRUE, tryRC = TRUE)

  # Check output
  taxa.EUK.QC30.its2[,1] %>% table
  # k__Fungi: 223, k__Mitochondrion: 3, k__Viridiplantae: 18

  taxa.EUK.QC30.its2[,1] %>% is.na() %>% table
  # FALSE: 407, TRUE: 1200

  taxa.EUK.QC30.its2[,7] %>% is.na() %>% table
  # FALSE: 38, TRUE: 1569
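
The effect of the two sed substitutions in step 1 can be checked on a single record before running them on the whole database. The sketch below assumes a header of the form `>ID;k__...;...;s__...`; the accession and lineage are made up for illustration and are not taken from the EUKARYOME file. It applies the same two substitutions in one sed call and counts the ranks per header, since the DADA2 documentation describes reference headers as `>Level1;Level2;...;` with a trailing semicolon.

```shell
# Hypothetical two-line record; the accession and lineage are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' \
  '>EUK0000001;k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Fusarium;s__oxysporum' \
  'ACGTACGTACGT' > "$tmp/sample.fasta"

# Same two substitutions as in step 1, chained in a single sed call:
# drop everything up to the first ';' in the header, then append ';' to the header.
sed -e 's/^>[^;]*;/>/' -e '/^>/ s/$/;/' "$tmp/sample.fasta" > "$tmp/sample_dada2.fasta"

# Show the rewritten header.
header=$(grep '^>' "$tmp/sample_dada2.fasta")
echo "$header"

# Count ranks per header. Because of the trailing ';', awk sees one empty
# final field, so the number of ranks is NF - 1. sort -u lists each distinct count once.
ranks=$(grep '^>' "$tmp/sample_dada2.fasta" | awk -F';' '{ print NF - 1 }' | sort -u)
echo "$ranks"
```

Running the rank count on the full reformatted file would show whether every header carries the same number of ranks; more than one distinct value in the output would indicate headers that the substitutions did not normalize.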

Issues Encountered:

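  • With EUKARYOME, few ASVs were assigned at the Kingdom level or below: 407 of 1,607 received a Kingdom and only 38 a Species, whereas UNITE 8.2 assigned a Kingdom to all 1,607 and a Species to 417.
  • To investigate further, I extracted the first 1 million lines of the database and repeated the taxonomy assignment: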
EUK_1m <- "/data/Reference/ITS/DADA2/EUKAYOME/ver1.8/General_EUK_ITS_v1.8_modi5_1m.fasta"

set.seed(42)
taxa.EUK_1m.QC30.its2 <- dada2::assignTaxonomy(ASVs, EUK_1m, multithread = TRUE, tryRC = TRUE)
taxa.EUK_1m.QC30.its2[,7] %>% is.na() %>% table
# FALSE  TRUE 
#   441  1166 
taxa.EUK_1m.QC30.its2[,1] %>% table
# k__cf.Choanoflagellozoa k__cf.Corallochytriozoa            k__Cryptista                k__Fungi 
#                       1                       1                      13                     585 
#              k__Metazoa         k__Rhodoplantae         k__Straminipila        k__Viridiplantae 
#                       1                       4                      10                     214 
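
The command used to take the first 1 million lines is not shown above; a minimal sketch of one way to do it, assuming `head` and a database with exactly one sequence line per record, is given below. The small three-record file is a hypothetical stand-in for `General_EUK_ITS_v1.8_modi5.fasta`, and the line count is reduced from 1000000 to 4 so the demo stays small.

```shell
# Hypothetical stand-in for the reformatted database: three two-line records.
tmp=$(mktemp -d)
printf '%s\n' '>k__A;' 'ACGT' '>k__B;' 'GGCC' '>k__C;' 'TTAA' > "$tmp/db.fasta"

# Keep the first N lines (N = 1000000 in the issue; 4 here for the demo).
n_lines=4
head -n "$n_lines" "$tmp/db.fasta" > "$tmp/db_subset.fasta"

# Sanity check: with two-line records, the subset should contain as many
# header lines as sequence lines; a mismatch means a record was cut off.
n_headers=$(grep -c '^>' "$tmp/db_subset.fasta")
n_seqs=$(grep -vc '^>' "$tmp/db_subset.fasta")
echo "headers: $n_headers, sequence lines: $n_seqs"
```

If the sequences in the real file wrap across several lines, a line-based cut can truncate the last record, so the header and sequence-line counts would need a different check.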

Despite these efforts, the taxonomy assignment results at both the Kingdom and Species levels are not satisfactory. Many sequences remain unassigned, even at higher taxonomic levels.

Request for Assistance:

Could you please provide guidance on the following:

  1. Are there specific modifications required for the EUKARYOME database FASTA file to work properly with DADA2?
  2. Any additional steps or tips to improve the taxonomy assignment results with this database?

Thank you very much for your support and for providing such valuable resources.

Best regards,

So-Yeon Kim