benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Assign Taxonomy using DECIPHER #658

Closed Sewunet-Abera closed 5 years ago

Sewunet-Abera commented 5 years ago

Hello Ben, I'm a beginner in bioinformatics analysis. I recently received some data and worked through your tutorial without problems. However, after running the following commands to assign taxonomy using DECIPHER:

    taxid <- t(sapply(ids, function(x) {
      m <- match(ranks, x$rank)
      taxa <- x$taxon[m]
      taxa[startsWith(taxa, "unclassified_")] <- NA
      taxa
    }))
    colnames(taxid) <- ranks; rownames(taxid) <- getSequences(seqtab.nochim)
    taxa.print <- taxa

I get this message at the end:

    Error: object 'taxa' not found

Could you please guide me through resolving it?

benjjneb commented 5 years ago

Your code has defined taxid, but not taxa. In the tutorial:

The taxid matrix from IdTaxa is a drop-in replacement for the taxa matrix from assignTaxonomy, simply set taxa <- taxid to carry on using the IdTaxa assignments.

So, if you are using the IdTaxa approach, replace taxa with taxid everywhere. For example, your last line needs to change to taxa.print <- taxid.

Alternatively you can just assign taxa <- taxid and then proceed as well.
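A minimal sketch of the second option, written so that it runs without any packages. The `ids` list and the `asv_seqs` vector are made-up stand-ins (assumptions for illustration only) for the real `IdTaxa` output and for `getSequences(seqtab.nochim)`:

```r
# Mock stand-in for IdTaxa() output: one element per ASV, each with
# parallel `rank` and `taxon` vectors (hypothetical data for illustration).
ranks <- c("domain", "phylum", "class", "order", "family", "genus", "species")
ids <- list(
  list(rank  = c("rootrank", "domain", "phylum", "class"),
       taxon = c("Root", "Bacteria", "Firmicutes", "unclassified_Firmicutes")),
  list(rank  = c("rootrank", "domain", "phylum"),
       taxon = c("Root", "Bacteria", "Proteobacteria"))
)
# Stands in for getSequences(seqtab.nochim) in the real pipeline.
asv_seqs <- c("ACGTACGT", "TTGACCAA")

# Same conversion as in the question: one row per ASV, one column per rank.
taxid <- t(sapply(ids, function(x) {
  m <- match(ranks, x$rank)
  taxa <- x$taxon[m]  # this `taxa` is local to the function
  taxa[startsWith(taxa, "unclassified_")] <- NA
  taxa
}))
colnames(taxid) <- ranks
rownames(taxid) <- asv_seqs

# No global `taxa` exists yet, so define it before using it downstream.
taxa <- taxid
taxa.print <- taxa
rownames(taxa.print) <- NULL  # drop the sequence row names for display only
head(taxa.print)
```

The `taxa` variable inside the anonymous function is local to that function, which is why the original `taxa.print <- taxa` line could not find it.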

Sewunet-Abera commented 5 years ago

Thanks Ben, It worked perfectly.

Sewunet-Abera commented 5 years ago

Dear Ben, I'm stuck again on two points.

  1. While building the phylogenetic tree with phangorn, I got the error message "can't assign vector of 6GB size file" after a week of running. Is there an alternative way to handle a vector of that size and still get my tree?

  2. When creating a phyloseq object, I always end up with the following error:

    ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE),
                   sample_data(sample.df),
                   tax_table(asv_taxa))
    Error in validObject(.Object) : invalid class “phyloseq” object: Component taxa/OTU names do not match.
      Taxa indices are critical to analysis.
      Try taxa_names()
    In addition: Warning message:
    In .local(object) : Coercing from data.frame class to character matrix prior to building taxonomyTable.
      This could introduce artifacts. Check your taxonomyTable, or coerce to matrix manually.

And when I change the last argument to taxa_names:

    ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE),
                   sample_data(sample.df),
                   taxa_names(asv_taxa))
    ps
    phyloseq-class experiment-level object
    otu_table()   OTU Table:         [ 40009 taxa and 215 samples ]
    sample_data() Sample Data:       [ 215 samples by 2 sample variables ]

My problem is that I have 233 samples but only 215 appear. How can I resolve this?

Although I'm learning, I sometimes find myself lost in issues like this. Thanks a lot; I hope you can guide me through.

benjjneb commented 5 years ago

For phyloseq, you'll want to check out some of the phyloseq documentation. For example on importing data: https://joey711.github.io/phyloseq/import-data.html

The error is telling you that you have different names for the "taxa" (ASVs in this case) in your sequence table and your taxonomy table. Did you create both with the dada2 package? Did you change those names at some point?

Check to see what they look like in each case with:

head(colnames(seqtab.nochim))
head(rownames(taxa)) # or taxid

The dada2 pipeline creates R objects in the right format to use in phyloseq. Perhaps you changed the format when you created the asv_taxa object.
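Beyond looking at the first few names, the comparison can be made exhaustive with `setdiff`. The sketch below uses small mock objects (all data here is hypothetical) and base R only. The same check on sample names might also help explain a drop in sample count, since phyloseq keeps only the names shared by all components. The repair at the end assumes the taxonomy rows are still in the same order as the columns of the sequence table, which needs to be verified on real data:

```r
# Mock stand-ins for the real objects (hypothetical data for illustration):
# samples are rows and ASV sequences are columns, as in seqtab.nochim.
seqtab.nochim <- matrix(c(10L, 0L, 3L, 7L), nrow = 2,
                        dimnames = list(c("S1", "S2"), c("ACGT", "TTGA")))
# A taxonomy table stored as a data.frame whose row names were replaced
# by short labels, one way the names can stop matching.
asv_taxa <- data.frame(Kingdom = c("Bacteria", "Bacteria"),
                       Phylum  = c("Firmicutes", "Proteobacteria"),
                       row.names = c("ASV_1", "ASV_2"),
                       stringsAsFactors = FALSE)
# Sample metadata that covers only one of the two samples.
sample.df <- data.frame(group = "a", row.names = "S1")

# ASVs in the sequence table with no matching row in the taxonomy table.
missing_taxa <- setdiff(colnames(seqtab.nochim), rownames(asv_taxa))
# Samples in the sequence table with no row in the sample data.
missing_samples <- setdiff(rownames(seqtab.nochim), rownames(sample.df))
print(missing_taxa)
print(missing_samples)

# One possible repair, ASSUMING row order still matches column order:
# coerce to a character matrix and restore the sequence names.
tax_mat <- as.matrix(asv_taxa)
rownames(tax_mat) <- colnames(seqtab.nochim)
```

Coercing to a matrix manually also avoids the "Coercing from data.frame class" warning quoted earlier in the thread.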

On the tree, phangorn doesn't scale to large numbers of sequences. We recommend you try using RAxML instead. See some previous help on how to do this in #88 (https://github.com/benjjneb/dada2/issues/88), especially the comments by @giriarteS.

Sewunet-Abera commented 5 years ago

Thanks Ben. I will look into those points.

WAEL1990 commented 4 years ago

Hello, I followed the DADA2 tutorial to carry out the taxonomy assignment (6400 sequences), but I received this error:

ids <- IdTaxa(dna, trainingSet, strand="both", verbose=T) # use all processors

Error in getMethod(f, c("XRawList", "XRawList")) : no method found for function 'match' and signature XRawList, XRawList

Are there any ideas on how to solve it, please?

benjjneb commented 4 years ago

@WAEL1990 What is your sessionInfo() when you encounter this error?

You could try loading the XVector library explicitly and see if that helps (i.e. library(XVector)), as that is the library implementing the method that's not being found.
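One way to gather more information, sketched under the assumption (not confirmed in this thread) that another attached package is masking `match`, is to list every place on the search path that defines a function with that name. `where_defined` is a hypothetical helper name; it only wraps `utils::find` from base R:

```r
# Hypothetical helper: list the search-path entries that define a function
# called `fname`, in lookup order (the first entry is the one R will use).
where_defined <- function(fname) {
  utils::find(fname, mode = "function")
}

# In a clean session this reports only "package:base"; after loading many
# packages, extra entries show which packages provide their own `match`.
print(where_defined("match"))

# The environment of the `match` that is currently found first.
print(environmentName(environment(match)))
```

Running this both in a fresh session and in the session where `IdTaxa` fails, then comparing the two outputs, would show whether load order changes which `match` is found.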

juanboja commented 4 years ago

Hi, I am having the same issue. I have loaded many packages because I don't know what is missing.

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] withr_2.2.0                 spatial_7.3-11              rex_1.2.0                  
 [4] MASS_7.3-51.5               KernSmooth_2.23-16          boot_1.3-24                
 [7] git2r_0.27.1                xml2_1.3.1                  tidyselect_1.0.0           
[10] sf_0.9-2                    scales_1.1.0                rlang_0.4.5                
[13] rematch2_2.1.1              RcppParallel_5.0.0          ps_1.3.2                   
[16] polspline_1.1.17            pkgbuild_1.0.6              pillar_1.4.3               
[19] nnet_7.3-13                 stringi_1.4.6               zoo_1.8-7                  
[22] units_0.6-6                 vcfR_1.10.0                 vctrs_0.2.4                
[25] tibble_3.0.1                statmod_1.4.34              seqinr_3.6-1               
[28] reshape2_1.4.4              RCurl_1.98-1.1              plotly_4.9.2.1             
[31] phangorn_2.5.5              microbiome_2.1.1            markdown_1.1               
[34] knitr_1.28                  installr_0.22.0             stringr_1.4.0              
[37] haven_2.2.0                 GenomeInfoDbData_1.2.2      generics_0.0.2             
[40] fs_1.4.1                    DBI_1.1.0                   curl_4.3                   
[43] class_7.3-17                ape_5.3                     base64enc_0.1-3            
[46] annotate_1.64.0             XML_3.99-0.3                genefilter_1.68.0          
[49] bit64_0.9-7                 bit_1.1-15.2                devtools_2.3.0             
[52] usethis_1.6.0               BiocManager_1.30.10         AnnotationDbi_1.48.0       
[55] zlibbioc_1.32.0             BiocVersion_3.10.1          acepack_1.4.1              
[58] data.table_1.12.8           adegenet_2.1.2              ade4_1.7-15                
[61] vegan_2.5-6                 lattice_0.20-41             permute_0.9-5              
[64] reshape_0.8.8               viridis_0.5.1               viridisLite_0.3.0          
[67] tidyr_1.0.2                 dendextend_1.13.4           decontam_1.6.0             
[70] DECIPHER_2.14.0             RSQLite_2.2.0               ShortRead_1.44.3           
[73] GenomicAlignments_1.22.1    Rsamtools_2.2.3             ggplot2_3.3.0              
[76] dada2_1.14.1                Rcpp_1.0.4.6                phyloseq_1.30.0            
[79] Biostrings_2.54.0           XVector_0.26.0              DESeq2_1.26.0              
[82] SummarizedExperiment_1.16.1 DelayedArray_0.12.3         BiocParallel_1.20.1        
[85] matrixStats_0.56.0          Biobase_2.46.0              GenomicRanges_1.38.0       
[88] GenomeInfoDb_1.22.1         IRanges_2.20.2              S4Vectors_0.24.4           
[91] BiocGenerics_0.32.0        

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.5.1   grid_3.6.3          Rtsne_0.15          munsell_0.5.0      
 [5] codetools_0.2-16    colorspace_1.4-1    rstudioapi_0.11     labeling_0.3       
 [9] hwriter_1.3.2       farver_2.0.3        rhdf5_2.30.1        rprojroot_1.3-2    
[13] coda_0.19-3         LearnBayes_2.15.1   xfun_0.13           R6_2.4.1           
[17] locfit_1.5-9.4      bitops_1.0-6        assertthat_0.2.1    promises_1.1.0     
[21] pinfsc50_1.1.0      gtable_0.3.0        processx_3.4.2      splines_3.6.3      
[25] lazyeval_0.2.2      checkmate_2.0.0     backports_1.1.6     httpuv_1.5.2       
[29] Hmisc_4.4-0         tools_3.6.3         spData_0.3.5        ellipsis_0.3.0     
[33] raster_3.1-5        biomformat_1.14.0   RColorBrewer_1.1-2  sessioninfo_1.1.1  
[37] plyr_1.8.6          classInt_0.4-3      purrr_0.3.4         prettyunits_1.1.1  
[41] rpart_4.1-15        deldir_0.1-25       cluster_2.1.0       magrittr_1.5       
[45] gmodels_2.18.1      pkgload_1.0.2       hms_0.5.3           mime_0.9           
[49] xtable_1.8-4        jpeg_0.1-8.1        gridExtra_2.3       testthat_2.3.2     
[53] compiler_3.6.3      crayon_1.3.4        htmltools_0.4.0     mgcv_1.8-31        
[57] later_1.0.0         spdep_1.1-3         Formula_1.2-3       geneplotter_1.64.0 
[61] expm_0.999-4        Matrix_1.2-18       cli_2.0.2           quadprog_1.5-8     
[65] gdata_2.18.0        igraph_1.2.5        forcats_0.5.0       pkgconfig_2.0.3    
[69] foreign_0.8-76      sp_1.4-1            foreach_1.5.0       multtest_2.42.0    
[73] callr_3.4.3         digest_0.6.25       fastmatch_1.1-0     htmlTable_1.13.3   
[77] shiny_1.4.0.2       gtools_3.8.2        lifecycle_0.2.0     nlme_3.1-147       
[81] jsonlite_1.6.1      Rhdf5lib_1.8.0      desc_1.2.0          fansi_0.4.1        
[85] httr_1.4.1          fastmap_1.0.1       survival_3.1-12     glue_1.4.0         
[89] remotes_2.1.1       png_0.1-7           iterators_1.0.12    blob_1.2.1         
[93] latticeExtra_0.6-29 memoise_1.1.0       dplyr_0.8.5         e1071_1.7-3  

I used this code 10 days ago and it worked, but when I tried it again, it didn't.

juanboja commented 4 years ago

I don't know what was happening but, as always, a good reboot solved the problem. It is working again. Thanks.

Sebastian-Mynott commented 4 years ago

I'm having the same issue as @WAEL1990 and @juanboja, but rebooting has not solved it. I originally posted this in issue #950: https://github.com/benjjneb/dada2/issues/950#issuecomment-682436898

I've noticed that I can get the IdTaxa function to run if I restart the R session, load the DECIPHER library and run it before loading other libraries. However, even this fails if the last bits of the restart (see below) manage to run before I can run the function.

Registered S3 method overwritten by 'spdep':
  method   from
  plot.mst ape 
Registered S3 method overwritten by 'pegas':
  method      from
  print.amova ade4

I've also tried loading the XVector library but to no effect.

Any assistance would be very much appreciated.

benjjneb commented 4 years ago

@digitalwright Any thoughts on the errors being seen in this thread when using IdTaxa?

Sebastian-Mynott commented 4 years ago

To be fair, this looks increasingly like a Bioconductor / DECIPHER issue. I posted the issue on the Bioconductor community page after noticing that it also occurs when running the examples given in the R documentation for the IdTaxa function.

FYI: I've tested this issue using R v4.0.2 and DECIPHER v2.16.1 as well as with R v3.6.3 and DECIPHER v2.12.0. I can provide sessionInfo for both, if that helps.

digitalwright commented 4 years ago

I have never experienced this issue, nor heard it reported outside this thread. So my guess is that there is a version mismatch between R, Biostrings, and DECIPHER. Note that DECIPHER has to run all the examples on the Bioconductor build servers to stay active. This suggests it is something about the installation on a specific machine. Sorry I cannot be of more help.
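A quick way to look for such a mismatch is to print the installed versions of the relevant packages side by side. The sketch below is only an illustration: `report_versions` is a hypothetical helper built from base R functions, and the package list is an assumption about which packages matter here:

```r
# Hypothetical helper: report the installed version of each package,
# or NA if the package is not installed. Nothing is loaded or attached.
report_versions <- function(pkgs) {
  versions <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(package = pkgs, version = unname(versions),
             stringsAsFactors = FALSE)
}

# The R version plus the packages named in this thread.
print(R.version.string)
print(report_versions(c("BiocGenerics", "S4Vectors", "IRanges", "XVector",
                        "Biostrings", "DECIPHER", "dada2")))

# If BiocManager is installed, BiocManager::valid() performs a fuller check
# of installed packages against the Bioconductor release in use.
```

Packages from different Bioconductor releases appearing in the same table would be consistent with the version-mismatch explanation, though it would not prove it.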

Sebastian-Mynott commented 4 years ago

That makes sense, but it used to run fine for me in the past. In fact, the workaround I cited this morning (restarting R to get it to run) has now stopped working, so it would seem I'm quite stuck.

Obviously this is a central part of my workflow. Is there anything you might suggest to try to resolve this issue? How could this stop working, with the error message referencing what would appear to be such a core function?