adw96 / breakaway

Species richness with high diversity
68 stars 18 forks source link

Error and warning message when using breakaway() and betta() #149

Closed ivanllampy closed 2 years ago

ivanllampy commented 2 years ago

Hi all,

I am working on comparing alpha diversity of several gut type samples, recently discover the breakaway package! I gone through several vignettes and tutorial (https://github.com/adw96/stamps2018/blob/master/estimation/diversity-lab.R), and I am on my way to try it with my dataset.

I encounter several difficulties when using breakaway. My R background is too weak to solve them, may I seek some advice here?

  1. When running breakaway() on a phyloseq object, warning message below shown: Warning messages: 1: In poisson_model(input_data, cutoff = cutoff) : Cut-off was too low: no data available for estimation

  2. When using the function chao_bunge(), an error occurs: Error in plot.alpha_estimates(., gut, color = "type") : There are no estimates in this alpha_estimates object!

  3. Finally, when using the function betta(), I encounter the following error: Error in solve.default(R) : system is computationally singular: reciprocal condition number = 1.48178e-173

It looks like there are multiple problems of my dataset, rather than error from the packages. Are there any ways to solve it? Any advice are highly appreciated!!!

Here attached the output of R as suggested in the instruction post:

library(breakaway) library(phyloseq) library(magrittr) library(tidyverse) R.version _
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 1.2
year 2021
month 11
day 01
svn rev 81115
language R
version.string R version 4.1.2 (2021-11-01) nickname Bird Hippie
sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale: [1] LC_COLLATE=English_Hong Kong SAR.1252 LC_CTYPE=English_Hong Kong SAR.1252
[3] LC_MONETARY=English_Hong Kong SAR.1252 LC_NUMERIC=C
[5] LC_TIME=English_Hong Kong SAR.1252
system code page: 950

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[6] tidyverse_1.3.1 foreach_1.5.2 DivNet_0.4.0 dplyr_1.0.8 breakaway_4.7.6
[11] tibble_3.1.6 magrittr_2.0.2 ggplot2_3.3.5.9000 phyloseq_1.38.0

loaded via a namespace (and not attached): [1] minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.2
[5] XVector_0.34.0 fs_1.5.2 rstudioapi_0.13 farver_2.1.0
[9] remotes_2.4.2 fansi_1.0.2 lubridate_1.8.0 xml2_1.3.3
[13] codetools_0.2-18 splines_4.1.2 doParallel_1.0.17 cachem_1.0.6
[17] pkgload_1.2.4 ade4_1.7-18 jsonlite_1.8.0 nloptr_2.0.0
[21] broom_0.7.12 cluster_2.1.2 dbplyr_2.1.1 compiler_4.1.2
[25] httr_1.4.2 backports_1.4.1 assertthat_0.2.1 Matrix_1.3-4
[29] fastmap_1.1.0 cli_3.1.1 prettyunits_1.1.1 tools_4.1.2
[33] igraph_1.2.11 gtable_0.3.0 glue_1.6.1 GenomeInfoDbData_1.2.7 [37] reshape2_1.4.4 Rcpp_1.0.8 Biobase_2.54.0 cellranger_1.1.0
[41] vctrs_0.3.8 Biostrings_2.62.0 rhdf5filters_1.6.0 multtest_2.50.0
[45] ape_5.6-2 nlme_3.1-153 iterators_1.0.14 ps_1.6.0
[49] brio_1.1.3 rvest_1.0.2 openxlsx_4.2.5 testthat_3.1.2
[53] lme4_1.1-28 lifecycle_1.0.1 devtools_2.4.3 zlibbioc_1.40.0
[57] MASS_7.3-54 scales_1.1.1 hms_1.1.1 parallel_4.1.2
[61] biomformat_1.22.0 rhdf5_2.38.0 mvnfast_0.2.7 memoise_2.0.1
[65] stringi_1.7.6 S4Vectors_0.32.3 desc_1.4.1 permute_0.9-7
[69] BiocGenerics_0.40.0 boot_1.3-28 pkgbuild_1.3.1 zip_2.2.0
[73] GenomeInfoDb_1.30.1 rlang_1.0.1 pkgconfig_2.0.3 bitops_1.0-7
[77] lattice_0.20-45 Rhdf5lib_1.16.0 labeling_0.4.2 processx_3.5.2
[81] tidyselect_1.1.2 plyr_1.8.6 R6_2.5.1 IRanges_2.28.0
[85] generics_0.1.2 DBI_1.1.2 pillar_1.7.0 haven_2.4.3
[89] withr_2.5.0 mgcv_1.8-38 survival_3.2-13 abind_1.4-5
[93] RCurl_1.98-1.6 modelr_0.1.8 crayon_1.5.0 utf8_1.2.2
[97] tzdb_0.2.0 usethis_2.1.5 readxl_1.3.1 grid_4.1.2
[101] data.table_1.14.2 callr_3.7.0 vegan_2.5-7 reprex_2.0.1
[105] digest_0.6.29 stats4_4.1.2 munsell_0.5.0 sessioninfo_1.2.2

ailurophilia commented 2 years ago

Hi,

Thanks for your question. It's hard to say exactly what is happening in each of these instances without seeing what code you are attempting to run and having some information about the dataset you are using. With that said, here are a few thoughts:

  1. The first error may be occurring because you need to specify a higher value of the 'cutoff' argument of 'breakaway' – this gives the maximum frequency count used in breakaway model fitting. If the cutoff is set too low, all data will be excluded and you will have nothing left to fit your model with.
  2. The second error I don't think I can help with without knowing more about what input you are giving chao_bunge().
  3. Again, hard to say without seeing your code/data what is causing the error in betta() here, but one issue that can cause this problem is if the model you are trying to fit has too many parameters in it. (Specifically, if you are familiar with design matrices, this can happen if your design matrix does not have full column rank, but generally speaking the problem is that not all of your model parameters can be uniquely estimated from your data.)

I hope this is somewhat helpful – happy to follow up if you have more questions!

Best, David

ivanllampy commented 2 years ago

Hi all,

Thank you for your reply, and sorry for my late response. I have been trying to manipulate the code these days with different parameters, and able to get rid of the error. Although not sure what happened at the beginning, it generally works now for breakaway() and betta()!

For chao_bunge(), another outcome occurs - not every sample has an estimation, for example: $Sample LI04 Estimate of richness from method Chao-Bunge: Estimate is NA with standard error NA Confidence interval: (NA, NA) Cutoff: 10 Here is my workflow: I create a phyloseq object with ASV table, metadata file, and taxonomy table. Then, I subset the phyloseq with the groups of sample I interested. Finally, I simply run chao_bunge() to the phyloseq object. Here is the code I used: physeq<-phyloseq(asv.table,taxa.table,sam.table) #Create an phyloseq object physeq_rm900 <- phyloseq::subset_samples(physeq, phyloseq::sample_sums(physeq) > 900) #Remove samples with number of read count < 900 fg_physeq <- subset_samples(physeq_rm900, type %in% c("f","si","li")) #Subset three groups, namely "f", "si", and "li" cb<-chao_bunge(fg_physeq, cutoff = 10, output = NULL, answers = NULL) #Run chao_bunge()

May I ask why would the estimate be "NA"? Should I switch to another type of estimates, or it is a problem of my dataset? Many thanks for your reply again, any help would be appreciated!

ailurophilia commented 2 years ago

Hi,

Sorry for the long delay! chao_bunge can return NA values if estimated values of diversity are negative or if they are undefined (which can happen if a computed denominator term is zero). In each of these cases, you should receive a warning message – have you seen any messages warning you about output of this function?

Thanks!

David

ailurophilia commented 2 years ago

You may also have better luck with the breakaway function than with chao_bungebreakaway will fit a different model if the initial model it tries to fit is invalid or gives nonsensical results, so you are more likely to get usable output.

adw96 commented 2 years ago

I will second the excellent answer of @ailurophilia here -- the Chao-Bunge estimator is based on a negative binomial model, which can be very unstable and produce negative estimates of the number of unobserved species (!), especially when latent diversity is high. This is part of the motivation for writing the function breakaway, which sequentially steps through models as more complex models fail.

As this isn't a code bug, I'm going to close this issue, @ivanllampy. We will do our best to help if you have more questions.

jorondo1 commented 2 years ago

Hello @adw96 and @ailurophilia, what cutoff values you would recommend, or how to know which to use?

I am going through the stamps2022 diversity lab and I am also getting warnings (50 of them) that my cutoff is too low. It seems the default value is NA, so I'm not sure where to start. I tried supplying Inf, but the errors remain. Of particular interest, the plot grid that compares the observed phyloseq with the ba look exactly the same (except for the y axis minimum) : Rplot

Here's the code I ran :

psObject <- psProvSM
observed_phyloseq <- sample_richness(psObject)
plot(observed_phyloseq, psObject, color = "compart")
ba <- breakaway(psObject, cutoff=Inf)
p1 <- plot(ba, psObject, color = "compart")
p2 <- plot(observed_phyloseq, psObject, color = "compart")
grid.arrange(p2, p1, nrow = 2)

Edit: sequencing depth are quite different between my two compartments, so I would expect there to be a difference when estimating using breakaway: Rplot01

Thanks in advance for your help !

jorondo1 commented 2 years ago

Sorry, first time here!

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gridExtra_2.3               janitor_2.1.0               corncob_0.3.0              
 [4] wesanderson_0.3.6           RColorBrewer_1.1-3          DESeq2_1.36.0              
 [7] SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
[10] matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[13] IRanges_2.30.1              S4Vectors_0.34.0            DivNet_0.4.0               
[16] breakaway_4.7.9             BiocGenerics_0.42.0         vegan_2.6-2                
[19] lattice_0.20-45             permute_0.9-7               lubridate_1.8.0            
[22] trust_0.1-8                 forcats_0.5.2               dplyr_1.0.10               
[25] purrr_0.3.5                 ggplot2_3.3.6               tidyverse_1.3.2            
[28] tidyr_1.2.1.9000            stringr_1.4.1               readr_2.1.3                
[31] tibble_3.1.8                optimr_2019-12.16           magrittr_2.0.3             
[34] kableExtra_1.3.4            data.table_1.14.2           phyloseq_1.40.0            

loaded via a namespace (and not attached):
  [1] utf8_1.2.2               ROI.plugin.lpsolve_1.0-1 tidyselect_1.1.2         lme4_1.1-30             
  [5] htmlwidgets_1.5.4        RSQLite_2.2.18           AnnotationDbi_1.58.0     grid_4.2.1              
  [9] BiocParallel_1.30.3      Rtsne_0.16               munsell_0.5.0            codetools_0.2-18        
 [13] miniUI_0.1.1.1           withr_2.5.0              colorspace_2.0-3         lpSolveAPI_5.5.2.0-17.8 
 [17] knitr_1.40               rstudioapi_0.14          setRNG_2022.4-1          labeling_0.4.2          
 [21] slam_0.1-50              GenomeInfoDbData_1.2.8   bit64_4.0.5              farver_2.1.1            
 [25] rhdf5_2.40.0             rprojroot_2.0.3          vctrs_0.4.2              generics_0.1.3          
 [29] xfun_0.33                R6_2.5.1                 doParallel_1.0.17        VGAM_1.1-7              
 [33] locfit_1.5-9.6           bitops_1.0-7             rhdf5filters_1.8.0       microbiome_1.18.0       
 [37] cachem_1.0.6             DelayedArray_0.22.0      assertthat_0.2.1         promises_1.2.0.1        
 [41] scales_1.2.1             vroom_1.6.0              googlesheets4_1.0.1      gtable_0.3.1            
 [45] processx_3.7.0           rlang_1.0.6              genefilter_1.78.0        systemfonts_1.0.4       
 [49] splines_4.2.1            gargle_1.2.1             Rcgmin_2022-4.30         broom_1.0.1             
 [53] BiocManager_1.30.18      reshape2_1.4.4           abind_1.4-5              modelr_0.1.9            
 [57] backports_1.4.1          httpuv_1.6.6             usethis_2.1.6            optextras_2019-12.4     
 [61] tools_4.2.1              ellipsis_0.3.2           biomformat_1.24.0        sessioninfo_1.2.2       
 [65] Rcpp_1.0.9               plyr_1.8.7               zlibbioc_1.42.0          RCurl_1.98-1.9          
 [69] prettyunits_1.1.1        ps_1.7.1                 urlchecker_1.0.1         ROI_1.0-0               
 [73] haven_2.5.1              cluster_2.1.4            fs_1.5.2                 Rvmmin_2018-4.17.1      
 [77] reprex_2.0.2             mvnfast_0.2.7            googledrive_2.0.0        pkgload_1.3.0           
 [81] mime_0.12                hms_1.1.2                evaluate_0.17            xtable_1.8-4            
 [85] XML_3.99-0.11            readxl_1.4.1             compiler_4.2.1           crayon_1.5.2            
 [89] minqa_1.2.4              htmltools_0.5.3          later_1.3.0              mgcv_1.8-40             
 [93] tzdb_0.3.0               geneplotter_1.74.0       DBI_1.1.3                dbplyr_2.2.1            
 [97] MASS_7.3-58.1            boot_1.3-28              Matrix_1.5-1             ade4_1.7-19             
[101] cli_3.4.1                parallel_4.2.1           igraph_1.3.5             pkgconfig_2.0.3         
[105] registry_0.5-1           numDeriv_2016.8-1.1      xml2_1.3.3               foreach_1.5.2           
[109] svglite_2.1.0            annotate_1.74.0          multtest_2.52.0          webshot_0.5.4           
[113] XVector_0.36.0           rvest_1.0.3              snakecase_0.11.0         callr_3.7.2             
[117] detectseparation_0.3     digest_0.6.29            Biostrings_2.64.1        rmarkdown_2.16          
[121] cellranger_1.1.0         curl_4.3.3               shiny_1.7.2              nloptr_2.0.3            
[125] lifecycle_1.0.3          nlme_3.1-159             jsonlite_1.8.2           Rhdf5lib_1.18.2         
[129] viridisLite_0.4.1        fansi_1.0.3              pillar_1.8.1             pkgbuild_1.3.1          
[133] KEGGREST_1.36.3          fastmap_1.1.0            httr_1.4.4               survival_3.4-0          
[137] remotes_2.4.2            glue_1.6.2               png_0.1-7                iterators_1.0.14        
[141] bit_4.0.4                profvis_0.3.7            stringi_1.7.8            blob_1.2.3              
[145] memoise_2.0.1            ape_5.6-2