adw96 / breakaway

Species richness with high diversity
68 stars 18 forks source link

Error message while using breakaway() #106

Closed jehaab closed 3 years ago

jehaab commented 3 years ago

Hello, I’m having problems using the breakaway function on my data. I had contacted you for a similar problem which you had resolved. https://github.com/adw96/breakaway/issues/92

At the time I was working on data at the family level. Now I am working at the species level and have a similar problem (I am joining the frequency count table that produces the error). Using the breakaway() function and it always stops at one sample and returns an error: "In poisson_model(input_data, cutoff = cutoff) : Cut-off was too low: no data available for estimation" I checked the model that was used by breakaway on my samples and it is always Poisson. Even with this error it returns an estimate and when I count the number of species detected in each of my samples it is the same as what is returned by breakaway(), do you have an idea as to why this happens?

Furthermore, I tried to use the chao1() function it works, but just like breakaway it returns the number of species detected in each of my samples . FrequencyCountTableError_species.xlsx

Thank you for your help

R.version _
platform x86_64-apple-darwin15.6.0
arch x86_64
os darwin15.6.0
system x86_64, darwin15.6.0
status
major 3
minor 6.3
year 2020
month 02
day 29
svn rev 77875
language R
version.string R version 3.6.3 (2020-02-29) nickname Holding the Windsock
sessionInfo() R version 3.6.3 (2020-02-29) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale: [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] reshape2_1.4.4 ggpubr_0.4.0 vegan_2.5-7 lattice_0.20-41 permute_0.9-5 forcats_0.5.1 ggplot2_3.3.3 breakaway_4.7.3

loaded via a namespace (and not attached): [1] nlme_3.1-152 fs_1.5.0 usethis_2.0.1 phyloseq_1.30.0 devtools_2.3.2 progress_1.2.2
[7] httr_1.4.2 rprojroot_2.0.2 tools_3.6.3 backports_1.2.1 R6_2.5.0 DBI_1.1.1
[13] lazyeval_0.2.2 BiocGenerics_0.32.0 mgcv_1.8-33 colorspace_2.0-0 ade4_1.7-16 withr_2.4.1
[19] processx_3.4.5 tidyselect_1.1.0 prettyunits_1.1.1 curl_4.3 compiler_3.6.3 cli_2.3.1
[25] Biobase_2.46.0 desc_1.3.0 plotly_4.9.3 scales_1.1.1 callr_3.5.1 stringr_1.4.0
[31] digest_0.6.27 foreign_0.8-75 rio_0.5.16 XVector_0.26.0 pkgconfig_2.0.3 htmltools_0.5.1.1
[37] sessioninfo_1.1.1 fastmap_1.1.0 htmlwidgets_1.5.3 rlang_0.4.10 readxl_1.3.1 rstudioapi_0.13
[43] generics_0.1.0 jsonlite_1.7.2 dplyr_1.0.5 zip_2.1.1 car_3.0-10 magrittr_2.0.1
[49] biomformat_1.14.0 Matrix_1.3-2 Rcpp_1.0.6 munsell_0.5.0 S4Vectors_0.24.4 Rhdf5lib_1.8.0
[55] ape_5.4-1 abind_1.4-5 lifecycle_1.0.0 stringi_1.5.3 carData_3.0-4 MASS_7.3-53.1
[61] zlibbioc_1.32.0 rhdf5_2.30.1 pkgbuild_1.2.0 plyr_1.8.6 grid_3.6.3 parallel_3.6.3
[67] crayon_1.4.1 Biostrings_2.54.0 haven_2.3.1 cowplot_1.1.1 splines_3.6.3 multtest_2.42.0
[73] hms_1.0.0 ps_1.6.0 pillar_1.5.1 igraph_1.2.6 ggsignif_0.6.0 pkgload_1.2.0
[79] codetools_0.2-18 stats4_3.6.3 glue_1.4.2 remotes_2.2.0 data.table_1.13.6 vctrs_0.3.6
[85] foreach_1.5.1 testthat_3.0.2 cellranger_1.1.0 gtable_0.3.0 purrr_0.3.4 tidyr_1.1.2
[91] assertthat_0.2.1 cachem_1.0.4 openxlsx_4.2.3 broom_0.7.4 rstatix_0.7.0 survival_3.2-7
[97] viridisLite_0.3.0 tibble_3.1.0 iterators_1.0.13 memoise_2.0.0 IRanges_2.20.2 cluster_2.1.1
[103] ellipsis_0.3.1

adw96 commented 3 years ago

Thanks for your question and for sharing the data! Based on eyeballing your data, it looks like you have very few rare species -- breakaway is essentially estimating that you don't have any rare taxa. You can get around this "issue" by giving the cutoff = Inf argument to breakaway, I think. If you give that a shot and run into trouble let us know :)

Amy

jehaab commented 3 years ago

Hello Amy,

Thank you for your answer, I feel like I am missing something, I think I am not running breakaway like I should be.

Could you tell me if I used breakaway the right way ?

First I generated a data frame of species counts, with columns as samples and rows as species as follows sp_break = t(data_count_species)

Then, I generated a frequency count table as follows It returns a list object (which I joined in previous message) sp_freq = build_frequency_count_tables(sp_break)

Then I tried to run b_sp = breakaway(sp_freq) and I was never able to make it work Because it returns "Error in convert(input_data) : Unsupported input type to function convert. You passed in an object of class list"

So I ran : sp = convert(t(data_count_species)) Because in the breakaway documentation "input_data | An input type that can be processed by convert()" which returns Error in convert(t(data_count_species)) : input_data is a data.frame or matrix but not a frequency count table.

So I tried to run sp = convert(sp_freq) Because sp_freq is a frequency count table But I have this: "Error in convert(sp_freq) : Unsupported input type to function convert. You passed in an object of class list"

I ended up running:

vect_s <- c()
for (i in 1:length(colnames(sp_break))) {
  vect_s <- c(vect_s, breakway(sp_freq[[colnames(sp_break)[i]]])$estimate)
} 

Which returns "In poisson_model(input_data, cutoff = cutoff) : Cut-off was too low: no data available for estimation"

and I am not able to integrate cutoff = Inf in that for loop (I tried multiple things but I am pretty new at R) I

but if I run

vect_s2 <- c()
for (i in 1:length(colnames(sp_chao))) {
  vect_s2 <- c(vect_s2, chao1(sp_freq[[colnames(sp_chao)[i]]])$estimate)
}
sp_chao is the same as sp_break

I works but as I said it returns the exact number that I get when I just manually count the number of species I have in my samples

I hope this is clear.

I really feel like I am missing something very important to run breakaway correctly

Edit: I have tried to run

vect_s <- c()
for (i in 1:length(colnames(sp_break))) {
  vect_s <- c(vect_s, breakway(sp_freq[[colnames(sp_break)[i]]])$estimate), cutoff = Inf))
} 

it works but returns "In poisson_model(input_data, cutoff = cutoff) : Cut-off was too low: no data available for estimation" for 50 samples for the rest of the 456 samples it returns also exactly the same number as when I manually count runing: n_species = rowSums(df_species !=0)

Sorry if that's a lot

jehaab commented 3 years ago

Hello Amy,

Sorry if I asked too much of you in my earlier post.

What you said seamed to have worked.

The only concern I have left is what I talked about earlier. I don't understand why breakaway() and chao1() returns exactly the same number as xhen I count my species manually n_species = rowSums(df_species !=0)

Thank you in advance and again sorry for the inconvenience

Jehane

adw96 commented 3 years ago

Not a problem at all -- thanks for giving it a go and great work figuring it out! Given that you have so few rare taxa, any sensible method is going to return the number of observed taxa as its best estimate of total taxa -- it seems like there is not a lot of diversity that you didn't observe. That's the reason why breakaway, chao1 and sample_richness are all giving you basically the same number.

Thanks for using breakaway -- I hope that helps!

jehaab commented 3 years ago

@adw96 That's such a relief thank you :)