adw96 / breakaway

Species richness with high diversity
68 stars 18 forks source link

Betta: Standard errors same for almost all factor levels #160

Closed naurasd closed 1 year ago

naurasd commented 2 years ago

Hi,

here the session info:

library(breakaway) library(phyloseq) library(magrittr) Warning message: package ‘magrittr’ was built under R version 4.1.3 library(tidyverse) -- Attaching packages ---------------------------------------------- tidyverse 1.3.1 -- v ggplot2 3.3.6 v purrr 0.3.4 v tibble 3.1.8 v dplyr 1.0.9 v tidyr 1.2.0 v stringr 1.4.1 v readr 2.1.2 v forcats 0.5.1 -- Conflicts ------------------------------------------------- tidyverse_conflicts() -- x tidyr::extract() masks magrittr::extract() x dplyr::filter() masks stats::filter() x dplyr::lag() masks stats::lag() x purrr::set_names() masks magrittr::setnames() Warning messages: 1: package ‘ggplot2’ was built under R version 4.1.3 2: package ‘tibble’ was built under R version 4.1.3 3: package ‘tidyr’ was built under R version 4.1.3 4: package ‘readr’ was built under R version 4.1.3 5: package ‘dplyr’ was built under R version 4.1.3 6: package ‘stringr’ was built under R version 4.1.3 R.version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 1.0
year 2021
month 05
day 18
svn rev 80317
language R
version.string R version 4.1.0 (2021-05-18) nickname Camp Pontanezen
sessionInfo() R version 4.1.0 (2021-05-18) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044) Matrix products: default locale: [1] LC_COLLATE=English_Germany.1252 LC_CTYPE=English_Germany.1252
[3] LC_MONETARY=English_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=English_Germany.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] forcats_0.5.1 stringr_1.4.1 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[6] tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.1 magrittr_2.0.3 [11] phyloseq_1.36.0 breakaway_4.7.6 loaded via a namespace (and not attached): [1] nlme_3.1-152 bitops_1.0-7 fs_1.5.2
[4] usethis_2.1.6 lubridate_1.8.0 devtools_2.4.3
[7] httr_1.4.3 rprojroot_2.0.3 GenomeInfoDb_1.28.4
[10] backports_1.4.1 tools_4.1.0 utf8_1.2.2
[13] R6_2.5.1 vegan_2.6-2 DBI_1.1.2
[16] BiocGenerics_0.38.0 mgcv_1.8-40 colorspace_2.0-3
[19] permute_0.9-7 rhdf5filters_1.4.0 ade4_1.7-19
[22] withr_2.5.0 tidyselect_1.1.2 prettyunits_1.1.1
[25] processx_3.7.0 curl_4.3.2 compiler_4.1.0
[28] rvest_1.0.2 cli_3.3.0 Biobase_2.52.0
[31] xml2_1.3.3 scales_1.2.1 callr_3.7.2
[34] minqa_1.2.4 XVector_0.32.0 pkgconfig_2.0.3
[37] lme4_1.1-30 sessioninfo_1.2.2 dbplyr_2.2.0
[40] fastmap_1.1.0 readxl_1.4.0 rlang_1.0.4
[43] rstudioapi_0.13 generics_0.1.3 jsonlite_1.8.0
[46] RCurl_1.98-1.6 GenomeInfoDbData_1.2.6 biomformat_1.20.0
[49] Matrix_1.4-1 Rcpp_1.0.9 munsell_0.5.0
[52] S4Vectors_0.30.2 Rhdf5lib_1.14.2 fansi_1.0.3
[55] ape_5.6-2 lifecycle_1.0.1 stringi_1.7.8
[58] MASS_7.3-57 zlibbioc_1.38.0 rhdf5_2.36.0
[61] pkgbuild_1.3.1 plyr_1.8.7 grid_4.1.0
[64] parallel_4.1.0 crayon_1.5.1 lattice_0.20-44
[67] haven_2.5.0 Biostrings_2.60.2 splines_4.1.0
[70] multtest_2.48.0 hms_1.1.1 ps_1.7.0
[73] pillar_1.8.1 igraph_1.3.1 boot_1.3-28
[76] reshape2_1.4.4 codetools_0.2-18 stats4_4.1.0
[79] pkgload_1.3.0 reprex_2.0.1 glue_1.6.2
[82] modelr_0.1.8 data.table_1.14.2 remotes_2.4.2
[85] tzdb_0.3.0 vctrs_0.4.1 nloptr_2.0.3
[88] foreach_1.5.2 cellranger_1.1.0 gtable_0.3.0
[91] assertthat_0.2.1 cachem_1.0.6 broom_0.8.0
[94] survival_3.2-11 iterators_1.0.14 memoise_2.0.1
[97] IRanges_2.26.0 cluster_2.1.2 ellipsis_0.3.2

I am analyzing a 16S amplicon dataset with breakaway / betta and have a question regarding a pattern in standard errors I am seeing.

I have run the regular breakaway function to estimate richness sample-wise:

library(breakaway) library(phyloseq) setwd("~/diversity") ps16S<-readRDS("ps16S.rds") set.seed(1) richness16S<-breakaway(ps16S)

I then fit a model with Reef_Period as fixed effect to get estimates not for samples, but for the factor Reef_Period (I have samples for 10 reefs sampled in two consecutive periods). Basically following this vignette :

meta16S <- ps16S %>% sample_data %>% as_tibble %>% mutate("sample_names" = ps16S %>% sample_names)

estimates16S <- meta16S %>% left_join(summary(richness16S), by = "sample_names")

betta_16S <- betta(formula = estimate ~ Reef_Period -1, ses = error, data = estimates16S)

rich_16S<-betta_16S$table[,1:2] rownames(rich_16S)<-gsub("Reef_Period","",rownames(rich_16S))

write.table(rich_16S,"16S_Reef_Period_breakaway_richness.txt",sep="\t",col.names = NA)

The following table shows the estimates and standard errors for the factor levels:

<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

  | Estimates | Standard Errors -- | -- | -- Abu Shosha_2016/17 | 3552.279 | 1206.308 Abu Shosha_2018/19 | 5738.984 | 1313.144 Al Fahal_2016/17 | 4568.14 | 1206.423 Al Fahal_2018/19 | 6738.006 | 1206.177 Duba 07_2016/17 | 10685.68 | 1205.904 Duba 07_2018/19 | 5559.989 | 1206.051 Duba 12_2016/17 | 6085.879 | 1205.673 Duba 12_2018/19 | 4623.98 | 1206.87 Jeddah 01_2016/17 | 4790.02 | 1206.161 Jeddah 01_2018/19 | 8086.992 | 1205.846 Jeddah 02_2016/17 | 5317.807 | 1206.02 Jeddah 02_2018/19 | 6071.996 | 1476.96 Jeddah 03_2016/17 | 3809.166 | 1205.747 Jeddah 03_2018/19 | 6809.603 | 1205.922 North Al Lith_2016/17 | 3351.649 | 1205.747 North Al Lith_2018/19 | 2449.813 | 1205.648 South Al Lith_2016/17 | 4697.136 | 1210.427 South Al Lith_2018/19 | 7435.654 | 1205.776 Whale Shark_2016/17 | 3768.343 | 1205.765 Whale Shark_2018/19 | 5902.982 | 1205.927 Except for some exceptions, the standard errors are almost identical. I was wondering why this is the case? Why are the errors so similar? I have the feeling that this is something a reviewer would pick up and question. I see exactly the same behavior with a metabarcoding set using the COI marker, by the way. I do have removed OTUs present as singletons in the overall dataset, but alowing singletons to be present in samples if their total abudnance in the dataset is >1. Below is the sample-wise data output of breakaway. Most of the time, a negative binominal models was chosen. estimate | error | lower | upper | sample_names | name | model -- | -- | -- | -- | -- | -- | -- 5830.785 | 80.15248 | 5187.525 | 18816.4 | AFHL.ARMS1.Sessile16S_AFHL.ARMS1.Sessile16S_N710.S517 | breakaway | Negative Binomial 7138.705 | 50.92133 | 6692.187 | 13593.03 | AFHL.ARMS2.Sessile16S_AFHL.ARMS2.Sessile16S_N711.S517 | breakaway | Negative Binomial 7244.526 | 67.2448 | 6655.347 | 17384.19 | AFHL.ARMS3.Sessile16S_AFHL.ARMS3.Sessile16S_N712.S517 | breakaway | Negative Binomial 7051.284 | 30.13974 | 6801.808 | 9712.381 | AL3.ARMS1.Sessile16S_AL3.ARMS1.Sessile16S_N702.S517 | breakaway | Negative Binomial 7862.156 | 55.35578 | 7413.852 | 15042.84 | AL3.ARMS2.Sessile16S_AL3.ARMS2.Sessile16S_N701.S517 | breakaway | Negative Binomial 7393.523 | 29.48224 | 7159.806 | 9921.82 | AL3.ARMS3.Sessile16S_AL3.ARMS3.Sessile16S_N703.S517 | breakaway | Negative Binomial 1999.757 | 5.561392 | 1986.837 | 2103.546 | ALR5.ARMS1.Sessile16S_ALR5.ARMS1.Sessile16S_N704.S517 | breakaway | Negative Binomial 4891 | 45.00204 | 4547.293 | 9905.3 | ALR5.ARMS2.Sessile16S_ALR5.ARMS2.Sessile16S_N705.S517 | breakaway | Negative Binomial 458.6826 | 1.528755 | 457.2702 | 467.4794 | ALR5.ARMS3.Sessile16S_ALR5.ARMS3.Sessile16S_N706.S517 | breakaway | Negative Binomial 6720.532 | 60.7512 | 6142.309 | 15593.34 | ALR7.ARMS1.Sessile16S_ALR7.ARMS1.Sessile16S_N707.S517 | breakaway | Negative Binomial 6339.915 | 49.97062 | 5942.946 | 12375.66 | ALR7.ARMS2.Sessile16S_ALR7.ARMS2.Sessile16S_N708.S517 | breakaway | Negative Binomial 4648.5 | 43.86522 | 4308.695 | 9482.126 | ALR7.ARMS3.Sessile16S_ALR7.ARMS3.Sessile16S_N709.S517 | breakaway | Negative Binomial 5644.573 | 125.3464 | 4906.555 | 28564.55 | ASHO.ARMS1.Sessile16S_ASHO.ARMS1.Sessile16S_N712.S503 | breakaway | Kemp 5453.523 | 76.38373 | 4824.361 | 17602.71 | ASHO.ARMS2.Sessile16S_ASHO.ARMS2.Sessile16S_N701.S504 | breakaway | Negative Binomial 6118.855 | 1952.714 | 5010.149 | 303512.5 | ASHO.ARMS3.Sessile16S_ASHO.ARMS3.Sessile16S_N702.S504 | breakaway | Kemp 5237.935 | 78.97496 | 4642.151 | 17606.47 | DR07.ARMS1.Sessile16S_DR07.ARMS1.Sessile16S_N709.S502 | breakaway | Negative Binomial 6021.977 | 27.51949 | 5820.383 | 8217.401 | DR07.ARMS2.Sessile16S_DR07.ARMS2.Sessile16S_N710.S502 | breakaway | Negative Binomial 5420.056 | 61.72487 | 4872.078 | 14291.03 | DR07.ARMS3.Sessile16S_DR07.ARMS3.Sessile16S_N711.S502 | breakaway | Negative Binomial 3422.448 | 87.28344 | 2959.824 | 15892.56 | DR12.ARMS1.Sessile16S_DR12.ARMS1.Sessile16S_N712.S502 | breakaway | Negative Binomial 5562.32 | 66.63219 | 5038.318 | 15143.08 | DR12.ARMS2.Sessile16S_DR12.ARMS2.Sessile16S_N701.S503 | breakaway | Negative Binomial 4887.173 | 128.678 | 3991.292 | 30637.21 | DR12.ARMS3.Sessile16S_DR12.ARMS3.Sessile16S_N702.S503 | breakaway | Kemp 9150.228 | 59.74665 | 8598.153 | 17690.89 | JD01.ARMS1.Sessile16S_JD01.ARMS1.Sessile16S_N701.S502 | breakaway | Negative Binomial 8415.425 | 48.09249 | 7939.075 | 14553.86 | JD01.ARMS2.Sessile16S_JD01.ARMS2.Sessile16S_N702.S502 | breakaway | Negative Binomial 6695.322 | 21.99362 | 6530.113 | 8217.574 | JD01.ARMS3.Sessile16S_JD01.ARMS3.Sessile16S_N703.S502 | breakaway | Negative Binomial 5490.077 | 52.47382 | 5073.381 | 12024.86 | JD02.ARMS2.Sessile16S_JD02.ARMS2.Sessile16S_N704.S502 | breakaway | Negative Binomial 6653.915 | 52.30785 | 6190.994 | 13420.76 | JD02.ARMS3.Sessile16S_JD02.ARMS3.Sessile16S_N705.S502 | breakaway | Negative Binomial 5989.555 | 31.02448 | 5747.485 | 8726.801 | JD03.ARMS1.Sessile16S_JD03.ARMS1.Sessile16S_N706.S502 | breakaway | Negative Binomial 8205.884 | 69.52656 | 7572.244 | 19070.41 | JD03.ARMS2.Sessile16S_JD03.ARMS2.Sessile16S_N707.S502 | breakaway | Negative Binomial 6233.369 | 47.14148 | 5815.053 | 11945.89 | JD03.ARMS3.Sessile16S_JD03.ARMS3.Sessile16S_N708.S502 | breakaway | Negative Binomial 3812.753 | 65.53812 | 3479.823 | 11623.39 | JS1.A.16S_JS1.A.16S | breakaway | Negative Binomial 3848.036 | 66.12373 | 3493.241 | 11953.62 | JS1.B.16S_JS1.B.16S | breakaway | Negative Binomial 6709.272 | 67.3091 | 6293.522 | 15547.51 | JS1.C.16S_JS1.C.16S | breakaway | Negative Binomial 5262.193 | 55.00563 | 4936.746 | 11556.15 | JS2.A.16S_JS2.A.16S | breakaway | Negative Binomial 7070.512 | 45.48981 | 6734.943 | 12109.19 | JS2.B.16S_JS2.B.16S | breakaway | Negative Binomial 3620.716 | 70.93435 | 3287.506 | 12174.42 | JS2.C.16S_JS2.C.16S | breakaway | Negative Binomial 4286.534 | 21.95959 | 4163.125 | 5666.022 | JS3.A.S.16S_JS3.A.S.16S | breakaway | Negative Binomial 3724.142 | 18.33384 | 3625.682 | 4730.153 | JS3.B.S.16S_JS3.B.S.16S | breakaway | Negative Binomial 3416.822 | 58.33493 | 3225.075 | 8805.649 | JS3.C.S.16S_JS3.C.S.16S | breakaway | Negative Binomial 3188.391 | 31.09899 | 3024.098 | 5583.832 | M_17_4651_WS.A.S.16S | breakaway | Negative Binomial 3984.609 | 44.61315 | 3792.763 | 7946.099 | M_17_4652_WS.B.S.16S | breakaway | Negative Binomial 4132.029 | 40.42889 | 3955.638 | 7536.589 | M_17_4653_WS.C.S.16S | breakaway | Negative Binomial 3707.627 | 55.43978 | 3488.293 | 9103.62 | M_17_4657_NA.A.S.16S | breakaway | Negative Binomial 3186.63 | 17.36851 | 3100.221 | 4082.821 | M_17_4658_NA.B.S.16S | breakaway | Negative Binomial 3160.689 | 28.73859 | 3035.532 | 5121.776 | M_17_4659_NA.C.S.16S | breakaway | Negative Binomial 6268.246 | 57.20121 | 5970.612 | 12628.35 | M_17_4660_SA.A.S.16S | breakaway | Negative Binomial 4457.405 | 69.04895 | 4158.719 | 12368.88 | M_17_4661_SA.B.S.16S | breakaway | Negative Binomial 3365.755 | 315.1407 | 3224.939 | 24633.96 | M_17_4662_SA.C.S.16S | breakaway | Kemp 4041.4 | 47.23777 | 3817.67 | 8554.207 | M_17_4663_AFL.A.S.16S | breakaway | Negative Binomial 4351.091 | 104.7612 | 4022.556 | 17295.4 | M_17_4664_AFL.B.S.16S | breakaway | Negative Binomial 5311.93 | 75.42441 | 4969.147 | 14591.93 | M_17_4665_AFL.C.S.16S | breakaway | Negative Binomial 2265.41 | 40.65141 | 2113.506 | 5491.45 | M_17_4666_ASA.A.S.16S | breakaway | Negative Binomial 3391.8 | 91.66961 | 3074.364 | 14470.65 | M_17_4667_ASA.B.S.16S | breakaway | Negative Binomial 4999.628 | 79.63286 | 4659.215 | 14826.65 | M_17_4668_ASA.C.S.16S | breakaway | Negative Binomial 11812.03 | 34.00939 | 11604.82 | 14732.69 | M_17_4711_DR7A.S.16S | breakaway | Negative Binomial 11065.06 | 39.78179 | 10794.06 | 14996.27 | M_17_4712_DR7B.S.16S | breakaway | Negative Binomial 9179.967 | 69.79417 | 8777.617 | 18277.65 | M_17_4713_DR7C.S.16S | breakaway | Negative Binomial 5628.103 | 28.12698 | 5465.64 | 7725.616 | M_17_4720_DR12A.S.16S | breakaway | Negative Binomial 6638.988 | 25.38691 | 6475.894 | 8475.681 | M_17_4721_DR12B.S.16S | breakaway | Negative Binomial 5990.546 | 34.16015 | 5795.237 | 8865.123 | M_17_4722_DR12C.S.16S | breakaway | Negative Binomial I wanted to attach the phylosec object as .rds file in case you would like to run it yourself, but Github won't allow this file type to be attached. Thanks for your feedback Nauras
adw96 commented 1 year ago

Hi @naurasd -- my apologies for the delay on getting back to you. This actually looks really good! I expect the reason why you have similar standard errors are because the data structures are fairly similar across samples, likely due to fairly uniform sequencing and post processing practices. I think this is a good thing! There's nothing that makes me worry that breakaway is sharing information across samples in a way that it shouldn't be doing. I hope this puts your mind at ease but please let me know if you have any more questions and feel free to reopen this issue.

naurasd commented 1 year ago

Alright, thanks so much ;-)