Open michaelplynch opened 1 month ago
Yes please share sce_pm
at stefano.mangiola@adelaide.edu.au
Thanks.
I also notice on the Bioconductor vignette that for the boxplot all cell types are coloured as significant for c_typecancer, which looks inconsistent with the credible interval plot
I'll have a look tomorrow morning.
Hello, please try the branch https://github.com/MangiolaLaboratory/sccomp/tree/allow-vb-output-samples
I have allowed sampling iterations > 1000 for vb
Also I implemented the more modern backend with the fast and very reliable pathfinder, please try it out as well
VB with custom number of draws
full-bayes HMC
set.seed(3)
sccomp_result =
sccomp_test_sce |>
sccomp_estimate(
formula_composition = ~ type,
.sample = sample_id,
.cell_group = celltype_num,
bimodal_mean_variability_association = TRUE,
variational_inference = F
) |>
sccomp_remove_outliers(variational_inference = F) |> # Optional
sccomp_test()
sccomp_result
plots = sccomp_result |> plot()
#plots$boxplot
#plots$credible_intervals_1D
set.seed(1)
sccomp_result =
sccomp_test_sce |>
sccomp_estimate(
formula_composition = ~ type,
.sample = sample_id,
.cell_group = celltype_num,
bimodal_mean_variability_association = TRUE,
variational_inference = F
) |>
sccomp_remove_outliers(variational_inference = F) |> # Optional
sccomp_test()
sccomp_result
plots2 = sccomp_result |> plot()
#plots$boxplot
plots$credible_intervals_1D + plots2$credible_intervals_1D
Now the new cmdstanr backend (updated now, please reinstall), using the fast pathfinder in this case (HMC is always available)
set.seed(3)
sccomp_result =
sccomp_test_sce |>
sccomp_estimate(
formula_composition = ~ type,
.sample = sample_id,
.cell_group = celltype_num,
bimodal_mean_variability_association = TRUE
) |>
sccomp_remove_outliers() |> # Optional
sccomp_test()
sccomp_result
plots = sccomp_result |> plot()
#plots$boxplot
#plots$credible_intervals_1D
set.seed(1)
sccomp_result =
sccomp_test_sce |>
sccomp_estimate(
formula_composition = ~ type,
.sample = sample_id,
.cell_group = celltype_num,
bimodal_mean_variability_association = TRUE
) |>
sccomp_remove_outliers() |> # Optional
sccomp_test()
sccomp_result
plots2 = sccomp_result |> plot()
#plots$boxplot
plots$credible_intervals_1D + plots2$credible_intervals_1D
Having tested this a little, I would say, use
variational_bayes = FALSE,
it is the gold standard and gives the most consistent results.
I will investigate further the variational strategy.
Hi Stefano, Can you confirm that it's the cmdstanr branch I should be reinstalling on? On Windows I get an install error similar to your GHA. On Linux oddly enough it will install but errors with "Further attempt with Variational Bayes: Error: Model not compiled. Try running the compile() method first."
You can
use the master branch but using variational_bayes = FALSE
. A great option, that you should always use if you can (a bit slower).
install cmdstan branch and use inference_method = "pathfinder"
(very fast and accurate), and do clear_stan_model_cache()
before executing, if you have compile() method first
problem. (not sure how you can have it if you install it from scratch, some feedback is helpful on your experience, so to improve this branch). With cmdstanr
branch of course you can still use inference_method = "hmc"
, the gold standard.
Thanks Stefano.
Assuming you mean variational_inference=F
(rather than variational_bayes=F
). variational_bayes=F
returns an unused argument error.
For master branch, variational_inference=F
looks more consistent.
For cmdstanr branch, inference_method="pathfinder"
also looks more consistent.
Other feedback:
I'm not sure was the pathfinder implementation much faster than HMC on the master branch in this case but I haven't formally tested this.
For cmdstanr branch with HMC e.g.:
sccomp_result = sccomp_test_sce |> sccomp_estimate( formula_composition = ~ type, .sample = sample_id, .cell_group = celltype_num, bimodal_mean_variability_association = TRUE, inference_method = "HMC" ) |> sccomp_remove_outliers() |> # Optional sccomp_test()
and
sccomp_result = sccomp_test_sce |> sccomp_estimate( formula_composition = ~ type, .sample = sample_id, .cell_group = celltype_num, bimodal_mean_variability_association = TRUE, inference_method = "HMC", variational_inference = FALSE ) |> sccomp_remove_outliers() |> # Optional sccomp_test()
return errors Error in vb_iterative(mod, output_samples = output_samples, iter = 10000, : sccomp says: variational Bayes did not converge after 5 attempts. Please use variational_inference = FALSE for a HMC fitting.
and Error in fit2$num_chains() : attempt to apply non-function
, at least with this dataset.
Hi Stefano, Flagging some interesting behaviour I've come across with the package.
Problem: The 1d credible intervals plot is showing different results in terms of rank, credible interval and FDR significance when run with different seeds. I see the "EXPERIMENTAL ALGORITHM" message, is this a byproduct? I'm working off sccomp v1.8 from the most recent Bioconductor release. I also notice on the Bioconductor vignette that for the boxplot all cell types are coloured as significant for c_typecancer, which looks inconsistent with the credible interval plot. Is this also a bug or have I misinterpreted the plot?
Dataset: Working with a SingleCellExperiment object, happy to share privately if you want to replicate. Code: `set.seed(3) sccomp_result = sce_pm |> sccomp_estimate( formula_composition = ~ type, .sample = sample_id, .cell_group = celltype_num, bimodal_mean_variability_association = TRUE, cores = 4, max_sampling_iterations = 200000 ) |> sccomp_remove_outliers(cores = 1) |> # Optional sccomp_test()
sccomp_result
plots = sccomp_result |> plot() plots$boxplot plots$credible_intervals_1D
set.seed(1) sccomp_result = sce_pm |> sccomp_estimate( formula_composition = ~ type, .sample = sample_id, .cell_group = celltype_num, bimodal_mean_variability_association = TRUE, cores = 4, max_sampling_iterations = 200000 ) |> sccomp_remove_outliers(cores = 1) |> # Optional sccomp_test()
sccomp_result
plots = sccomp_result |> plot() plots$boxplot plots$credible_intervals_1D` Output:
![image](https://github.com/MangiolaLaboratory/sccomp/assets/89206184/28210f01-e3d8-4a91-a829-71c446e379dd)
sessionInfo() R version 4.4.0 (2024-04-24) Platform: x86_64-pc-linux-gnu Running under: Red Hat Enterprise Linux 9.1 (Plow)
Matrix products: default BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP; LAPACK version 3.9.0
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Dublin tzcode source: system (glibc)
attached base packages: [1] stats4 stats graphics grDevices utils datasets [7] methods base
other attached packages: [1] sccomp_1.8.0
[2] ggplotify_0.1.2
[3] dittoSeq_1.16.0
[4] Seurat_5.1.0
[5] SeuratObject_5.0.2
[6] sp_2.1-4
[7] SCpubr_2.0.2
[8] ggpubr_0.6.0
[9] ggplot2_3.5.1
[10] dplyr_1.1.4
[11] edgeR_4.2.0
[12] limma_3.60.2
[13] SingleCellExperiment_1.26.0
[14] SummarizedExperiment_1.34.0
[15] Biobase_2.64.0
[16] GenomicRanges_1.56.0
[17] GenomeInfoDb_1.40.1
[18] IRanges_2.38.0
[19] S4Vectors_0.42.0
[20] BiocGenerics_0.50.0
[21] MatrixGenerics_1.16.0
[22] matrixStats_1.3.0
[23] ccrccprimarymetsanalysis_0.0.0.9000 [24] devtools_2.4.5
[25] usethis_2.2.3
loaded via a namespace (and not attached): [1] fs_1.6.4 spatstat.sparse_3.0-3
[3] httr_1.4.7 RColorBrewer_1.1-3
[5] profvis_0.3.8 tools_4.4.0
[7] sctransform_0.4.1 backports_1.5.0
[9] utf8_1.2.4 R6_2.5.1
[11] ResidualMatrix_1.14.0 lazyeval_0.2.2
[13] uwot_0.2.2 urlchecker_1.0.1
[15] withr_3.0.0 gridExtra_2.3
[17] progressr_0.14.0 cli_3.6.2
[19] spatstat.explore_3.2-7 fastDummies_1.7.3
[21] labeling_0.4.3 spatstat.data_3.0-4
[23] readr_2.1.5 ggridges_0.5.6
[25] pbapply_1.7-2 QuickJSR_1.2.2
[27] yulab.utils_0.1.4 StanHeaders_2.32.9
[29] parallelly_1.37.1 sessioninfo_1.2.2
[31] rstudioapi_0.16.0 generics_0.1.3
[33] gridGraphics_0.5-1 ica_1.0-3
[35] spatstat.random_3.2-3 car_3.1-2
[37] inline_0.3.19 loo_2.7.0
[39] Matrix_1.7-0 fansi_1.0.6
[41] abind_1.4-5 lifecycle_1.0.4
[43] yaml_2.3.8 carData_3.0-5
[45] SparseArray_1.4.8 Rtsne_0.17
[47] grid_4.4.0 promises_1.3.0
[49] dqrng_0.4.1 crayon_1.5.2
[51] miniUI_0.1.1.1 lattice_0.22-6
[53] beachmat_2.20.0 cowplot_1.1.3
[55] demuxmix_1.6.0 pillar_1.9.0
[57] knitr_1.47 metapod_1.12.0
[59] boot_1.3-30 future.apply_1.11.2
[61] codetools_0.2-20 leiden_0.4.3.1
[63] glue_1.7.0 data.table_1.15.4
[65] remotes_2.5.0 vctrs_0.6.5
[67] png_0.1-8 spam_2.10-0
[69] gtable_0.3.5 assertthat_0.2.1
[71] cachem_1.1.0 xfun_0.44
[73] S4Arrays_1.4.1 mime_0.12
[75] survival_3.5-8 pheatmap_1.0.12
[77] statmod_1.5.0 bluster_1.14.0
[79] ellipsis_0.3.2 fitdistrplus_1.1-11
[81] ROCR_1.0-11 nlme_3.1-164
[83] RcppAnnoy_0.0.22 rstan_2.32.6
[85] irlba_2.3.5.1 KernSmooth_2.23-22
[87] colorspace_2.1-0 tidyselect_1.2.1
[89] compiler_4.4.0 BiocNeighbors_1.22.0
[91] DelayedArray_0.30.1 plotly_4.10.4
[93] scales_1.3.0 lmtest_0.9-40
[95] stringr_1.5.1 digest_0.6.35
[97] goftest_1.2-3 spatstat.utils_3.0-4
[99] rmarkdown_2.27 XVector_0.44.0
[101] htmltools_0.5.8.1 pkgconfig_2.0.3
[103] sparseMatrixStats_1.16.0 fastmap_1.2.0
[105] rlang_1.1.4 htmlwidgets_1.6.4
[107] UCSC.utils_1.0.0 shiny_1.8.1.1
[109] DelayedMatrixStats_1.26.0 farver_2.1.2
[111] zoo_1.8-12 jsonlite_1.8.8
[113] BiocParallel_1.38.0 BiocSingular_1.20.0
[115] magrittr_2.0.3 scuttle_1.14.0
[117] GenomeInfoDbData_1.2.12 dotCall64_1.1-1
[119] patchwork_1.2.0 munsell_0.5.1
[121] Rcpp_1.0.12 viridis_0.6.5
[123] reticulate_1.37.0 stringi_1.8.4
[125] zlibbioc_1.50.0 MASS_7.3-60.2
[127] plyr_1.8.9 pkgbuild_1.4.4
[129] parallel_4.4.0 listenv_0.9.1
[131] ggrepel_0.9.5 forcats_1.0.0
[133] deldir_2.0-4 splines_4.4.0
[135] tensor_1.5 hms_1.1.3
[137] locfit_1.5-9.9 igraph_2.0.3
[139] spatstat.geom_3.2-9 ggsignif_0.6.4
[141] RcppHNSW_0.6.0 reshape2_1.4.4
[143] ScaledMatrix_1.12.0 pkgload_1.3.4
[145] rstantools_2.4.0 evaluate_0.24.0
[147] BiocManager_1.30.23 RcppParallel_5.1.7
[149] scran_1.32.0 tzdb_0.4.0
[151] httpuv_1.6.15 batchelor_1.20.0
[153] RANN_2.6.1 tidyr_1.3.1
[155] purrr_1.0.2 polyclip_1.10-6
[157] future_1.33.2 scattermore_1.2
[159] rsvd_1.0.5 broom_1.0.6
[161] xtable_1.8-4 RSpectra_0.16-1
[163] rstatix_0.7.2 later_1.3.2
[165] viridisLite_0.4.2 tibble_3.2.1
[167] memoise_2.0.1 cluster_2.1.6
[169] globals_0.16.3
Thanks!