Open Flu09 opened 8 months ago
Hi @Flu09, thanks for your question. Would you be able to provide an end-to-end example that shows code to reproduce the bug you're getting? (You can skip the installation steps, just starting from the code to create a Seurat project from the h5ad link you list)
It would be pretty quick for me to find where the issue is happening using R's debugging tools, but only once I'm able to run and reproduce the issue on my end.
Two related tips in case they're useful to you:
```r
# R code
x <- 1
browser()
function before the line that causes the crash, it's possible to use R's builtin debugger to find where and how crashes are happening. (More info here).Hello,
I run a new test on the file. This time I did not add the metadata nor imported the dim reduction. and I got this time a 'memory not mapped' when running RunPCA(). I run the scripts using slurm, so probably browser() would not work for me.
caught segfault address 0x149ccf3533f0, cause 'memory not mapped'
Traceback: 1: build_csparse_matrix_double_cpp(iter) 2: asMethod(object) 3: as(from, "dgCMatrix") 4: as.matrix(.) 5: as(from, "dgCMatrix") %>% as.matrix() 6: asMethod(object) 7: as(x, "matrix") 8: as.matrix.IterableMatrix(X) 9: as.matrix(X) 10: apply(X = data.use, MARGIN = 1L, FUN = var) 11: PrepDR5(object = object, features = features, layer = layer, verbose = verbose) 12: RunPCA.StdAssay(object = object[[assay]], assay = assay, features = features, npcs = npcs, rev.pca = rev.pca, weight.by.var = weight.by.var, verbose = verbose, ndims.print = ndims.print, nfeatures.print = nfeatures.print, reduction.key = reduction.key, seed.use = seed.use, ...) 13: RunPCA(object = object[[assay]], assay = assay, features = features, npcs = npcs, rev.pca = rev.pca, weight.by.var = weight.by.var, verbose = verbose, ndims.print = ndims.print, nfeatures.print = nfeatures.print, reduction.key = reduction.key, seed.use = seed.use, ...) 14: RunPCA.Seurat(seurat_object) 15: RunPCA(seurat_object) 16: eval(expr, envir, enclos) 17: eval(expr, envir, enclos) 18: eval_with_user_handlers(expr, envir, enclos, user_handlers) 19: withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)) 20: withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler) 21: handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler)) 22: timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler))) 23: evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos, debug = debug, last = i == length(out), use_try = stop_on_error != 2L, keep_warning = keep_warning, keep_message = keep_message, log_echo = log_echo, log_warning = log_warning, output_handler = output_handler, include_timing = include_timing) 24: evaluate::evaluate(...) 25: evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning, keep_message = if (is.numeric(options$message)) TRUE else options$message, stop_on_error = if (is.numeric(options$error)) options$error else { if (options$error && options$include) 0L else 2L }, output_handler = knit_handlers(options$render, options)) 26: in_dir(input_dir(), expr) 27: in_input_dir(evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning, keep_message = if (is.numeric(options$message)) TRUE else options$message, stop_on_error = if (is.numeric(options$error)) options$error else { if (options$error && options$include) 0L else 2L }, output_handler = knit_handlers(options$render, options))) 28: eng_r(options) 29: block_exec(params) 30: call_block(x) 31: process_group.block(group) 32: process_group(group) 33: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), error = function(e) if (xfun::pkg_available("rlang", "1.0.0")) rlang::entrace(e)) 34: withCallingHandlers(expr, error = function(e) { loc = paste0(current_lines(), label, sprintf(" (%s)", knit_concord$get("infile"))) message(one_string(handler(e, loc)))}) 35: handle_error(withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), error = function(e) if (xfun::pkg_available("rlang", "1.0.0")) rlang::entrace(e)), function(e, loc) { setwd(wd) write_utf8(res, output %n% stdout()) paste0("\nQuitting from lines ", loc) }, if (labels[i] != "") sprintf(" [%s]", labels[i])) 36: process_file(text, output) 37: knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet) 38: rmarkdown::render("/user/brainref/hope.Rmd") An irrecoverable exception occurred. R is aborting now ... /var/spool/slurm/job32765541/slurm_script: line 25: 3267803 Segmentation fault Rscript -e "rmarkdown::render('/user/brainref/test.Rmd')"
code:
library(Seurat)
library(BPCells)
file_path <- "/user/brainref/a5463d8f-07df-4870-8cae-bc504de762c8.h5ad"
data <- open_matrix_anndata_hdf5(file_path)
write_matrix_dir(mat = data, dir = gsub(".h5ad", "_BP", file_path))
mat <- open_matrix_dir(dir = gsub(".h5ad", "_BP", file_path))
seurat_object <- CreateSeuratObject(counts = mat)
seurat_object <- NormalizeData(seurat_object)
seurat_object <- FindVariableFeatures(seurat_object)
seurat_object <- ScaleData(seurat_object)
seurat_object
#saveRDS(seurat_object, "/user/brainref/new_ref.rds")
options(error = function() {
dump.frames(to.file = TRUE, to.file.raise = FALSE)
})
seurat_object <- RunPCA(seurat_object)
Hi @Flu09, sorry for the long delay on this. I think I have isolated this particular error to a problem in Seurat which I have just opened a pull request for.
In short, the PrepDR5
function from Seurat ends up converting the whole scale.data
layer to an in-memory object accidentally, and a relatively simple change can help avoid that.
I'm pretty sure that the crash you experienced was related to memory issues, either directly running out of memory or problems with R dgCMatrix objects not supporting more than 2.1 billion entries without crashing.
I'm trying to do a run on your example dataset now to confirm that my fix will work for you -- I'll update here once I've got that run to completion with and without my fix for Seurat.
Hello, how are you?
Thank you. I have another question which might be related. I have compiled some samples into a BPCells object its about 1.5 million cells and i faced this issue on the RunPCA step. I am not sure to open a new thread or it might be related so I just posted it here.
mergedf[["RNA"]] <- split(mergedf[["RNA"]], f = mergedf$dataset)
mergedf <- NormalizeData(mergedf)
mergedf <- FindVariableFeatures(mergedf)
mergedf <- ScaleData(mergedf, features = rownames(mergedf))
mergedf <- RunPCA(mergedf)
mergedf <- RunPCA(mergedf) Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid format '%d'; use format %f, %e, %g or %a for numeric objects
mergedf An object of class Seurat 40090 features across 1530043 samples within 1 assay Active assay: RNA (40090 features, 2000 variable features) 29 layers present: counts.dataset1, counts.dataset2, counts.dataset3, counts.dataset4, counts.dataset5, counts.dataset6, counts.dataset7, counts.dataset8, counts.dataset9, counts.dataset10, counts.dataset11, counts.dataset12, counts.dataset13, counts.dataset14, data.dataset1, data.dataset2, data.dataset3, data.dataset4, data.dataset5, data.dataset6, data.dataset7, data.dataset8, data.dataset9, data.dataset10, data.dataset11, data.dataset12, data.dataset13, datadataset14, scale.data
> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Rocky Linux 9.3 (Blue Onyx)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.21.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] batchelor_1.20.0 SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 Biobase_2.64.0
[5] GenomicRanges_1.56.0 GenomeInfoDb_1.40.1 IRanges_2.38.0 S4Vectors_0.42.0
[9] BiocGenerics_0.50.0 MatrixGenerics_1.16.0 matrixStats_1.3.0 SeuratWrappers_0.3.2
[13] patchwork_1.2.0 ggrepel_0.9.5 ggplot2_3.5.1 dplyr_1.1.4
[17] SeuratDisk_0.0.0.9021 Seurat_5.0.0 BPCells_0.2.0 SeuratObject_5.0.2
[21] sp_2.1-4
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.4.0 later_1.3.2 tibble_3.2.1 R.oo_1.26.0
[6] polyclip_1.10-7 fastDummies_1.7.3 lifecycle_1.0.4 globals_0.16.3 lattice_0.22-6
[11] hdf5r_1.3.11 MASS_7.3-60.2 magrittr_2.0.3 plotly_4.10.4 rmarkdown_2.27
[16] yaml_2.3.10 remotes_2.5.0 httpuv_1.6.15 sctransform_0.4.1 spam_2.10-0
[21] sessioninfo_1.2.2 pkgbuild_1.4.4 spatstat.sparse_3.1-0 reticulate_1.37.0 cowplot_1.1.3
[26] pbapply_1.7-2 RColorBrewer_1.1-3 ResidualMatrix_1.14.1 abind_1.4-5 pkgload_1.3.4
[31] zlibbioc_1.50.0 Rtsne_0.17 purrr_1.0.2 R.utils_2.12.3 GenomeInfoDbData_1.2.12
[36] irlba_2.3.5.1 listenv_0.9.1 spatstat.utils_3.0-5 goftest_1.2-3 RSpectra_0.16-2
[41] spatstat.random_3.3-1 fitdistrplus_1.2-1 parallelly_1.38.0 DelayedMatrixStats_1.26.0 leiden_0.4.3.1
[46] codetools_0.2-20 DelayedArray_0.30.1 scuttle_1.14.0 DT_0.33 tidyselect_1.2.1
[51] UCSC.utils_1.0.0 ScaledMatrix_1.12.0 spatstat.explore_3.3-1 jsonlite_1.8.8 BiocNeighbors_1.22.0
[56] ellipsis_0.3.2 progressr_0.14.0 ggridges_0.5.6 survival_3.7-0 tools_4.4.0
[61] ica_1.0-3 Rcpp_1.0.13 glue_1.7.0 gridExtra_2.3 SparseArray_1.4.8
[66] xfun_0.46 usethis_2.2.3 withr_3.0.1 BiocManager_1.30.23 fastmap_1.2.0
[71] fansi_1.0.6 digest_0.6.36 rsvd_1.0.5 R6_2.5.1 mime_0.12
[76] colorspace_2.1-1 scattermore_1.2 tensor_1.5 spatstat.data_3.1-2 R.methodsS3_1.8.2
[81] utf8_1.2.4 tidyr_1.3.1 generics_0.1.3 data.table_1.15.4 httr_1.4.7
[86] htmlwidgets_1.6.4 S4Arrays_1.4.1 uwot_0.2.2 pkgconfig_2.0.3 gtable_0.3.5
[91] lmtest_0.9-40 XVector_0.44.0 htmltools_0.5.8.1 profvis_0.3.8 dotCall64_1.1-1
[96] scales_1.3.0 png_0.1-8 spatstat.univar_3.0-0 knitr_1.48 rstudioapi_0.16.0
[101] reshape2_1.4.4 nlme_3.1-165 cachem_1.1.0 zoo_1.8-12 stringr_1.5.1
[106] KernSmooth_2.23-24 parallel_4.4.0 miniUI_0.1.1.1 pillar_1.9.0 grid_4.4.0
[111] vctrs_0.6.5 RANN_2.6.1 urlchecker_1.0.1 promises_1.3.0 BiocSingular_1.20.0
[116] beachmat_2.20.0 xtable_1.8-4 cluster_2.1.6 evaluate_0.24.0 cli_3.6.3
[121] compiler_4.4.0 rlang_1.1.4 crayon_1.5.3 future.apply_1.11.2 plyr_1.8.9
[126] fs_1.6.4 stringi_1.8.4 BiocParallel_1.38.0 viridisLite_0.4.2 deldir_2.0-4
[131] munsell_0.5.1 lazyeval_0.2.2 devtools_2.4.5 spatstat.geom_3.3-2 Matrix_1.7-0
[136] RcppHNSW_0.6.0 sparseMatrixStats_1.16.0 bit64_4.0.5 future_1.34.0 shiny_1.9.1
[141] ROCR_1.0-11 igraph_2.0.3 memoise_2.0.1 bit_4.0.5
Hi @Flu09, I think the error message you're seeing might be the same as in issue #95. That specific problem in creating an error message should be fixed if you re-install the latest BPCells, but unfortunately the underlying issue with RunPCA will probably still be there as it is due to an issue in Seurat's internal PrepDR5 function as mentioned above.
The fix I submitted at the end of May has not made it into the Seurat main branch yet, but it is in the develop branch. If you re-install Seurat from the github develop branch with remotes::install_github("satijalab/seurat", ref="develop")
I think it will make your issue go away. You might also consider updating to the latest BPCells with remotes::install_github("bnprks/BPCells/r")
, but without changing Seurat that will just result in a nicer error message being printed for this specific issue.
Hope that helps!
-Ben
Thank you. I got the develop version then I noticed it gave me alot of warning and errors so I did not give the RunPCA a test. then I tried to install back Seurat and now I get the 2 ^31 error when Joining the layers of objects ( a step i did numerously before without issues) The object together make 1.3 million. I am not sure what went wrong.
Hmm, I'm not personally very familiar with Seurat workflows that involve splitting/joining layers (usually I just use rbind/cbind to concatenate all my inputs and use a single layer during analysis). So it's hard for me to guess what might be going wrong without being able to test out an example with code + data on my end
Are you still working off of the h5ad file you linked initially in this issue? If you're able to share with me some example code + data that starts from an h5ad file and hits the errors you're talking about here then I can do a better job of tracking down where the crash is happening and what might be possible to fix it.
no I am working on a different one. I can prepare some data to send you. it is just this error now keep showing up after the bunch of install uninstall I did yesterday. Now even after sketching the BPCells object I cannot do RunPCA on the sketched assay.
`Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid format '%d'; use format %f, %e, %g or %a for numeric objects
18. h(simpleError(msg, call)) 17. .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "invalid format '%d'; use format %f, %e, %g or %a for numeric objects", base::quote(sprintf("Input matrix has %d entries", length(res@index)))) 16. sprintf("Input matrix has %d entries", length(res@index)) 15. is_formula(message, scoped = TRUE, lhs = FALSE) 14. rlang::abort(c("Error converting IterableMatrix to dgCMatrix", "dgCMatrix objects cannot hold more than 2^31 non-zero entries", sprintf("Input matrix has %d entries", length(res@index)))) 13. asMethod(object) 12. as(from, "dgCMatrix") 11. as.matrix(as(from, "dgCMatrix")) 10. asMethod(object) 9. as(x, "matrix") 8. as.matrix.IterableMatrix(X) 7. as.matrix(X) 6. apply(X = data.use, MARGIN = 1L, FUN = var) 5. PrepDR5(object = object, features = features, layer = layer, verbose = verbose) 4. RunPCA.StdAssay(object = object[[assay]], assay = assay, features = features, npcs = npcs, rev.pca = rev.pca, weight.by.var = weight.by.var, verbose = verbose, ndims.print = ndims.print, nfeatures.print = nfeatures.print, reduction.key = reduction.key, seed.use = seed.use, ...) 3. RunPCA(object = object[[assay]], assay = assay, features = features, npcs = npcs, rev.pca = rev.pca, weight.by.var = weight.by.var, verbose = verbose, ndims.print = ndims.print, nfeatures.print = nfeatures.print, reduction.key = reduction.key, seed.use = seed.use, ...) 2. RunPCA.Seurat(filtered_obj_sketched) 1. RunPCA(filtered_obj_sketched)
`
Please do let me know once you get an example set up, though please also make sure you have the latest BPCells and Seurat versions installed as I described above.
This particular error message indicates to me that you don't have the most recent version of BPCells installed and your version of Seurat also doesn't match the latest from the develop branch. The key problem here is the Seurat version. I've included some instructions below to check if you have the necessary fixes applied in your current running Seurat version:
Short of installing a full new Seurat version, there's also a quick script that you can run to temporarily work around the relevant Seurat bug:
And in case you're curious to know a bit more of what's going wrong behind the scenes, here's some information:
I submitted the same question at https://github.com/satijalab/seurat/issues/8604
I just do not know the source of the issue if its the BPCells library or Seurat.
but this issue seems to happen only when I use conda.
h5ad file wget -c https://datasets.cellxgene.cziscience.com/a5463d8f-07df-4870-8cae-bc504de762c8.h5ad
that is how i installed the two packages:
conda create -n Rtools r-base=4.3.2 conda activate Rtools
conda install conda-forge::r-seurat conda install conda-forge::r-matrix=1.6.3 conda install rschauner::r-bpcells the I updated seurat from r using install.packages("Seurat")