bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

class S4 is not subsettable #78

Open Flu09 opened 3 months ago

Flu09 commented 3 months ago

I submitted the same question at https://github.com/satijalab/seurat/issues/8604

I do not know whether the source of the issue is the BPCells library or Seurat.

However, the issue seems to happen only when I use conda.

h5ad file: wget -c https://datasets.cellxgene.cziscience.com/a5463d8f-07df-4870-8cae-bc504de762c8.h5ad

This is how I installed the two packages:

conda create -n Rtools r-base=4.3.2
conda activate Rtools
conda install conda-forge::r-seurat
conda install conda-forge::r-matrix=1.6.3
conda install rschauner::r-bpcells

Then I updated Seurat from R using install.packages("Seurat")

bnprks commented 3 months ago

Hi @Flu09, thanks for your question. Would you be able to provide an end-to-end example that shows code to reproduce the bug you're getting? (You can skip the installation steps, just starting from the code to create a Seurat project from the h5ad link you list)

It would be pretty quick for me to find where the issue is happening using R's debugging tools, but only once I'm able to run and reproduce the issue on my end.

Two related tips in case they're useful to you:

  1. You can use backticks to get nicely formatted code blocks in markdown, as follows (putting the "r" at the end of the first line makes it use R syntax highlighting):
    ```r
    # R code
    x <- 1
    ```
  2. If you put the code that causes a crash into a function, and then put a call to the built-in browser() function before the line that causes the crash, you can use R's built-in debugger to find where and how the crash is happening. (More info here).
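
As a rough illustration of the second tip, here is a minimal base R sketch. The function name `debug_step` and the toy input are hypothetical stand-ins for the real crashing code, and the `interactive()` guard is an added assumption so that the same script still runs to completion under `Rscript`, where there is no console to pause at.

```r
# Hypothetical wrapper around the step that crashes.
# `f` stands in for the real call (for example, a PCA step).
debug_step <- function(x, f) {
  if (interactive()) {
    # Pauses here in an interactive session; use n (next), s (step into),
    # c (continue), and Q (quit) to walk through the code that follows.
    browser()
  }
  f(x)
}

# Toy usage: the "crashing" step is replaced by a harmless sum().
result <- debug_step(1:10, sum)
print(result)
```

In a batch job the guard simply skips the pause, so the function behaves like a plain call to `f(x)`.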

Flu09 commented 3 months ago

Hello,

I ran a new test on the file. This time I did not add the metadata or import the dimensional reduction, and I got a 'memory not mapped' error when running RunPCA(). I run the scripts using Slurm, so browser() probably would not work for me.

caught segfault address 0x149ccf3533f0, cause 'memory not mapped'

Traceback:
 1: build_csparse_matrix_double_cpp(iter)
 2: asMethod(object)
 3: as(from, "dgCMatrix")
 4: as.matrix(.)
 5: as(from, "dgCMatrix") %>% as.matrix()
 6: asMethod(object)
 7: as(x, "matrix")
 8: as.matrix.IterableMatrix(X)
 9: as.matrix(X)
10: apply(X = data.use, MARGIN = 1L, FUN = var)
11: PrepDR5(object = object, features = features, layer = layer, verbose = verbose)
12: RunPCA.StdAssay(object = object[[assay]], assay = assay, features = features, npcs = npcs, rev.pca = rev.pca, weight.by.var = weight.by.var, verbose = verbose, ndims.print = ndims.print, nfeatures.print = nfeatures.print, reduction.key = reduction.key, seed.use = seed.use, ...)
13: RunPCA(object = object[[assay]], assay = assay, features = features, npcs = npcs, rev.pca = rev.pca, weight.by.var = weight.by.var, verbose = verbose, ndims.print = ndims.print, nfeatures.print = nfeatures.print, reduction.key = reduction.key, seed.use = seed.use, ...)
14: RunPCA.Seurat(seurat_object)
15: RunPCA(seurat_object)
16: eval(expr, envir, enclos)
17: eval(expr, envir, enclos)
18: eval_with_user_handlers(expr, envir, enclos, user_handlers)
19: withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers))
20: withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler)
21: handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler))
22: timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler)))
23: evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos, debug = debug, last = i == length(out), use_try = stop_on_error != 2L, keep_warning = keep_warning, keep_message = keep_message, log_echo = log_echo, log_warning = log_warning, output_handler = output_handler, include_timing = include_timing)
24: evaluate::evaluate(...)
25: evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning, keep_message = if (is.numeric(options$message)) TRUE else options$message, stop_on_error = if (is.numeric(options$error)) options$error else { if (options$error && options$include) 0L else 2L }, output_handler = knit_handlers(options$render, options))
26: in_dir(input_dir(), expr)
27: in_input_dir(evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning, keep_message = if (is.numeric(options$message)) TRUE else options$message, stop_on_error = if (is.numeric(options$error)) options$error else { if (options$error && options$include) 0L else 2L }, output_handler = knit_handlers(options$render, options)))
28: eng_r(options)
29: block_exec(params)
30: call_block(x)
31: process_group.block(group)
32: process_group(group)
33: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), error = function(e) if (xfun::pkg_available("rlang", "1.0.0")) rlang::entrace(e))
34: withCallingHandlers(expr, error = function(e) { loc = paste0(current_lines(), label, sprintf(" (%s)", knit_concord$get("infile"))) message(one_string(handler(e, loc)))})
35: handle_error(withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), error = function(e) if (xfun::pkg_available("rlang", "1.0.0")) rlang::entrace(e)), function(e, loc) { setwd(wd) write_utf8(res, output %n% stdout()) paste0("\nQuitting from lines ", loc) }, if (labels[i] != "") sprintf(" [%s]", labels[i]))
36: process_file(text, output)
37: knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
38: rmarkdown::render("/user/brainref/hope.Rmd")
An irrecoverable exception occurred. R is aborting now ...
/var/spool/slurm/job32765541/slurm_script: line 25: 3267803 Segmentation fault Rscript -e "rmarkdown::render('/user/brainref/test.Rmd')"

code:

library(Seurat)
library(BPCells)
file_path <- "/user/brainref/a5463d8f-07df-4870-8cae-bc504de762c8.h5ad"
data <- open_matrix_anndata_hdf5(file_path)
write_matrix_dir(mat = data, dir = gsub(".h5ad", "_BP", file_path))
mat <- open_matrix_dir(dir = gsub(".h5ad", "_BP", file_path))
seurat_object <- CreateSeuratObject(counts = mat)
seurat_object <- NormalizeData(seurat_object)
seurat_object <- FindVariableFeatures(seurat_object)
seurat_object <- ScaleData(seurat_object)
seurat_object
#saveRDS(seurat_object,  "/user/brainref/new_ref.rds")
options(error = function() {
  # dump.frames() has no `to.file.raise` argument; to.file = TRUE writes last.dump.rda
  dump.frames(to.file = TRUE)
})
seurat_object <- RunPCA(seurat_object)

bnprks commented 1 month ago

Hi @Flu09, sorry for the long delay on this. I think I have isolated this particular error to a problem in Seurat which I have just opened a pull request for.

In short, the PrepDR5 function from Seurat ends up converting the whole scale.data layer to an in-memory object accidentally, and a relatively simple change can help avoid that.
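
To illustrate the general idea of avoiding a full in-memory conversion, here is a minimal sketch that computes per-row variances one block of rows at a time. This is not the change in the pull request; the function name `row_var_chunked`, the `chunk_size` parameter, and the toy matrix are hypothetical, and a plain base R matrix stands in for the disk-backed matrix.

```r
# Hypothetical helper: per-row variance computed in blocks of rows,
# so only `chunk_size` rows are ever held as a dense matrix at once.
row_var_chunked <- function(mat, chunk_size = 500) {
  n <- nrow(mat)
  if (n == 0) {
    return(numeric(0))
  }
  out <- numeric(n)
  for (start in seq(1, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1, n)
    # Densify only the current block of rows.
    block <- as.matrix(mat[idx, , drop = FALSE])
    out[idx] <- apply(block, 1, var)
  }
  names(out) <- rownames(mat)
  out
}

# Toy data standing in for a scaled expression matrix (features x cells).
set.seed(1)
toy <- matrix(rnorm(50 * 20), nrow = 50,
              dimnames = list(paste0("gene", 1:50), NULL))

chunked <- row_var_chunked(toy, chunk_size = 7)
direct <- apply(toy, 1, var)
print(all.equal(chunked, direct))
```

The trade-off is a loop over blocks in exchange for peak memory proportional to `chunk_size` rows instead of the whole matrix.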

I'm pretty sure that the crash you experienced was related to memory issues, either directly running out of memory or problems with R dgCMatrix objects not supporting more than 2.1 billion entries without crashing.
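
As a rough check of that limit, the sketch below compares the number of entries in a scaled-data matrix against R's largest 32-bit integer. The cell count is a hypothetical placeholder, since the dataset's dimensions are not stated in this thread, and the feature count assumes the default of 2000 variable features from FindVariableFeatures(). Scaled data is treated as dense because centering generally leaves no zero entries.

```r
# Largest value a 32-bit signed integer index can hold (2^31 - 1).
max_entries <- .Machine$integer.max

# Hypothetical dimensions: replace n_cells with the real number of cells.
n_features <- 2000    # assumed default number of variable features
n_cells <- 1.5e6      # hypothetical cell count

# Entries that would have to be stored after densifying the scaled data.
n_entries <- n_features * n_cells
exceeds_limit <- n_entries > max_entries

# Approximate memory for a dense double-precision copy, in GiB.
dense_gib <- n_entries * 8 / 2^30

cat("entries:", format(n_entries, big.mark = ","), "\n")
cat("exceeds 32-bit limit:", exceeds_limit, "\n")
cat("dense size (GiB):", round(dense_gib, 1), "\n")
```

Under these assumed numbers the entry count passes the limit once the dataset has a little over a million cells, which is consistent with either failure mode described above.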

I'm trying to do a run on your example dataset now to confirm that my fix will work for you -- I'll update here once I've got that run to completion with and without my fix for Seurat.