bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

address 0x780e5835e340, cause 'memory not mapped' #89

Closed Yunuuuu closed 5 months ago

Yunuuuu commented 6 months ago

As described in https://github.com/satijalab/seurat/issues/8391, I noticed that the 'memory not mapped' error only occurs when coercing the BPCells matrix to a dgCMatrix. "merged_seurat" is the path to the BPCells matrix directory.

counts_merged <- BPCells::open_matrix_dir("merged_seurat")
as(counts_merged, "dgCMatrix")

 *** caught segfault ***
address 0x780e5835e340, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Yunuuuu commented 6 months ago

So the issue https://github.com/satijalab/seurat/issues/8391 appears to be related to BPCells rather than Seurat, since my code doesn't call any Seurat functions.

Yunuuuu commented 6 months ago

According to the traceback in https://github.com/satijalab/seurat/issues/8391, build_csparse_matrix_double_cpp appears to be where the problem occurs.

bnprks commented 6 months ago

Hi @Yunuuuu, how large is the dataset you are running this on? I ask because there are three possible causes of errors like 'memory not mapped'. One is a bug in the BPCells code, another is that your computer is running out of memory, and the third is that R dgCMatrix objects cannot reliably hold more than ~2 billion non-zero entries due to integer overflow.

When running as(counts_merged, "dgCMatrix"), if there is not enough space to fit the sparse matrix in memory then BPCells will crash as it's trying to make that in-memory object. The memory used for an R sparse matrix is about 12 bytes per non-zero entry.

How large is the matrix you are working with and how much memory is on your machine? (You can measure nonzeros by running sum(matrix_stats(counts_merged, col_stats="nonzero")$col_stats)).

If it is a BPCells bug, it should happen even with smaller matrix sizes (i.e. 1 billion non-zero entries and below), in which case we can try to find a way for me to reproduce and debug it.
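The two size checks described above can be sketched in base R. This is only a rough estimate under stated assumptions: `nnz` is a hypothetical placeholder for the count returned by the `matrix_stats()` call above, the index limit is taken to be R's maximum integer (2^31 - 1), and the 12 bytes per entry are split as 8 bytes for the double value plus 4 bytes for the integer row index.

```r
# Rough feasibility check before calling as(counts_merged, "dgCMatrix").
# `nnz` is a hypothetical placeholder; in practice it would come from
# sum(matrix_stats(counts_merged, col_stats = "nonzero")$col_stats).
nnz <- 2.9e9

# dgCMatrix stores its row indices and column pointers as 32-bit integers,
# so the non-zero count cannot safely exceed .Machine$integer.max (2^31 - 1).
max_nnz <- as.numeric(.Machine$integer.max)
fits_index <- nnz <= max_nnz

# About 12 bytes per non-zero entry: 8 for the value, 4 for the row index.
est_gib <- nnz * 12 / 1024^3

cat(sprintf("non-zeros: %.3g (limit %.0f): %s\n", nnz, max_nnz,
            if (fits_index) "within limit" else "over limit"))
cat(sprintf("estimated memory: %.1f GiB\n", est_gib))
```

If `fits_index` is `FALSE`, the conversion cannot succeed regardless of available RAM; if it is `TRUE`, the memory estimate still has to fit on the machine.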

Yunuuuu commented 6 months ago

Thanks for your detailed explanation. We have 2.9 billion non-zero entries.

bnprks commented 5 months ago

Ah, yes, 2.9 billion non-zero entries is over the limit of what a "dgCMatrix" object can hold. If you convert only a subset of the BPCells matrix to dgCMatrix, that is likely to work, but converting the full-size matrix should be impossible due to limitations/bugs within the R Matrix package.

Let me know if you come across errors for matrices with under 2 billion entries, as that would likely hint at something that can be improved within BPCells itself. Sorry to not have a more helpful solution here -- this is definitely one of the more frustrating limitations of the R Matrix package.
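One way to act on the subset suggestion above is to group columns into chunks that each stay under the non-zero limit and convert the chunks one at a time. The helper below is a minimal sketch, not part of BPCells: `chunk_columns` is a hypothetical name, and the commented usage assumes the `counts_merged` matrix and the `matrix_stats()` call from earlier in this thread.

```r
# Group consecutive columns into chunks whose total non-zero count stays
# at or below `max_nnz`. `col_nnz` holds the per-column non-zero counts.
# (Hypothetical helper for illustration; not a BPCells function.)
chunk_columns <- function(col_nnz, max_nnz = .Machine$integer.max) {
  stopifnot(all(col_nnz <= max_nnz))
  chunk_id <- integer(length(col_nnz))
  current <- 1L
  running <- 0  # kept as a double so the running sum cannot overflow
  for (j in seq_along(col_nnz)) {
    if (running + col_nnz[j] > max_nnz) {
      # Adding this column would exceed the limit: start a new chunk.
      current <- current + 1L
      running <- 0
    }
    running <- running + col_nnz[j]
    chunk_id[j] <- current
  }
  split(seq_along(col_nnz), chunk_id)
}

# Toy example with made-up counts and a small limit.
chunks <- chunk_columns(c(4, 3, 5, 2, 6), max_nnz = 8)
print(chunks)

# Assumed usage with the matrix from this thread (not run here):
# col_nnz <- as.numeric(matrix_stats(counts_merged, col_stats = "nonzero")$col_stats)
# pieces <- lapply(chunk_columns(col_nnz), function(cols) {
#   as(counts_merged[, cols], "dgCMatrix")
# })
```

Each piece has to be processed separately, since binding them back into a single dgCMatrix would hit the same limit. In practice `max_nnz` would likely be set well below the integer maximum so that each chunk also fits in available memory.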