Yunuuuu closed this issue 5 months ago.
So the issue https://github.com/satijalab/seurat/issues/8391 is likely related to BPCells rather than Seurat, since my code doesn't use any Seurat functions.
Judging from the traceback in https://github.com/satijalab/seurat/issues/8391, build_csparse_matrix_double_cpp appears to be the source of the problem.
Hi @Yunuuuu, how large is the dataset you are running this on? I ask because there are three possible causes of errors like 'memory not mapped'. The first is a bug in the BPCells code. The second is that your computer is running out of memory. The third is that R dgCMatrix objects cannot reliably hold more than about 2 billion (2^31 - 1) non-zero entries, because of integer overflow.
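As a quick sanity check of that third cause: a dgCMatrix stores its row indices (slot i) and column pointers (slot p) as 32-bit R integers. The last column pointer equals the total number of non-zeros, so that total has to fit in an R integer. The minimal base-R sketch below compares a non-zero count against that ceiling. exceeds_dgc_limit is a hypothetical helper for illustration, not part of BPCells or Matrix.

```r
# Largest value an R integer can hold (2^31 - 1). dgCMatrix keeps its column
# pointers (slot p) as R integers, so the total non-zero count must fit here.
dgc_nnz_limit <- .Machine$integer.max

# Hypothetical helper: TRUE if a matrix with `nnz` non-zero entries is too
# large to be represented as a single dgCMatrix.
exceeds_dgc_limit <- function(nnz) {
  nnz > dgc_nnz_limit
}

exceeds_dgc_limit(1e9)    # 1 billion non-zeros: FALSE
exceeds_dgc_limit(2.9e9)  # 2.9 billion non-zeros: TRUE
```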
When you run as(counts_merged, "dgCMatrix"), BPCells builds an in-memory copy of the matrix. If there is not enough memory to hold that copy, BPCells will crash while constructing it. An R sparse matrix uses about 12 bytes per non-zero entry: 8 bytes for the double value and 4 bytes for the integer row index.
How large is the matrix you are working with, and how much memory does your machine have? (You can count the non-zero entries by running sum(matrix_stats(counts_merged, col_stats="nonzero")$col_stats).)
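The ~12 bytes per non-zero rule of thumb makes it easy to estimate memory before attempting the conversion. This is a minimal sketch. estimate_dgc_gb is a hypothetical helper, and it also counts the 4-byte column pointers (one per column plus one). The BPCells lines are commented out so the block runs in base R. They assume the matrix directory is "merged_seurat", as mentioned later in this thread, and that it is opened with BPCells' open_matrix_dir.

```r
# Hypothetical helper: rough in-memory size (in GB) of a dgCMatrix.
# 8 bytes per double value (slot x) + 4 bytes per integer row index (slot i),
# plus 4 bytes per integer column pointer (slot p, length ncol + 1).
estimate_dgc_gb <- function(nnz, ncol) {
  bytes <- nnz * (8 + 4) + (ncol + 1) * 4
  bytes / 1e9
}

# Usage with BPCells (commented out; path assumed from this thread):
# library(BPCells)
# counts_merged <- open_matrix_dir("merged_seurat")
# nnz <- sum(matrix_stats(counts_merged, col_stats = "nonzero")$col_stats)
# estimate_dgc_gb(nnz, ncol(counts_merged))
```

For the 2.9 billion non-zeros discussed in this thread, the estimate comes to roughly 35 GB before R makes any temporary copies. That count is also past the integer limit described above.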
If it is a BPCells bug, it should also happen with smaller matrices (e.g., 1 billion non-zero entries or fewer). In that case we can work out a way for me to reproduce and debug it.
Thanks for the detailed explanation. We have 2.9 billion non-zero entries.
Ah, yes: 2.9 billion non-zero entries is over the limit of what a dgCMatrix object can hold. Converting a subset of the BPCells matrix to dgCMatrix is likely to work, but converting the full-size matrix should be impossible due to limitations and bugs within the R Matrix package.
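One way to convert "a subset" systematically is to split the columns into consecutive chunks, each small enough to become its own dgCMatrix. This is a sketch under stated assumptions, not a BPCells feature. chunk_columns_by_nnz is a hypothetical helper that works on a plain vector of per-column non-zero counts. The commented BPCells usage assumes that matrix_stats returns col_stats as a stats-by-column matrix with a "nonzero" row.

```r
# Hypothetical helper: split column indices into consecutive chunks so that
# each chunk holds at most `max_nnz` non-zero entries in total.
chunk_columns_by_nnz <- function(col_nnz, max_nnz = .Machine$integer.max) {
  if (any(col_nnz > max_nnz)) stop("a single column exceeds max_nnz")
  chunks <- list()
  start <- 1L
  running <- 0
  for (j in seq_along(col_nnz)) {
    # Close the current chunk before column j would push it over the limit
    if (running + col_nnz[j] > max_nnz) {
      chunks[[length(chunks) + 1L]] <- start:(j - 1L)
      start <- j
      running <- 0
    }
    running <- running + col_nnz[j]
  }
  # Emit the final (possibly only) chunk
  if (length(col_nnz) > 0) chunks[[length(chunks) + 1L]] <- start:length(col_nnz)
  chunks
}

# Usage with BPCells (commented out; assumes counts_merged is a BPCells matrix):
# col_nnz <- matrix_stats(counts_merged, col_stats = "nonzero")$col_stats["nonzero", ]
# for (idx in chunk_columns_by_nnz(col_nnz, max_nnz = 5e8)) {
#   block <- as(counts_merged[, idx], "dgCMatrix")
#   # ... process block, then let it be garbage-collected ...
# }
```

Passing a max_nnz well below the integer limit (500 million here, an arbitrary choice) also keeps each in-memory block to a few GB.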
Let me know if you come across errors for matrices with fewer than 2 billion non-zero entries, as that would likely point to something that can be improved within BPCells itself. Sorry not to have a more helpful solution here; this is definitely one of the more frustrating limitations of the R Matrix package.
As described in https://github.com/satijalab/seurat/issues/8391, I noticed that the 'memory not mapped' error only occurred when we coerced the BPCells matrix into a dgCMatrix. "merged_seurat" is the path of the BPCells matrix.