chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

R long vectors not supported yet: memory.c:3888 #1095

Open CianMurphy opened 6 months ago

CianMurphy commented 6 months ago

Describe the bug

When trying to create a Seurat object, I get the following error message:

Error in vec_to_Array(x, type) :
  long vectors not supported yet: memory.c:3888
Calls: get_seurat ... -> -> -> vec_to_Array
Execution halted

This happens even though the script runs on a cluster with 650 GB of memory.

To Reproduce

library("cellxgene.census")
library("Seurat")
library(data.table)

census_dat <- 'census_datasets.csv'

census <- open_soma()
seurat <- get_seurat(
  census,
  organism = "Homo sapiens",
  obs_value_filter = "dataset_id == '9f222629-9e39-47d0-b83f-e08d610c7479'"
)

Environment

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

R is installed via mamba and Anaconda3/2023.03

ebezzi commented 5 months ago

Hey @CianMurphy,

the reason you see this error is a size limit imposed by dgCMatrix, the default sparse matrix class used by Seurat: it cannot hold more than 2^31 - 1 non-zero entries, regardless of how much memory is available. See (for example) https://github.com/satijalab/seurat/issues/4380 for more details. The dataset you're trying to query is large enough to hit that limit and would fail a Seurat conversion even outside the Census.
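To make the limit concrete, here is a rough sketch of the arithmetic, assuming the cap in question is the signed 32-bit maximum on the number of non-zero entries. The function name, cell and gene counts, and density below are hypothetical illustration values, not measurements of the dataset in this issue.

```python
# Largest value a signed 32-bit integer can hold. This sketch assumes it is
# the cap on non-zero entries in a dgCMatrix.
INT32_MAX = 2**31 - 1


def fits_in_dgcmatrix(n_cells: int, n_genes: int, density: float) -> bool:
    """Return True if the estimated non-zero count stays within the 32-bit cap.

    Hypothetical helper: density is the fraction of non-zero entries.
    """
    estimated_nnz = int(n_cells * n_genes * density)
    return estimated_nnz <= INT32_MAX


# Hypothetical slices, not the dataset from this issue.
print(fits_in_dgcmatrix(100_000, 60_000, 0.05))    # about 3e8 non-zeros
print(fits_in_dgcmatrix(1_500_000, 60_000, 0.05))  # about 4.5e9 non-zeros
```

The check depends only on the count of non-zero entries, which is why adding RAM does not help.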

This limitation can be avoided with Seurat v5, which allows the use of a sparse matrix class other than dgCMatrix. Seurat v5 is not currently supported by the Census or by TileDB-SOMA, the backend library used by the Census, although support is on the roadmap. We will publish an update when it is available. In the meantime, these large datasets or slices can be analyzed with Python, which doesn't have this limitation.
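For reference, a minimal sketch of the Python route, assuming the `cellxgene_census` package is installed and network access is available. The helper names `dataset_filter` and `load_dataset` are made up for this example; the filter string is the same one used in the R reproduction above.

```python
def dataset_filter(dataset_id: str) -> str:
    """Build the obs_value_filter string for a single dataset (hypothetical helper)."""
    return f"dataset_id == '{dataset_id}'"


def load_dataset(dataset_id: str, organism: str = "Homo sapiens"):
    """Fetch one dataset's cells from the Census as an AnnData object."""
    # Imported here so the filter helper works without the package installed.
    import cellxgene_census

    # open_soma() opens the Census; the context manager closes it afterwards.
    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=dataset_filter(dataset_id),
        )
```

Calling `load_dataset("9f222629-9e39-47d0-b83f-e08d610c7479")` should return the same slice as the failing `get_seurat` call, as an AnnData object instead of a Seurat object. The download is large, so expect it to take a while.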