drisso / SingleCellExperiment

Clone of the Bioconductor repository for the SingleCellExperiment package; see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.

cannot convert DelayedMatrix format for big sce object #72

Open · Marwansha opened this issue 4 months ago

Marwansha commented 4 months ago

Hey, I am failing to convert the DelayedMatrix to a dgCMatrix (or CsparseMatrix) for a big matrix. Note that the same code works for a subset of 10k cells. If you have any advice on how to do this, I would appreciate it.

```
> sce
class: SingleCellExperiment
dim: 36623 697321
metadata(11): _scvi_manager_uuid _scvi_uuid ... umap x_condition
assays(1): counts
rownames: NULL
rowData names(8): FDR bio ... tech total
colnames(697321): TTCAATCTCGAACCTA CGTAAGTTCACTCCGT ... GCTGGGTGTGTTCCAA AGTGATCAGCAATTAG-1
colData names(56): sample_id condition ... age_group age_sex
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
```

```
> sce@assays@data@listData$counts %>% str()
Formal class 'DelayedMatrix' [package "DelayedArray"] with 1 slot
  ..@ seed:Formal class 'DelayedSetDimnames' [package "DelayedArray"] with 2 slots
  .. .. ..@ dimnames:List of 2
  .. .. .. ..$ : int -1
  .. .. .. ..$ : chr [1:697321] "TTCAATCTCGAACCTA" "CGTAAGTTCACTCCGT" "GAGAGGTCACCCTATC" "GTGCACGAGTCGAAGC" ...
  .. .. ..@ seed    :Formal class 'DelayedSubset' [package "DelayedArray"] with 2 slots
  .. .. .. .. ..@ index:List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : int [1:697321] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. ..@ seed :Formal class 'CSC_H5SparseMatrixSeed' [package "HDF5Array"] with 6 slots
  .. .. .. .. .. .. ..@ filepath     : chr "/pasteur/zeus/projets/p02/LabExMI/singleCell/V3/scRNA_NS_IAV_COV/results/merged_object/raw_v4.h5ad"
  .. .. .. .. .. .. ..@ group        : chr "/X"
  .. .. .. .. .. .. ..@ subdata      : NULL
  .. .. .. .. .. .. ..@ dim          : int [1:2] 36623 714138
  .. .. .. .. .. .. ..@ indptr_ranges:'data.frame': 714138 obs. of  2 variables:
  .. .. .. .. .. .. .. ..$ start: num [1:714138] 1 8370 16015 24161 30876 ...
  .. .. .. .. .. .. .. ..$ width: int [1:714138] 8369 7645 8146 6715 7141 7118 7349 7288 7221 6599 ...
  .. .. .. .. .. .. ..@ dimnames     :List of 2
  .. .. .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. .. .. ..$ : NULL
```

```
> assays(sce)$counts
<36623 x 697321> sparse matrix of class DelayedMatrix and type "double":

> assays(sce)$counts_dgC <- as(assays(sce)$counts, "dgCMatrix")
Error in rbind(...) : negative extents to matrix

> sce1 = sce[, 1:10000]

> assays(sce1)$counts_dgC <- as(assays(sce1)$counts, "dgCMatrix")
> assays(sce1)$counts_dgC
36623 x 10000 sparse Matrix of class "dgCMatrix"
```

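A back-of-the-envelope check can show why the 10k-cell subset converts while the full object fails. It takes the per-cell non-zero counts of the first 10 cells from the `width` column of `indptr_ranges` in the `str()` output and assumes, as a rough approximation, that those cells are representative of the whole dataset. It also assumes that a dgCMatrix can hold at most 2^31 - 1 non-zero entries because its `p` slot is a 32-bit signed integer vector.

```r
# Largest value a 32-bit signed integer can hold (2^31 - 1);
# the `p` slot of a dgCMatrix is stored with this type.
max_nnz <- .Machine$integer.max

# Non-zeros per cell for the first 10 cells, copied from the `width`
# column of `indptr_ranges` in the str() output of this issue.
first_widths <- c(8369, 7645, 8146, 6715, 7141, 7118, 7349, 7288, 7221, 6599)

# Assumption: these 10 cells are representative of all cells.
mean_nnz <- mean(first_widths)

est_full   <- mean_nnz * 697321  # all cells in `sce`
est_subset <- mean_nnz * 10000   # the 10k-cell subset that converted fine

est_full > max_nnz    # TRUE: well beyond the dgCMatrix limit
est_subset > max_nnz  # FALSE: comfortably below it
```

Under these assumptions the full object needs roughly 5 billion non-zero entries, more than twice the limit, while the subset needs only about 74 million.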
LTLA commented 3 months ago

Not an SCE problem. If I had to guess, this is a fundamental limitation of the Matrix package: the sparse matrix pointers (i.e., the `p` slot) are stored as 32-bit signed integers, so a `*gCMatrix` can hold at most 2^31 - 1 non-zero entries. In fact, this limitation motivated the development of many of the DelayedArray classes in the first place; e.g., SVT_SparseMatrix directly replaces the dgCMatrix in many of my applications.
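Because the 10k-cell subset converts without trouble, one possible workaround, sketched here under stated assumptions rather than as an established recipe, is to convert the matrix in column chunks whose non-zero count each stays at or below 2^31 - 1. The helper `chunk_columns()` is hypothetical (it is not part of SingleCellExperiment, DelayedArray or Matrix) and only computes chunk boundaries from a vector of per-column non-zero counts. The commented usage lines are not run and assume the `sce` object from this issue. The result is a list of dgCMatrix objects rather than a single one, since combining them with `cbind()` would hit the same limit, and each chunk still has to fit in memory.

```r
# Hypothetical helper: split column indices into consecutive chunks whose
# total number of non-zero entries stays at or below `limit` (by default
# 2^31 - 1, the largest value a 32-bit signed integer can hold).
chunk_columns <- function(nnz_per_col, limit = .Machine$integer.max) {
  # A single column above the limit could never fit in one dgCMatrix.
  stopifnot(all(nnz_per_col <= limit))
  chunks <- list()
  start <- 1L
  running <- 0
  for (j in seq_along(nnz_per_col)) {
    # Close the current chunk if adding column j would exceed the limit.
    if (running + nnz_per_col[j] > limit) {
      chunks[[length(chunks) + 1L]] <- start:(j - 1L)
      start <- j
      running <- 0
    }
    running <- running + nnz_per_col[j]
  }
  # Add the last (possibly only) chunk.
  if (length(nnz_per_col) > 0L) {
    chunks[[length(chunks) + 1L]] <- start:length(nnz_per_col)
  }
  chunks
}

# Toy check with a limit of 7 non-zeros per chunk;
# returns three chunks: 1:2, 3:4 and 5.
chunk_columns(c(3, 4, 2, 5, 1), limit = 7)

# Usage sketch (not run; assumes `sce` from this issue is loaded):
# nnz <- colSums(counts(sce) != 0)  # reads the whole HDF5 file block by block
# chunks <- chunk_columns(nnz)
# counts_list <- lapply(chunks, function(idx) as(counts(sce)[, idx], "dgCMatrix"))
```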