alexisvdb / singleCellHaystack

Finding surprising needles (=genes) in haystacks (=single cell transcriptome data).
https://alexisvdb.github.io/singleCellHaystack/

apply in randomization step #15

Closed. davemcg closed this issue 4 years ago.

davemcg commented 4 years ago

Running with 985k cells and 500 GB of memory, haystack() segfaults during the randomization step:

```
### calling haystack_highD()...
### converting detection data from lgCMatrix to lgRMatrix
### scaling input data...
### deciding grid points...
### calculating Kullback-Leibler divergences...
  |======================================================================| 100%
### performing randomizations...

 *** caught segfault ***
address 0x2ab5cecb7044, cause 'memory not mapped'

Traceback:
 1: asMethod(object)
 2: as(.R.2.C(from), "matrix")
 3: asMethod(object)
 4: as(x, "matrix")
 5: as.matrix.Matrix(X)
 6: as.matrix(X)
 7: apply(detection, 1, sum)
 8: haystack_highD(x, detection = detection, use.advanced.sampling = use.advanced.sampling,     dir.randomization = dir.randomization, scale = scale, grid.points = grid.points,     grid.method = grid.method, ...)
 9: haystack.matrix(x = scvi, detection = detect, use.advanced.sampling = gd)
10: haystack(x = scvi, detection = detect, use.advanced.sampling = gd)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault
```

I suggest using Matrix::colSums and Matrix::rowSums instead of apply for these sums, because apply first converts the sparse matrix into a dense one (the as.matrix() call in the traceback above).
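For scale, here is a rough estimate of what that dense copy costs. The gene count is not stated in this thread, so n_genes = 30,000 below is a hypothetical value; R stores logical matrices at 4 bytes per element and double matrices at 8. This is only arithmetic under that assumption, not a measurement of the crash.

```r
# Back-of-the-envelope size of ONE dense copy of the detection matrix.
# n_cells comes from the report above; n_genes is a HYPOTHETICAL value.
n_cells <- 985e3
n_genes <- 30e3

n_elements <- n_cells * n_genes  # computed as a double, so no integer overflow here

# R stores logical (and integer) matrices at 4 bytes per element, doubles at 8.
gib <- function(bytes) bytes / 2^30
dense_logical_gib <- gib(n_elements * 4)
dense_double_gib  <- gib(n_elements * 8)

cat(sprintf("elements: %.3g\n", n_elements))
cat(sprintf("dense logical copy: %.1f GiB\n", dense_logical_gib))
cat(sprintf("dense double copy:  %.1f GiB\n", dense_double_gib))

# The element count is also far beyond the 32-bit integer range.
cat("exceeds .Machine$integer.max:", n_elements > .Machine$integer.max, "\n")
```

Under that assumption a single dense logical copy is roughly 110 GiB before apply() does any work of its own, and the number of elements no longer fits in a 32-bit integer, whereas a row sum over the sparse matrix only needs to touch the non-zero entries.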

davemcg commented 4 years ago

data: http://hpc.nih.gov/~mcgaugheyd/scEiaD/2020_08_13/scEiaD_droplet_seurat_v3.Rdata

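```r
load('scEiaD_droplet_seurat_v3.Rdata')  # provides integrated_obj
library(tidyverse)
library(singleCellHaystack)
library(Seurat)
library(Matrix)
library(tictoc)

# Sparse logical detection matrix (genes x cells): TRUE where counts > 0
detect <- (integrated_obj@assays$RNA@counts > 0)
# scVI embedding (cells x dimensions) used as the input space
scvi <- Embeddings(integrated_obj, reduction = 'scVI') #[cells,]
cts <- (integrated_obj@assays$RNA@counts) #[,cells])
# Number of detected genes per cell, passed as sampling weights
gd <- colSums(detect)
rm(integrated_obj)

tic()
scH <- haystack(x = scvi, detection = detect, use.advanced.sampling = gd)
toc()
```
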
alexisvdb commented 4 years ago

Hi David. Thank you for letting us know. We have followed your suggestion and replaced those apply calls with Matrix::rowSums, and I have pushed the changes to the sparse branch. This seems to work on my (far smaller) sparse datasets, but if you run into other issues, please let us know.
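A minimal sketch of the difference on a small synthetic matrix. The data are random (generated with Matrix::rsparsematrix), not scEiaD, and the variable names are illustrative only. Matrix::rowSums and Matrix::colSums work on the compressed sparse representation, while apply() first calls as.matrix(), which is the dense conversion visible in the traceback.

```r
library(Matrix)

set.seed(1)
# Synthetic stand-in for a genes x cells count matrix (NOT real data):
# 200 genes, 1000 cells, 5% non-zero, all stored values >= 1.
counts <- rsparsematrix(200, 1000, density = 0.05,
                        rand.x = function(n) rpois(n, 3) + 1)
detection <- counts > 0  # stays sparse: a logical sparse matrix

# Sparse-aware sums operate on the stored non-zero entries directly.
cells_per_gene          <- Matrix::rowSums(detection)
genes_detected_per_cell <- Matrix::colSums(detection)

# apply() returns the same row sums, but only after as.matrix() has built
# a dense 200 x 1000 copy: harmless here, a problem at ~1M cells.
cells_per_gene_apply <- apply(detection, 1, sum)

stopifnot(isTRUE(all.equal(cells_per_gene, cells_per_gene_apply)))
print(object.size(detection))             # sparse storage
print(object.size(as.matrix(detection)))  # dense storage
```

The two object.size() calls show why the sparse route matters: the dense copy grows with genes times cells, while the sparse one grows only with the number of detected entries.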