cellgeni / schard

reticulate-free single cell format conversion
GNU General Public License v3.0

Problem running schard #1

Closed Flu09 closed 5 months ago

Flu09 commented 5 months ago

Hello, thank you for making this tool.

I encountered the following error, and I also noticed very high RAM consumption:

file <- "/home/sherine/brain_ref/a5463d8f-07df-4870-8cae-bc504de762c8.h5ad"
snhx = schard::h5ad2seurat(file)
Loading required package: Seurat
Loading required package: SeuratObject
Loading required package: sp
‘SeuratObject’ was built under R 4.2.0 but the current version is 4.3.2; it is recomended that you reinstall ‘SeuratObject’ as the ABI for R may have changed
‘SeuratObject’ was built with package ‘Matrix’ 1.6.3 but the current version is 1.6.5; it is recomended that you reinstall ‘SeuratObject’ as the ABI for ‘Matrix’ may have changed

Attaching package: ‘SeuratObject’

The following object is masked from ‘package:base’:

intersect

The value -2^31 was detected in the dataset. This has been converted to NA within R.
Error in Matrix::sparseMatrix(i = m$indices + 1, p = m$indptr, x = as.numeric(m$data), :
  'p' must be a nondecreasing vector c(0, ...)
In addition: Warning message:
In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
  NAs produced by integer overflow while converting 64-bit integer from HDF5 to a 32-bit integer in R. Choose bit64conversion='bit64' or bit64conversion='double' to avoid data loss

snhx <- schard::h5ad2seurat(file, bit64conversion = 'bit64')
Error in schard::h5ad2seurat(file, bit64conversion = "bit64") :
  unused argument (bit64conversion = "bit64")

prete commented 5 months ago

Hi @Flu09, just to check: the file you're trying to convert is Human Brain Cell Atlas v1.0 - All neurons, right?

https://datasets.cellxgene.cziscience.com/a5463d8f-07df-4870-8cae-bc504de762c8.h5ad

Flu09 commented 5 months ago

Yes.

iaaka commented 5 months ago

Hi @Flu09, unfortunately it seems there is no way to load this data into R (at least into Seurat) because of its size. adata.X has 14131422385 non-zero values, which is far more than the maximum of 2^31-1 allowed by the sparse matrices from the Matrix package that Seurat uses to store counts. This is a known issue. It stems from the fact that the largest possible integer value in R is 2^31-1, so it is not easy to overcome.
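To make the arithmetic concrete, here is a minimal Python sketch of the size check described above. It is an illustration only, not part of schard. The helper names (`fits_r_sparse_matrix`, `min_pieces`, `nnz_from_h5ad`) are hypothetical. `nnz_from_h5ad` assumes that `X` is stored sparse in the usual AnnData layout, where the last entry of `X/indptr` equals the number of non-zero values, and that `h5py` is installed.

```python
# Sketch of the limit discussed above: R integers are 32-bit, so a sparse
# matrix index cannot exceed 2^31 - 1.
INT32_MAX = 2**31 - 1  # 2147483647, the largest integer value in R

# Non-zero count of adata.X quoted in this thread.
NNZ_REPORTED = 14_131_422_385


def fits_r_sparse_matrix(nnz: int) -> bool:
    """Return True if `nnz` non-zero values can be indexed with 32-bit integers."""
    return 0 <= nnz <= INT32_MAX


def min_pieces(nnz: int) -> int:
    """Lower bound on how many pieces the matrix would have to be split into
    so that each piece stays within the 32-bit limit (ceiling division)."""
    return -(-nnz // INT32_MAX)


def nnz_from_h5ad(path: str) -> int:
    """Read the non-zero count of X without loading the matrix.

    Assumption: X is a sparse (CSR/CSC) group containing an `indptr` dataset.
    h5py is imported lazily so the arithmetic above works without it.
    """
    import h5py

    with h5py.File(path, "r") as f:
        return int(f["X"]["indptr"][-1])


if __name__ == "__main__":
    print(fits_r_sparse_matrix(NNZ_REPORTED))  # False
    print(min_pieces(NNZ_REPORTED))            # 7
```

The reported count is roughly 6.6 times the limit. This also fits the error in the log above: offsets in `indptr` that are larger than 2^31-1 are turned into NA when read as 32-bit integers, so `p` is no longer a valid nondecreasing vector. Whether loading a subset of cells would work for a given analysis is not covered in this thread.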

Flu09 commented 5 months ago

Thank you so much for your reply.