bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

[cpp]Force the `X/indptr` to be int64 for `createAnnDataMatrix` #81

Closed ycli1995 closed 3 months ago

ycli1995 commented 3 months ago

Solve https://github.com/bnprks/BPCells/issues/76

bnprks commented 3 months ago

This looks great, thank you! It's a minimal set of changes, and from my checks it fixes the issue.

Also from my checks, I don't think we need to perform any conversions when reading, because the HDF5 library appears to do basic type conversions automatically (so if we open the array as uint64_t, it doesn't matter if the underlying storage is actually int64_t).
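For anyone who wants to see that conversion behavior outside of BPCells, here is a minimal Python sketch using h5py and numpy (both are assumptions of this example, not part of the PR). It stores a small int64 array under `X/indptr`, then reads it into a uint64 buffer and lets HDF5 do the conversion. The helper name `read_as_uint64`, the file name, and the sample values are hypothetical. Note that this only illustrates the non-negative values that `indptr` is expected to hold.

```python
import os
import tempfile

import h5py
import numpy as np


def read_as_uint64(path, name):
    """Read an integer dataset into a uint64 buffer (hypothetical helper)."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        out = np.empty(dset.shape, dtype=np.uint64)
        # read_direct fills the destination array; HDF5 converts from the
        # stored type (int64 here) to the destination type (uint64).
        dset.read_direct(out)
    return out


# Write a small int64 dataset; intermediate group "X" is created automatically.
path = os.path.join(tempfile.mkdtemp(), "indptr_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("X/indptr", data=np.array([0, 3, 6, 9, 12], dtype=np.int64))

values = read_as_uint64(path, "X/indptr")
print(values.dtype, values.tolist())
```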

I don't think there's an easy automated test we can add, but for posterity, here is a quick way to test that this works (as previously described in #76 and in a comment on #49). R code:

library(BPCells)
library(Matrix)

m <- matrix(1:12, nrow=3) |> as("dgCMatrix") |> as("IterableMatrix")
write_matrix_anndata_hdf5(m, "test_matrix.h5ad")

Python code:

import anndata as ad
adata = ad.read_h5ad("test_matrix.h5ad")
adata.X[:2,:2]

In the old version, this raises `ValueError: unsupported data types in input`; in the new version, it returns the subsetted sparse matrix object.
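A more direct check is to look at the stored dtype of `X/indptr` in the written file. The sketch below assumes h5py is installed and that the matrix is stored as a group `X` containing an `indptr` dataset, as the PR title suggests. The function name `indptr_dtype` is hypothetical.

```python
import h5py


def indptr_dtype(path, group="X"):
    """Return the on-disk dtype of <group>/indptr in an HDF5 file."""
    with h5py.File(path, "r") as f:
        return f[group]["indptr"].dtype


# Example call against the file written by the R snippet above:
#   indptr_dtype("test_matrix.h5ad")
```

With this PR applied, I would expect the call to report int64, but I have only reasoned about that from the title rather than run it against every version.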