bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

Reading BPCells matrices with other HDF5 readers #138

Closed Artur-man closed 1 month ago

Artur-man commented 1 month ago

Hi,

Amazing package and C++ library! I was looking into the on-disk storage of BPCells matrices. As I understand it, the HighFive library is used here to read and write HDF5 files.

I wanted to see whether I could open the files in the matrix directory. However, I couldn't open them with other R packages for manipulating HDF5 files, such as rhdf5 and hdf5r. I am new to HDF5 and would appreciate any input.

library(hdf5r)
library(rhdf5)
library(BPCells)
datax_bpcells <- BPCells::write_matrix_dir(mat = mat, dir = "data/C02935B1_bin20_bpcells", overwrite = TRUE)
rhdf5::h5ls(file = "data/C02935B1_bin20_bpcells/val")
Error in H5Fopen(file, flags = flags, fapl = fapl, native = native) : 
  HDF5. File accessibility. Unable to open file.
input.file <- hdf5r::H5File$new(filename = "data/C02935B1_bin20_bpcells/val", mode = "r")
Error in H5File.open(filename, mode, file_create_pl, file_access_pl) : 
  HDF5-API Errors:
    error #000: ../../src/hdf5-1.12.2/src/H5F.c in H5Fopen(): line 620: unable to open file
        class: HDF5
        major: File accessibility
        minor: Unable to open file

    error #001: ../../src/hdf5-1.12.2/src/H5VLcallback.c in H5VL_file_open(): line 3522: open failed
        class: HDF5
        major: Virtual Object Layer
        minor: Can't open object

    error #002: ../../src/hdf5-1.12.2/src/H5VLcallback.c in H5VL__file_open(): line 3351: open failed
        class: HDF5
        major: Virtual Object Layer
        minor: Can't open object

    error #003: ../../src/hdf5-1.12.2/src/H5VLnative_file.c in H5VL__native_file_open(): line 97: unable to open file
        class: HDF5
        major: File accessibility
        minor: Unable to open file

    error #004: ../../src/hdf5-1.12.2/src/H5Fint.c in H5F_open(): line 1990: unable to read superblock
        class: HDF5
        major: File accessibility
        minor: Read failed

    error #005: 
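
The last complete frame in the trace above, "unable to read superblock", is the kind of failure HDF5 reports when a file does not begin with a valid HDF5 header. That would be consistent with the files written by write_matrix_dir not being HDF5 containers at all. A minimal sketch for checking this in base R, assuming a file with no user block (so the 8-byte HDF5 format signature sits at offset 0); the helper name looks_like_hdf5 is hypothetical and not part of BPCells, rhdf5, or hdf5r:

```r
# HDF5 format signature: \x89 H D F \r \n \x1a \n
# (found at offset 0 when the file has no user block).
hdf5_signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

# Hypothetical helper: TRUE if `path` starts with the HDF5 signature.
looks_like_hdf5 <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  # Read at most 8 raw bytes; shorter files simply fail the comparison.
  header <- readBin(con, what = "raw", n = 8L)
  identical(header, hdf5_signature)
}
```

Calling looks_like_hdf5("data/C02935B1_bin20_bpcells/val") on the file from the failing call would show whether it carries the signature; a FALSE result would mean generic HDF5 readers cannot open it, whatever the file contains.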
Artur-man commented 1 month ago

Ah, apologies for the confusion. I just realized that the function write_matrix_hdf5 exists.

rhdf5::h5createFile("data/C02935B1_bin20_bpcells.h5")
rhdf5::h5createGroup("data/C02935B1_bin20_bpcells.h5", group = "assay")
datax_bpcells <- BPCells::write_matrix_hdf5(mat = mat,
                                            path = "data/C02935B1_bin20_bpcells.h5",
                                            group = "assay",
                                            overwrite = TRUE)
bnprks commented 1 month ago

Hi @Artur-man, if you want to read BPCells data from other programs, I'd also suggest passing compress=FALSE to write_matrix_hdf5, since otherwise the data will be written in a bitpacked format that most other libraries probably can't handle. Alternatively, you can use write_matrix_10x_hdf5 or write_matrix_anndata_hdf5 to write in the 10x and AnnData HDF5 formats, respectively.
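
As a rough sketch of the compress=FALSE suggestion, the snippet below writes a small matrix without bitpacking and then lists the resulting file with rhdf5. The random test matrix and the temporary output path are assumptions made for illustration; the real data and path from the question would take their place.

```r
library(Matrix)
library(BPCells)
library(rhdf5)

# Small random sparse matrix standing in for the real data (illustrative only).
set.seed(1)
mat <- Matrix::rsparsematrix(20, 10, density = 0.2)

# Hypothetical output location; substitute your own .h5 path.
path <- tempfile(fileext = ".h5")

# compress = FALSE skips the bitpacked encoding, so the arrays are stored
# as ordinary HDF5 datasets.
bp <- BPCells::write_matrix_hdf5(mat, path = path, group = "assay", compress = FALSE)

# List the groups and datasets that a generic HDF5 reader can see.
contents <- rhdf5::h5ls(path)
print(contents)
```

The listing shows only the raw arrays BPCells stores under the group. Interpreting them as a sparse matrix from another tool still requires knowing the BPCells layout, which is where the 10x and AnnData writers may be the more convenient route.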