Bioconductor / HDF5Array

HDF5 backend for DelayedArray objects
https://bioconductor.org/packages/HDF5Array
9 stars 13 forks source link

Add support for h5seurat object. #57

Closed yduanBioinfo closed 10 months ago

yduanBioinfo commented 11 months ago

Add a new class H5SeuratMatrix to support for h5seurat HDF5 format sparse matrix (). Because the shape/dims of h5seurat matrix is written on the attributes of data group rather than a "shape" DATASETS in data group, the matrix can not be load through "TENxMatrix" or "H5SparseMatrix".

hpages commented 11 months ago

Hi @yduanBioinfo,

The PR has many problems, but, more importantly, I'm not sure it's a good idea to introduce a new class combo (H5SeuratMatrixSeed/H5SeuratMatrix) that repeats 95% of the code of the existing TENxMatrixSeed/TENxMatrix combo. I would be more inclined to make TENxMatrixSeed() work on a h5seurat matrix by tweaking the .load_tenx_rownames() and .load_tenx_barcodes() internal helpers so that they grab the rownames and barcodes from the alternate place if they fail to find them in the standard place.

Then it would just be a matter of updating the man pages and DESCRIPTION file to clarify that TENxMatrixSeed/TENxMatrix also support the h5seurat format.

Also it would be great if you could add a small h5seurat matrix to HDF5Array/inst/extdata/ and add short examples in TENxMatrixSeed-class.Rd and TENxMatrix-class.Rd that use it.

Would you like to try that instead?

If you decide to go for it, please make sure that your new PR doesn't introduce random edits to the DESCRIPTION file like this PR does.

Thanks, H.

yduanBioinfo commented 10 months ago

Hi @hpages,

I'm agree with you that only .load_tenx_rownames() and .load_tenx_barcodes() should be edit for h5seurat support. I'll try this way latter.

Thanks, You Duan