Bioconductor / HDF5Array

HDF5 backend for DelayedArray objects
https://bioconductor.org/packages/HDF5Array

TENxMatrix does not return feature names #49

Closed: Liripo closed this issue 2 years ago

Liripo commented 2 years ago

Hello developers, my file has the following structure:

> h5ls("./filtered_feature_bc_matrix.h5")
              group          name       otype  dclass      dim
0                 /        matrix   H5I_GROUP                 
1           /matrix      barcodes H5I_DATASET  STRING    11952
2           /matrix          data H5I_DATASET INTEGER 27114202
3           /matrix      features   H5I_GROUP                 
4  /matrix/features _all_tag_keys H5I_DATASET  STRING        1
5  /matrix/features  feature_type H5I_DATASET  STRING    36601
6  /matrix/features        genome H5I_DATASET  STRING    36601
7  /matrix/features            id H5I_DATASET  STRING    36601
8  /matrix/features          name H5I_DATASET  STRING    36601
9           /matrix       indices H5I_DATASET INTEGER 27114202
10          /matrix        indptr H5I_DATASET INTEGER    11953
11          /matrix         shape H5I_DATASET INTEGER        2

When I read it with TENxMatrix, the feature names are NULL (the rows have no names):

> test <- TENxMatrix("./filtered_feature_bc_matrix.h5",group = "matrix")
> test
<36601 x 11952> sparse matrix of class TENxMatrix and type "integer":
         AAACCCAAGGCCCAAA-1 ... TTTGTTGTCTCATTAC-1
    [1,]                  0   .                  0
    [2,]                  0   .                  0
    [3,]                  0   .                  0
    [4,]                  0   .                  0
    [5,]                  0   .                  0
     ...                  .   .                  .
[36597,]                  0   .                  0
[36598,]                  0   .                  0
[36599,]                  0   .                  0
[36600,]                  0   .                  0
[36601,]                  0   .                  0

LiNk-NY commented 2 years ago

Hi @Liripo

The TENxMatrix function does not add feature names by default. You can add them manually by reading the feature data from the HDF5 file with:

rhdf5::h5read("./filtered_feature_bc_matrix.h5", "matrix/features/id")
## or
rhdf5::h5read("./filtered_feature_bc_matrix.h5", "matrix/features/name")
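As a minimal sketch of the manual approach, the values read this way can be assigned as row names. The helper name `set_feature_rownames` is hypothetical (it is not part of HDF5Array or rhdf5), and the use of `make.unique()` is an assumption to guard against repeated feature names. The toy matrix stands in for the TENxMatrix object so that the snippet runs without the file.

```r
# Hypothetical helper (not part of HDF5Array): attach feature IDs or names
# as row names to a matrix-like object.
set_feature_rownames <- function(mat, features) {
    features <- as.character(features)  # h5read() returns a 1D array; flatten it
    stopifnot(length(features) == nrow(mat))
    # Assumption: make names unique in case the same gene symbol appears twice
    rownames(mat) <- make.unique(features)
    mat
}

# Toy stand-in for the TENxMatrix object (3 features x 2 barcodes)
toy <- matrix(0L, nrow = 3, ncol = 2,
              dimnames = list(NULL, c("AAACCCAAGGCCCAAA-1", "TTTGTTGTCTCATTAC-1")))
toy <- set_feature_rownames(toy, c("GeneA", "GeneB", "GeneA"))
rownames(toy)

# With the real file (requires the HDF5Array and rhdf5 packages; not run here):
# ids  <- rhdf5::h5read("./filtered_feature_bc_matrix.h5", "matrix/features/id")
# test <- HDF5Array::TENxMatrix("./filtered_feature_bc_matrix.h5", group = "matrix")
# test <- set_feature_rownames(test, ids)
```

Using "matrix/features/id" usually avoids the duplicate problem, since IDs are expected to be unique, whereas "matrix/features/name" may not be.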

I have an experimental package at https://github.com/LiNk-NY/TENxIO that you could try. Note that the package uses common Bioconductor classes, e.g., SingleCellExperiment, to represent 10X data.

Best,
Marcel

Liripo commented 2 years ago

@LiNk-NY: Thanks for the advice. I will try your experimental package.

Best,
Liripo

hpages commented 2 years ago

@Liripo @LiNk-NY Should be fixed in HDF5Array 1.24.2 (release) and 1.25.2 (devel).

Each new version of HDF5Array should become available via BiocManager::install() in the next 24-48 hours for BioC 3.15 and BioC 3.16 users, respectively.
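To check whether a given HDF5Array version should contain the fix, something like the following sketch could be used. The helper `has_feature_name_fix` is hypothetical, and it assumes that the devel versions 1.25.0 and 1.25.1 predate the fix, which follows from the version numbers above but is not stated explicitly.

```r
# Hypothetical helper: TRUE if an HDF5Array version should include the fix,
# based on the versions quoted above (1.24.2 release, 1.25.2 devel).
has_feature_name_fix <- function(v) {
    v <- package_version(v)
    # Assumption: 1.25.0 and 1.25.1 are devel versions without the fix
    in_old_devel <- v >= "1.25.0" && v < "1.25.2"
    v >= "1.24.2" && !in_old_devel
}

checks <- vapply(c("1.24.1", "1.24.2", "1.25.1", "1.25.2"),
                 has_feature_name_fix, logical(1))
checks

# For the installed package (requires HDF5Array; not run here):
# has_feature_name_fix(utils::packageVersion("HDF5Array"))
# To update: BiocManager::install("HDF5Array")
```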