Bioconductor / HDF5Array

HDF5 backend for DelayedArray objects
https://bioconductor.org/packages/HDF5Array

Support for contiguous layout HDF5 #17

Closed soulj closed 4 years ago

soulj commented 4 years ago

I was hoping to use HDF5 files that are stored in contiguous layout with HDF5Array, but this doesn't seem to be supported. It would be great to be able to use HDF5Array without having to rewrite the HDF5 files in a chunked layout.

Toy example with error:

suppressMessages(library(HDF5Array))

D <- array(1L:30L,dim=c(6,5))

h5createFile("newfile.h5")
#> [1] TRUE
h5createDataset(file="newfile.h5", dataset="test", dims=c(6,5),chunk=NULL,level = 0)
#> [1] TRUE
h5write(D,file="newfile.h5",name="test")
HDF5Array(filepath ="newfile.h5",name = "test")
#> Error in h5mread(filepath, name, starts = index): H5Pget_chunk() returned an unexpected value

# The same dataset written with a chunked layout works
h5createFile("newfile2.h5")
#> [1] TRUE
h5createDataset(file="newfile2.h5", dataset="test", dims=c(6,5),chunk=c(6,5),level = 0)
#> [1] TRUE
h5write(D,file="newfile2.h5",name="test")
HDF5Array(filepath ="newfile2.h5",name = "test")
#> <6 x 5> HDF5Matrix object of type "double":
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    7   13   19   25
#> [2,]    2    8   14   20   26
#> [3,]    3    9   15   21   27
#> [4,]    4   10   16   22   28
#> [5,]    5   11   17   23   29
#> [6,]    6   12   18   24   30

sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.10
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.3.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#> [1] HDF5Array_1.12.1    rhdf5_2.28.0        DelayedArray_0.10.0
#> [4] BiocParallel_1.18.0 IRanges_2.18.1      S4Vectors_0.22.0   
#> [7] BiocGenerics_0.30.0 matrixStats_0.54.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2      lattice_0.20-35 digest_0.6.20   grid_3.6.1     
#>  [5] magrittr_1.5    evaluate_0.14   highr_0.8       stringi_1.4.3  
#>  [9] Matrix_1.2-14   rmarkdown_1.14  Rhdf5lib_1.6.0  tools_3.6.1    
#> [13] stringr_1.4.0   xfun_0.8        yaml_2.2.0      compiler_3.6.1 
#> [17] htmltools_0.3.6 knitr_1.23

Created on 2019-08-05 by the reprex package (v0.3.0)
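
For HDF5Array versions that fail on contiguous datasets, the rewrite mentioned in the report can be sketched roughly as follows. This is only a minimal sketch, not something taken from HDF5Array itself: the helper name `rewrite_chunked` and the output file name are hypothetical, a single chunk covering the whole dataset is assumed (as in the working example above), and the dataset is read fully into memory, so it only suits data that fits in RAM.

```r
# Hypothetical helper (not part of HDF5Array or rhdf5): copy one dataset
# from `src` into a new file `dst` using a chunked layout.
rewrite_chunked <- function(src, dst, name) {
  # Read the whole dataset into memory (assumes it fits in RAM).
  x <- rhdf5::h5read(src, name)
  rhdf5::h5createFile(dst)
  # Assumption: one chunk spanning the full dataset, no compression.
  rhdf5::h5createDataset(file = dst, dataset = name, dims = dim(x),
                         storage.mode = storage.mode(x),
                         chunk = dim(x), level = 0)
  rhdf5::h5write(x, file = dst, name = name)
  invisible(dst)
}
```

With the files from the example above, `rewrite_chunked("newfile.h5", "newfile_chunked.h5", "test")` would produce a file that `HDF5Array(filepath = "newfile_chunked.h5", name = "test")` should be able to open.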

hpages commented 4 years ago

This should be addressed in HDF5Array 1.13.5 (commit 4388a802201e5f727ff8ec3cda134d308e2a9ccb).

The change was also backported to HDF5Array 1.12.2 (the release version).

These new versions of HDF5Array should both become available via BiocManager::install() in the next couple of days.

Cheers, H.
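
To check whether an installed HDF5Array already includes this change, the two version numbers given above can be compared against the installed version. The helper below is a hypothetical sketch using only base R; it assumes that every 1.12.x release from 1.12.2 onward, and every version from 1.13.5 onward, carries the fix.

```r
# Hypothetical helper (not part of HDF5Array): TRUE if version `v` is
# expected to contain the contiguous-layout fix.
has_contiguous_fix <- function(v) {
  v <- as.package_version(v)
  # Release branch: fix backported to 1.12.2.
  in_release <- v >= "1.12.2" && v < "1.13.0"
  # Devel branch: fix introduced in 1.13.5 (assumed to persist afterwards).
  in_devel <- v >= "1.13.5"
  in_release || in_devel
}

# Usage: only query the installed version if the package is present.
if (requireNamespace("HDF5Array", quietly = TRUE)) {
  print(has_contiguous_fix(utils::packageVersion("HDF5Array")))
}
```

If this reports `FALSE`, updating with `BiocManager::install("HDF5Array")` should pull in a fixed version once it is available for the Bioconductor release in use.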

soulj commented 4 years ago

Thanks, works fine now with 1.12.2.

hpages commented 4 years ago

Thanks for letting me know. Closing this now.