Bioconductor / HDF5Array

HDF5 backend for DelayedArray objects
https://bioconductor.org/packages/HDF5Array

[Windows-only] Error realizing matrix with dimnames to HDF5Matrix #29

Closed PeteHaitch closed 4 years ago

PeteHaitch commented 4 years ago

I know you love these Windows-specific issues, Hervé ;)

This was (unintentionally) caught by a unit test in bsseq (v1.23.2) that was failing on tokay2. I'm sorry I only figured this out so close to the release (it took me a while to get access to a Windows VM to reproduce the Windows-specific test failure of bsseq).

Linux

suppressPackageStartupMessages(library(HDF5Array))

M <- matrix(1:9, 3, 3)
realize(M, BACKEND = "HDF5Array")
#> <3 x 3> matrix of class HDF5Matrix and type "integer":
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

colnames(M) <- c("A1", "A2", "A3")
realize(M, BACKEND = "HDF5Array")
#> <3 x 3> matrix of class HDF5Matrix and type "integer":
#>      A1 A2 A3
#> [1,]  1  4  7
#> [2,]  2  5  8
#> [3,]  3  6  9

Created on 2020-04-24 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2020-04-16 r78239)
#>  os       Debian GNU/Linux 10 (buster)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2020-04-24
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date       lib source
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports      1.1.6   2020-04-05 [1] CRAN (R 4.1.0)
#>  BiocGenerics * 0.33.3  2020-03-23 [1] Bioconductor
#>  callr          3.4.3   2020-03-28 [1] CRAN (R 4.1.0)
#>  cli            2.0.2   2020-02-28 [1] CRAN (R 4.1.0)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 4.1.0)
#>  DelayedArray * 0.13.12 2020-04-10 [1] Bioconductor
#>  desc           1.2.0   2018-05-01 [1] CRAN (R 4.1.0)
#>  devtools       2.3.0   2020-04-10 [1] CRAN (R 4.1.0)
#>  digest         0.6.25  2020-02-23 [1] CRAN (R 4.1.0)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 4.1.0)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi          0.4.1   2020-01-08 [1] CRAN (R 4.1.0)
#>  fs             1.4.1   2020-04-04 [1] CRAN (R 4.1.0)
#>  glue           1.4.0   2020-04-03 [1] CRAN (R 4.1.0)
#>  HDF5Array    * 1.15.18 2020-04-10 [1] Bioconductor
#>  highr          0.8     2019-03-20 [1] CRAN (R 4.1.0)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 4.1.0)
#>  IRanges      * 2.21.8  2020-03-25 [1] Bioconductor
#>  knitr          1.28    2020-02-06 [1] CRAN (R 4.1.0)
#>  lattice        0.20-41 2020-04-02 [2] CRAN (R 4.1.0)
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 4.1.0)
#>  Matrix         1.2-18  2019-11-27 [2] CRAN (R 4.1.0)
#>  matrixStats  * 0.56.0  2020-03-13 [1] CRAN (R 4.1.0)
#>  memoise        1.1.0   2017-04-21 [1] CRAN (R 4.1.0)
#>  pkgbuild       1.0.6   2019-10-09 [1] CRAN (R 4.1.0)
#>  pkgload        1.0.2   2018-10-29 [1] CRAN (R 4.1.0)
#>  prettyunits    1.1.1   2020-01-24 [1] CRAN (R 4.1.0)
#>  processx       3.4.2   2020-02-09 [1] CRAN (R 4.1.0)
#>  ps             1.3.2   2020-02-13 [1] CRAN (R 4.1.0)
#>  R6             2.4.1   2019-11-12 [1] CRAN (R 4.1.0)
#>  Rcpp           1.0.4.6 2020-04-09 [1] CRAN (R 4.1.0)
#>  remotes        2.1.1   2020-02-15 [1] CRAN (R 4.1.0)
#>  rhdf5        * 2.31.10 2020-04-02 [1] Bioconductor
#>  Rhdf5lib       1.9.3   2020-04-15 [1] Bioconductor
#>  rlang          0.4.5   2020-03-01 [1] CRAN (R 4.1.0)
#>  rmarkdown      2.1     2020-01-20 [1] CRAN (R 4.1.0)
#>  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 4.1.0)
#>  S4Vectors    * 0.25.15 2020-04-04 [1] Bioconductor
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi        1.4.6   2020-02-17 [1] CRAN (R 4.1.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  testthat       2.3.2   2020-03-02 [1] CRAN (R 4.1.0)
#>  usethis        1.6.0   2020-04-09 [1] CRAN (R 4.1.0)
#>  withr          2.2.0   2020-04-20 [1] CRAN (R 4.1.0)
#>  xfun           0.13    2020-04-13 [1] CRAN (R 4.1.0)
#>  yaml           2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
```

macOS

suppressPackageStartupMessages(library(HDF5Array))

M <- matrix(1:9, 3, 3)
realize(M, BACKEND = "HDF5Array")
#> <3 x 3> matrix of class HDF5Matrix and type "integer":
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

colnames(M) <- c("A1", "A2", "A3")
realize(M, BACKEND = "HDF5Array")
#> <3 x 3> matrix of class HDF5Matrix and type "integer":
#>      A1 A2 A3
#> [1,]  1  4  7
#> [2,]  2  5  8
#> [3,]  3  6  9

Created on 2020-04-24 by the reprex package (v0.3.0)

Session info

``` r
sessionInfo()
#> R version 4.0.0 RC (2020-04-21 r78267)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.4
#>
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#>
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
#> [8] methods   base
#>
#> other attached packages:
#> [1] HDF5Array_1.15.18    rhdf5_2.31.10        DelayedArray_0.13.12
#> [4] IRanges_2.21.8       S4Vectors_0.25.15    BiocGenerics_0.33.3
#> [7] matrixStats_0.56.0
#>
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6    lattice_0.20-41 digest_0.6.25   grid_4.0.0
#>  [5] magrittr_1.5    evaluate_0.14   highr_0.8       rlang_0.4.5
#>  [9] stringi_1.4.6   Matrix_1.2-18   rmarkdown_2.1   Rhdf5lib_1.9.3
#> [13] tools_4.0.0     stringr_1.4.0   xfun_0.13       yaml_2.2.1
#> [17] compiler_4.0.0  htmltools_0.4.0 knitr_1.28
```

Windows

suppressPackageStartupMessages(library(HDF5Array))

M <- matrix(1:9, 3, 3)
realize(M, BACKEND = "HDF5Array")
#> <3 x 3> matrix of class HDF5Matrix and type "integer":
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9

colnames(M) <- c("A1", "A2", "A3")
realize(M, BACKEND = "HDF5Array")
#> Error in .check_h5dimnames(filepath, name, h5dimnames): HDF5 dataset './.HDF5ArrayAUTO00002_dimnames/2' does not exist

Created on 2020-04-24 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.0 RC (2020-04-22 r78280)
#>  os       Windows 7 x64 SP 1
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Australia.1252
#>  ctype    English_Australia.1252
#>  tz       Australia/Sydney
#>  date     2020-04-24
#>
#> - Packages -------------------------------------------------------------------
#>  package      * version date       lib source
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports      1.1.6   2020-04-05 [1] CRAN (R 4.0.0)
#>  BiocGenerics * 0.33.3  2020-03-23 [1] Bioconductor
#>  callr          3.4.3   2020-03-28 [1] CRAN (R 4.0.0)
#>  cli            2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  DelayedArray * 0.13.12 2020-04-16 [1] Bioconductor
#>  desc           1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools       2.3.0   2020-04-10 [1] CRAN (R 4.0.0)
#>  digest         0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 4.0.0)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi          0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs             1.4.1   2020-04-04 [1] CRAN (R 4.0.0)
#>  glue           1.4.0   2020-04-03 [1] CRAN (R 4.0.0)
#>  HDF5Array    * 1.15.18 2020-04-16 [1] Bioconductor
#>  highr          0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 4.0.0)
#>  IRanges      * 2.21.8  2020-04-16 [1] Bioconductor
#>  knitr          1.28    2020-02-06 [1] CRAN (R 4.0.0)
#>  lattice        0.20-41 2020-04-02 [2] CRAN (R 4.0.0)
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  Matrix         1.2-18  2019-11-27 [2] CRAN (R 4.0.0)
#>  matrixStats  * 0.56.0  2020-03-13 [1] CRAN (R 4.0.0)
#>  memoise        1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild       1.0.6   2019-10-09 [1] CRAN (R 4.0.0)
#>  pkgload        1.0.2   2018-10-29 [1] CRAN (R 4.0.0)
#>  prettyunits    1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx       3.4.2   2020-02-09 [1] CRAN (R 4.0.0)
#>  ps             1.3.2   2020-02-13 [1] CRAN (R 4.0.0)
#>  R6             2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp           1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#>  remotes        2.1.1   2020-02-15 [1] CRAN (R 4.0.0)
#>  rhdf5        * 2.31.10 2020-04-16 [1] Bioconductor
#>  Rhdf5lib       1.9.3   2020-04-16 [1] Bioconductor
#>  rlang          0.4.5   2020-03-01 [1] CRAN (R 4.0.0)
#>  rmarkdown      2.1     2020-01-20 [1] CRAN (R 4.0.0)
#>  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  S4Vectors    * 0.25.15 2020-04-16 [1] Bioconductor
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi        1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat       2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  usethis        1.6.0   2020-04-09 [1] CRAN (R 4.0.0)
#>  withr          2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun           0.13    2020-04-13 [1] CRAN (R 4.0.0)
#>  yaml           2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] C:/Users/hickey/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.0rc/library
```

hpages commented 4 years ago

I'm fond of them and this one is particularly tasty ;-)

I'm aware (I had some complaints on the bioc-devel mailing list about this already) and I still need to look at it. Just didn't find the time yet. Sooo much time spent on the build machines and doing package reviews... Thanks for the MRE, that is very helpful. At least now I know where to start. No more excuses!

PeteHaitch commented 4 years ago

I tried running writeHDF5Array(M, with.dimnames = TRUE) with debugonce() called as needed. I'm not sure how best to run the debugger and share the output, so I've just cut and pasted the relevant-looking bits from my digging around. I think it might be Windows doing Windows things with path normalization?

Compare the values of h5dn in the debugging sessions below. On Windows the value doesn't match the contents of the HDF5 file, so h5exists(filepath, h5dn) returns FALSE and triggers the error:

#> Error in .check_h5dimnames(filepath, name, h5dimnames): HDF5 dataset './.HDF5ArrayAUTO00002_dimnames/2' does not exist

macOS

# Within the call to h5writeDimnames()
debug: if (!is.na(group) && !h5exists(filepath, group)) h5createGroup(filepath, 
    group)
Browse[4]> h5dimnames
[1] NA                                 "//.HDF5ArrayAUTO00003_dimnames/2"

# Within the call to .check_h5dimnames()
Browse[2]> rhdf5::h5ls(filepath)
                          group                         name       otype  dclass   dim
0                             / .HDF5ArrayAUTO00004_dimnames   H5I_GROUP              
1 /.HDF5ArrayAUTO00004_dimnames                            2 H5I_DATASET  STRING     3
2                             /           HDF5ArrayAUTO00004 H5I_DATASET INTEGER 3 x 3

Browse[2]> h5dn
[1] "//.HDF5ArrayAUTO00004_dimnames/2"

Windows

# Within the call to h5writeDimnames()
debug: if (!is.na(group) && !h5exists(filepath, group)) h5createGroup(filepath, 
    group)
Browse[4]> h5dimnames
[1] NA                                 "./.HDF5ArrayAUTO00004_dimnames/2"

# Within the call to .check_h5dimnames()
Browse[2]> rhdf5::h5ls(filepath)
                          group                         name       otype  dclass   dim
0                             / .HDF5ArrayAUTO00006_dimnames   H5I_GROUP              
1 /.HDF5ArrayAUTO00006_dimnames                            2 H5I_DATASET  STRING     3
2                             /           HDF5ArrayAUTO00006 H5I_DATASET INTEGER 3 x 3

# This is different to macOS and doesn't match up with the HDF5 file contents!
Browse[2]> h5dn
[1] "./.HDF5ArrayAUTO00006_dimnames/2"
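
To make the mismatch concrete, here is a rough string-level sketch in base R. It is only an illustration: the data frame is a hand-copied stand-in for the Windows h5ls() listing above, is_listed() is a hypothetical helper, and none of this claims to be how h5exists() is actually implemented.

```r
# Stand-in for the rhdf5::h5ls() listing from the Windows session above
# (values copied by hand; no HDF5 file is needed to run this).
listing <- data.frame(
  group = c("/", "/.HDF5ArrayAUTO00006_dimnames", "/"),
  name  = c(".HDF5ArrayAUTO00006_dimnames", "2", "HDF5ArrayAUTO00006"),
  stringsAsFactors = FALSE
)

# Absolute path of every listed object; runs of "/" are collapsed so that
# group "/" plus name "x" gives "/x" rather than "//x".
listed_paths <- gsub("/+", "/", paste(listing$group, listing$name, sep = "/"))

# Hypothetical check (an assumption, not the HDF5Array code): does `path`
# name a listed object once repeated slashes are collapsed? A leading "./"
# is left untouched, so such a path never equals an absolute listed path.
is_listed <- function(path) gsub("/+", "/", path) %in% listed_paths

is_listed("//.HDF5ArrayAUTO00006_dimnames/2")  # the form seen on macOS
is_listed("./.HDF5ArrayAUTO00006_dimnames/2")  # the form seen on Windows
```

Under these assumptions only the "//" form lines up with an entry in the listing; the "./" form has no counterpart, which is consistent with the "does not exist" error.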
hpages commented 4 years ago

Thanks Pete for the additional details.

At the root of the problem is the strange behavior of dirname() on Windows when the path has repeated leading slashes:

> dirname("/foo")
[1] "/"
> dirname("//foo")
[1] "."
> dirname("///foo")
[1] "\\\\"

Looks broken to me but I'm pretty sure someone will argue that this is totally expected and sensible behavior so I won't waste my time reporting it.

Whereas on Linux or Mac it does:

> dirname("/foo")
[1] "/"
> dirname("//foo")
[1] "/"
> dirname("///foo")
[1] "/"

which is the behavior I was relying on.
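
One way to avoid depending on this platform difference is to normalize the path before calling dirname(). A minimal sketch, assuming it is acceptable to collapse every run of slashes in an HDF5 name; collapse_slashes() and h5_dirname() are names made up for illustration, and this is not necessarily what the actual fix does:

```r
# Hypothetical helpers, not part of HDF5Array.

# Collapse every run of "/" into a single "/", so "//foo" and "///foo"
# both become "/foo".
collapse_slashes <- function(path) gsub("/+", "/", path)

# dirname() then only ever sees a single leading slash, which is the case
# where Windows and Linux/Mac agree in the output shown above.
h5_dirname <- function(path) dirname(collapse_slashes(path))

h5_dirname("/foo")
h5_dirname("//foo")
h5_dirname("///foo")
```

With the slashes collapsed first, all three calls reduce to dirname("/foo"), which returns "/" on both Windows and Linux/Mac according to the output above.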

A fix is on its way...

hpages commented 4 years ago

Fixed in HDF5Array 1.15.19 (commit b48652da).

PeteHaitch commented 4 years ago

Great, thanks as always!