Closed PeteHaitch closed 4 years ago
Works as expected if using `NA_real_` or `NA_integer_`, but `NA_character_` throws a different error:
suppressPackageStartupMessages(library(HDF5Array))
nrow <- 3
ncol <- 4
x_all_NA <- matrix(
data = NA_character_,
nrow = nrow,
ncol = ncol,
dimnames = list(paste0("R", seq_len(nrow)), paste0("C", seq_len(ncol))))
x_all_NA
#> C1 C2 C3 C4
#> R1 NA NA NA NA
#> R2 NA NA NA NA
#> R3 NA NA NA NA
DelayedArray(x_all_NA)
#> <3 x 4> matrix of class DelayedMatrix and type "character":
#> C1 C2 C3 C4
#> R1 "NA" "NA" "NA" "NA"
#> R2 "NA" "NA" "NA" "NA"
#> R3 "NA" "NA" "NA" "NA"
writeHDF5Array(x_all_NA)
#> Warning in max(nchar(x), na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning in H5Tset_size(tid, size): NAs introduced by coercion to integer range
#> Error in if (chunk_size > 2^32 - 1) {: missing value where TRUE/FALSE needed
Created on 2020-03-27 by the reprex package (v0.3.0)
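The two warnings and the final error above can be reproduced with base R alone. The following is a minimal sketch of that chain, not HDF5Array's actual code. It assumes the string size is derived from `max(nchar(x), na.rm = TRUE)`, as the first warning suggests, and the variable names `size` and `chunk_size` are chosen only to mirror the messages.

```r
# All-NA character matrix, as in the reprex above (dimnames omitted).
x <- matrix(NA_character_, nrow = 3, ncol = 4)

# nchar() returns NA_integer_ for NA_character_, so with na.rm = TRUE
# nothing is left and max() falls back to -Inf (with a warning).
size <- suppressWarnings(max(nchar(x), na.rm = TRUE))

# Coercing -Inf to integer yields NA (with a second warning).
chunk_size <- suppressWarnings(as.integer(size))

# Using NA as an if() condition raises the error seen in the report.
msg <- tryCatch(
  if (chunk_size > 2^32 - 1) "too big" else "ok",
  error = function(e) conditionMessage(e)
)
print(msg)
```

Any code that sizes a buffer from `nchar()` would need to handle the all-`NA` case explicitly before reaching the comparison.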
The logical example works as expected in release, but the character one is broken:
suppressPackageStartupMessages(library(HDF5Array))
nrow <- 3
ncol <- 4
x_all_NA <- matrix(
data = NA,
nrow = nrow,
ncol = ncol,
dimnames = list(paste0("R", seq_len(nrow)), paste0("C", seq_len(ncol))))
x_all_NA
#> C1 C2 C3 C4
#> R1 NA NA NA NA
#> R2 NA NA NA NA
#> R3 NA NA NA NA
DelayedArray(x_all_NA)
#> <3 x 4> matrix of class DelayedMatrix and type "logical":
#> C1 C2 C3 C4
#> R1 NA NA NA NA
#> R2 NA NA NA NA
#> R3 NA NA NA NA
writeHDF5Array(x_all_NA)
#> <3 x 4> matrix of class HDF5Matrix and type "logical":
#> [,1] [,2] [,3] [,4]
#> [1,] NA NA NA NA
#> [2,] NA NA NA NA
#> [3,] NA NA NA NA
x_all_NA <- matrix(
data = NA_character_,
nrow = nrow,
ncol = ncol,
dimnames = list(paste0("R", seq_len(nrow)), paste0("C", seq_len(ncol))))
x_all_NA
#> C1 C2 C3 C4
#> R1 NA NA NA NA
#> R2 NA NA NA NA
#> R3 NA NA NA NA
DelayedArray(x_all_NA)
#> <3 x 4> matrix of class DelayedMatrix and type "character":
#> C1 C2 C3 C4
#> R1 "NA" "NA" "NA" "NA"
#> R2 "NA" "NA" "NA" "NA"
#> R3 "NA" "NA" "NA" "NA"
writeHDF5Array(x_all_NA)
#> Error in if (chunk_size > 2^32 - 1) {: missing value where TRUE/FALSE needed
Created on 2020-03-27 by the reprex package (v0.3.0)
Hi Pete,
Thanks for the detailed report.
The 1st issue (logical `NA` converted to `FALSE`) is a regression in the rhdf5 package:
In release (rhdf5 2.30.1):
library(rhdf5)
m <- matrix(c(FALSE, TRUE, NA, FALSE, TRUE, NA), nrow=2)
h5write(m, "test.h5", "M1")
h5read("test.h5", "M1")
# [,1] [,2] [,3]
# [1,] FALSE NA TRUE
# [2,] TRUE FALSE NA
# Warning message:
# In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
# integer value -2^63 replaced NA. See the section 'Large integer data types' in the 'rhdf5' vignette for more details.
In devel (rhdf5 2.31.6):
library(rhdf5)
m <- matrix(c(FALSE, TRUE, NA, FALSE, TRUE, NA), nrow=2)
h5write(m, "test.h5", "M1")
h5read("test.h5", "M1")
# [,1] [,2] [,3]
# [1,] FALSE FALSE TRUE
# [2,] TRUE FALSE FALSE
I think I know what's going on. I'll open an rhdf5 issue for this.
I'll take a look at the 2nd issue (character `NA` causing error) and will let you know what I find.
H.
Thanks, @hpages!
OK, reporting the logical `NA` issue is taken care of. Too late for me to continue. I'll look at the character `NA` issue tomorrow... I mean later today ;-)
Kind of got busy with other stuff, sorry. So the issues with character NAs should be addressed in HDF5Array 1.15.18 and DelayedArray 0.13.12. Didn't hear back from Mike yet about the issue with logical NAs.
@PeteHaitch Mike fixed the rhdf5 issue with logical NAs in rhdf5 2.33.3. I also needed to apply a fix to `HDF5Array::h5mread()` (see 41d1ddeb730ade5c815ef4e9ca978ef591fe7662).
With these fixes:
library(HDF5Array)
m <- matrix(c(FALSE, TRUE, NA, FALSE, TRUE, NA), nrow=2)
M <- writeHDF5Array(m)
M
# <2 x 3> matrix of class HDF5Matrix and type "logical":
# [,1] [,2] [,3]
# [1,] FALSE NA TRUE
# [2,] TRUE FALSE NA
One important improvement Mike made to rhdf5 since BioC 3.10 is that logical data is now stored as 8-bit instead of 32-bit integers (see the `dtype` field):
h5ls(path(M), all=TRUE)
# group name ltype corder_valid corder cset otype
# 0 / HDF5ArrayAUTO00001 H5L_TYPE_HARD FALSE 0 0 H5I_DATASET
# num_attrs dclass dtype stype rank dim maxdim
# 0 1 INTEGER H5T_STD_I8LE SIMPLE 2 2 x 3 2 x 3
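Storing logicals in 8 bits means `NA` needs its own representation on disk, because R keeps a logical `NA` internally as the same value as `NA_integer_`, which does not fit in one byte. The sketch below illustrates that general hazard only. The helpers `encode_lgl8`, `decode_lgl8` and `naive_decode` and the sentinel value `-128` are assumptions made up for illustration. They are not rhdf5's actual encoding, and the thread does not state the real cause of the regression.

```r
# Assumed sentinel for NA in an 8-bit signed range (illustrative only).
NA_SENTINEL <- -128L

# Hypothetical encoder: FALSE -> 0, TRUE -> 1, NA -> sentinel.
encode_lgl8 <- function(x) {
  stopifnot(is.logical(x))
  out <- as.integer(x)
  out[is.na(out)] <- NA_SENTINEL
  out
}

# Hypothetical decoder that maps the sentinel back to NA.
decode_lgl8 <- function(v) {
  out <- v != 0L
  out[v == NA_SENTINEL] <- NA
  out
}

# A decoder that ignores the sentinel silently turns NA into FALSE.
naive_decode <- function(v) v == 1L

x <- c(FALSE, TRUE, NA, FALSE, TRUE, NA)
v <- encode_lgl8(x)
print(decode_lgl8(v))
print(naive_decode(v))
```

The second `print()` shows the kind of silent `NA` to `FALSE` conversion reported in this issue, which is why the reader and the writer have to agree on how `NA` is marked.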
Thanks! I'll re-check my use case in the next few days and report back.
Feel free to reopen if I overlooked something.
Thanks, @hpages. It all looks good on my end
@hpages I spoke too soon or perhaps misunderstood... The 1st issue of 'Logical NA is converted to FALSE' is fixed in devel but not in the release branch.
E.g., on the release branch
suppressPackageStartupMessages(library(HDF5Array))
nrow <- 3
ncol <- 4
x_all_NA <- matrix(
data = NA,
nrow = nrow,
ncol = ncol,
dimnames = list(paste0("R", seq_len(nrow)), paste0("C", seq_len(ncol))))
x_all_NA
#> C1 C2 C3 C4
#> R1 NA NA NA NA
#> R2 NA NA NA NA
#> R3 NA NA NA NA
writeHDF5Array(x_all_NA)
#> <3 x 4> matrix of class HDF5Matrix and type "logical":
#> [,1] [,2] [,3] [,4]
#> [1,] FALSE FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE FALSE
Created on 2020-06-30 by the reprex package (v0.3.0)
I caught this after re-enabling the failing tests on both the devel (https://github.com/PeteHaitch/DelayedMatrixStats/commit/5e3277b2d208015aa5ccabc72bd3db637a10e7dc) and release (https://github.com/PeteHaitch/DelayedMatrixStats/commit/3e1f667020a4b274f7862c2d8284d239dda35448) branches of DelayedMatrixStats. As of the build reports dated 2020-06-29, DelayedMatrixStats with the re-enabled tests is okay in devel (https://bioconductor.org/checkResults/3.12/bioc-LATEST/DelayedMatrixStats/) but errors in release (https://bioconductor.org/checkResults/3.11/bioc-LATEST/DelayedMatrixStats/).
The 2nd issue 'character NA causing error' seems fixed in both release and devel.
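A check along these lines could guard against both problems in a test suite. It is only a sketch: `roundtrip_keeps_na` is a hypothetical helper, not a function from HDF5Array or DelayedMatrixStats, and the block does nothing if HDF5Array is not installed.

```r
# Hypothetical helper: write a matrix to HDF5 and check that reading it
# back gives the same values, including NAs. Dimnames are dropped before
# comparing because they may not be written, depending on version/defaults.
roundtrip_keeps_na <- function(x) {
  M <- HDF5Array::writeHDF5Array(x)
  identical(unname(as.matrix(M)), unname(x))
}

# Only run the check when HDF5Array is available.
if (requireNamespace("HDF5Array", quietly = TRUE)) {
  x_lgl <- matrix(NA, nrow = 3, ncol = 4)
  x_chr <- matrix(NA_character_, nrow = 3, ncol = 4)
  print(roundtrip_keeps_na(x_lgl))
  print(roundtrip_keeps_na(x_chr))
}
```

Running the same check on both the release and devel branches would surface a branch-specific regression like the one described above.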
@PeteHaitch I just asked Mike if he is willing to port the "NA handling" stuff to the RELEASE_3_11 branch of rhdf5.
Thanks @hpages and @grimbough for fixing in release!
suppressPackageStartupMessages(library(HDF5Array))
nrow <- 3
ncol <- 4
x_all_NA <- matrix(
data = NA,
nrow = nrow,
ncol = ncol,
dimnames = list(paste0("R", seq_len(nrow)), paste0("C", seq_len(ncol))))
x_all_NA
#> C1 C2 C3 C4
#> R1 NA NA NA NA
#> R2 NA NA NA NA
#> R3 NA NA NA NA
writeHDF5Array(x_all_NA)
#> <3 x 4> matrix of class HDF5Matrix and type "logical":
#> [,1] [,2] [,3] [,4]
#> [1,] NA NA NA NA
#> [2,] NA NA NA NA
#> [3,] NA NA NA NA
Created on 2020-07-05 by the reprex package (v0.3.0)
Discovered this when attempting to fix failing tests for DelayedMatrixStats.
Created on 2020-03-27 by the reprex package (v0.3.0)
Session info
``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2020-01-28 r77738)
#>  os       Debian GNU/Linux 10 (buster)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2020-03-27
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date       lib source
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports      1.1.5   2019-10-02 [1] CRAN (R 4.0.0)
#>  BiocGenerics * 0.33.3  2020-03-23 [1] Bioconductor
#>  BiocParallel * 1.21.2  2019-12-21 [1] Bioconductor
#>  callr          3.4.2   2020-02-12 [1] CRAN (R 4.0.0)
#>  cli            2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  DelayedArray * 0.13.7  2020-03-13 [1] Bioconductor
#>  desc           1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools       2.2.2   2020-02-17 [1] CRAN (R 4.0.0)
#>  digest         0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 4.0.0)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi          0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs             1.3.2   2020-03-05 [1] CRAN (R 4.0.0)
#>  glue           1.3.2   2020-03-12 [1] CRAN (R 4.0.0)
#>  HDF5Array    * 1.15.13 2020-03-08 [1] Bioconductor
#>  highr          0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 4.0.0)
#>  IRanges      * 2.21.8  2020-03-25 [1] Bioconductor
#>  knitr          1.28    2020-02-06 [1] CRAN (R 4.0.0)
#>  lattice        0.20-40 2020-02-19 [2] CRAN (R 4.0.0)
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  Matrix         1.2-18  2019-11-27 [2] CRAN (R 4.0.0)
#>  matrixStats  * 0.56.0  2020-03-13 [1] CRAN (R 4.0.0)
#>  memoise        1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild       1.0.6   2019-10-09 [1] CRAN (R 4.0.0)
#>  pkgload        1.0.2   2018-10-29 [1] CRAN (R 4.0.0)
#>  prettyunits    1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx       3.4.2   2020-02-09 [1] CRAN (R 4.0.0)
#>  ps             1.3.2   2020-02-13 [1] CRAN (R 4.0.0)
#>  R6             2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp           1.0.4   2020-03-17 [1] CRAN (R 4.0.0)
#>  remotes        2.1.1   2020-02-15 [1] CRAN (R 4.0.0)
#>  rhdf5        * 2.31.6  2020-03-02 [1] Bioconductor
#>  Rhdf5lib       1.9.2   2020-02-13 [1] Bioconductor
#>  rlang          0.4.5   2020-03-01 [1] CRAN (R 4.0.0)
#>  rmarkdown      2.1     2020-01-20 [1] CRAN (R 4.0.0)
#>  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  S4Vectors    * 0.25.14 2020-03-24 [1] Bioconductor
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi        1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat       2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  usethis        1.5.1   2019-07-04 [1] CRAN (R 4.0.0)
#>  withr          2.1.2   2018-03-15 [1] CRAN (R 4.0.0)
#>  xfun           0.12    2020-01-13 [1] CRAN (R 4.0.0)
#>  yaml           2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
```