Closed PeteHaitch closed 6 years ago
Now I think about it some more, just defining as.array,HDF5Array-method
might give us as.matrix,HDF5Matrix-method
, albeit with a tiny bit of S4 method dispatch overhead.
Hi Pete,
After some initial investigation, it seems that the as.array
and as.matrix
methods for DelayedArray objects are slowed down by R copy-on-change mechanism being triggered even when nothing is changed e.g. in situations like dimnames(ans) <- NULL
when ans
has no dimnames. It's possible to work around this by testing before modifying e.g.
new_dimnames <- .get_DelayedArray_dimnames_before_transpose(x)
if (!identical(dimnames(ans), new_dimnames))
dimnames(ans) <- new_dimnames
By doing this kind of change in several places in the as.array
and as.matrix
methods for DelayedArray objects, I think I can make them almost as fast as your as.matrix
method for HDF5Matrix objects. I prefer this over introducing a dedicated method for HDF5Matrix objects as it will benefit as.array()
and as.matrix()
on all DelayedArray objects, not just on HDF5Matrix objects.
I'll try to commit something today and will let you know.
H.
Ah, I'd forgotten that solution you'd found, @grimbough. It'd be great if that means all DelayedArray subclasses get a speed-up, @hpages
Thanks @grimbough for confirming the copying behaviour!
I added set_dim()
and set_dimnames()
to the DelayedArray package (https://github.com/Bioconductor/DelayedArray/commit/53530e22dc880d92910de975e24c2a999181f117). These are dim<-
and dimnames<-
wrappers that use a do-not-touch-if-no-op approach. By using them everywhere in place of dim<-
and dimnames<-
in DelayedArray, I get minor speedups in various places. In particular this makes as.array()
and as.matrix()
as fast as direct extraction with extract_array()
on a pristine DelayedArray object. So all is good :-)
Thanks, @hpages! Confirmed that this speed-up follows through to other DelayedArray subclasses, like fstMatrix in fstArray
Hi Hervé,
Adding an
as.matrix,HDF5Matrix-method
gives a nice little speed-up over the current method of callingas.matrix,DelayedArray-method
via inheritance:Session info
``` r devtools::session_info() #> Session info ------------------------------------------------------------- #> setting value #> version R Under development (unstable) (2017-10-30 r73642) #> system x86_64, darwin15.6.0 #> ui X11 #> language (EN) #> collate en_AU.UTF-8 #> tz America/New_York #> date 2017-11-26 #> Packages ----------------------------------------------------------------- #> package * version date source #> backports 1.1.1 2017-09-25 CRAN (R 3.5.0) #> base * 3.5.0 2017-10-31 local #> BiocGenerics * 0.25.0 2017-10-31 Bioconductor #> codetools 0.2-15 2016-10-05 CRAN (R 3.5.0) #> colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0) #> compiler 3.5.0 2017-10-31 local #> datasets * 3.5.0 2017-10-31 local #> DelayedArray * 0.5.4 2017-11-24 Bioconductor #> devtools 1.13.4 2017-11-09 CRAN (R 3.5.0) #> digest 0.6.12 2017-01-27 CRAN (R 3.5.0) #> evaluate 0.10.1 2017-06-24 CRAN (R 3.5.0) #> ggplot2 2.2.1 2016-12-30 CRAN (R 3.5.0) #> graphics * 3.5.0 2017-10-31 local #> grDevices * 3.5.0 2017-10-31 local #> grid 3.5.0 2017-10-31 local #> gtable 0.2.0 2016-02-26 CRAN (R 3.5.0) #> HDF5Array * 1.7.1 2017-11-24 Bioconductor #> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0) #> IRanges * 2.13.4 2017-11-22 Bioconductor #> knitr 1.17 2017-08-10 CRAN (R 3.5.0) #> lattice 0.20-35 2017-03-25 CRAN (R 3.5.0) #> lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0) #> magrittr 1.5 2014-11-22 CRAN (R 3.5.0) #> MASS 7.3-47 2017-02-26 CRAN (R 3.5.0) #> Matrix 1.2-12 2017-11-15 CRAN (R 3.5.0) #> matrixStats * 0.52.2 2017-04-14 CRAN (R 3.5.0) #> memoise 1.1.0 2017-04-21 CRAN (R 3.5.0) #> methods * 3.5.0 2017-10-31 local #> microbenchmark * 1.4-2.1 2015-11-25 CRAN (R 3.5.0) #> multcomp 1.4-8 2017-11-08 CRAN (R 3.5.0) #> munsell 0.4.3 2016-02-13 CRAN (R 3.5.0) #> mvtnorm 1.0-6 2017-03-02 CRAN (R 3.5.0) #> parallel * 3.5.0 2017-10-31 local #> plyr 1.8.4 2016-06-08 CRAN (R 3.5.0) #> Rcpp 0.12.14 2017-11-23 CRAN (R 3.5.0) #> rhdf5 * 2.23.0 2017-10-31 Bioconductor #> rlang 0.1.4 2017-11-05 CRAN (R 3.5.0) #> rmarkdown 1.8 2017-11-17 CRAN (R 3.5.0) #> rprojroot 1.2 2017-01-16 CRAN (R 3.5.0) #> S4Vectors * 0.17.10 2017-11-23 Bioconductor #> sandwich 2.4-0 2017-07-26 CRAN (R 3.5.0) #> scales 0.5.0 2017-08-24 CRAN (R 3.5.0) #> splines 3.5.0 2017-10-31 local #> stats * 3.5.0 2017-10-31 local #> stats4 * 3.5.0 2017-10-31 local #> stringi 1.1.6 2017-11-17 CRAN (R 3.5.0) #> stringr 1.2.0 2017-02-18 CRAN (R 3.5.0) #> survival 2.41-3 2017-04-04 CRAN (R 3.5.0) #> TH.data 1.0-8 2017-01-23 CRAN (R 3.5.0) #> tibble 1.3.4 2017-08-22 CRAN (R 3.5.0) #> tools 3.5.0 2017-10-31 local #> utils * 3.5.0 2017-10-31 local #> withr 2.1.0 2017-11-01 CRAN (R 3.5.0) #> yaml 2.1.14 2016-11-12 CRAN (R 3.5.0) #> zlibbioc 1.25.0 2017-10-31 Bioconductor #> zoo 1.8-0 2017-04-12 CRAN (R 3.5.0) ```We know a HDF5Matrix doesn't have any delayed ops (otherwise it'd be a DelayedMatrix) so I think this is safe.
For the same reasons I think we could also have an
as.array,HDF5Array-method
.Cheers, Pete