Bioconductor / HDF5Array

HDF5 backend for DelayedArray objects
https://bioconductor.org/packages/HDF5Array
9 stars 13 forks source link

Can we avoid 'degrading' a HDF5Array to a DelayedArray upon subsetting? #2

Closed PeteHaitch closed 6 years ago

PeteHaitch commented 6 years ago

Say I've got a DelayedMatrix, x. I appreciate why it's necessary to degrade this to a DelayedMatrix instance upon operations such as x + 1. But it's not so clear to me why it is necessary when the operation is x[i, j].

I was thinking about this in light of #1: as.matrix(x[i, j]) would have to use as.matrix,DelayedMatrix-method even though as.matrix,HDF5Matrix-method would work out-of-the-box but for the class degrading leading to a different S4 dispatch path.

In my own use of HDF5Array, it is very often that all I'm doing on a HDF5Array instance is subsetting before calling as.array() (i.e. it's not carrying any delayed ops), so this would provide decent performance improvement.

hpages commented 6 years ago

x + 1 and x[i, j] are both delayed i.e. they return an object that carries delayed operations so the object is not in sync with its seed anymore. Degradation to DelayedArray makes this clear to the user. But it's not just a cosmetic thing: it also allows to have as(x, "HDF5Array") behave in a sensible and predictable manner i.e. it realizes x as an HDF5Array object only if it's not already an HDF5Array object. If it's an HDF5Array then as(x, "HDF5Array") is a no-op. If we allowed HDF5Array instances to carry delayed operations then we couldn't use as(x, "HDF5Array") to realize them anymore. Said otherwise, the semantic of as(x, "HDF5Array") would become fuzzy. Hope this makes sense.

PeteHaitch commented 6 years ago

Thanks, @hpages, that makes sense. Now with the speed-up from #1, this is not an issue