Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray

generic for deep copying #31

Closed mikejiang closed 6 years ago

mikejiang commented 6 years ago

I haven't found a specification of the generic interface (e.g. clone or deep_copy generics) for implementing a deep copy of a DelayedArray object with a file backend.

mikejiang commented 6 years ago

or the generic interface for copy_on_write?

hpages commented 6 years ago

Hi Mike,

There is no clone() or deep_copy() for DelayedArray objects. This is generally not needed because operations that "seem" to modify a DelayedArray object A "in-place" (e.g. dim(A) <- new_dim, A[...] <- value, or A <- log(A)) are delayed. Delayed operations never alter the original array data (i.e. the on-disk array data that the original DelayedArray object was pointing at). More generally speaking, the DelayedArray package always treats the existing on-disk data as read-only. New data is written to disk only when the user triggers realization of the object (e.g. with writeHDF5Array(x), as(x, "HDF5Array"), writeTENxMatrix(x), as(x, "TENxMatrix"), realize(), etc.; see the individual man pages for the details). All these functions return a pristine DelayedArray object pointing to a new data set containing the transformed array data.
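
To illustrate, here is a minimal sketch, assuming the HDF5Array package is installed. The toy matrix m and the object names are made up for the example, and the location of the HDF5 dump file is whatever HDF5Array picks by default:

```r
library(DelayedArray)
library(HDF5Array)

m <- matrix(1:6, nrow = 2)   # toy data, made up for this example
A <- writeHDF5Array(m)       # writes m to an HDF5 dump file, returns an HDF5Array

B <- log(A)                  # delayed: nothing is computed or written yet
showtree(B)                  # shows the pending operation stacked on the HDF5 seed

# Realization: the transformed data is written to a new data set.
C <- as(B, "HDF5Array")

# The data that A points at is unchanged; C holds the transformed values.
stopifnot(all(as.matrix(A) == m))
stopifnot(isTRUE(all.equal(as.matrix(C), log(m))))
```

In this sketch, realizing B is what produces an independent on-disk copy (C), which is the closest thing to a deep copy described here.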

So a clone() or deep_copy() for DelayedArray objects is typically not needed. They could be added though if someone had a compelling use case for this.

Hope this makes sense, H.

hpages commented 6 years ago

Hope this answered your question. Feel free to re-open if it didn't.