Bioconductor / HDF5Array

HDF5 backend for DelayedArray objects
https://bioconductor.org/packages/HDF5Array
11 stars 13 forks source link

Possible to delay updateObject() in loadHDF5SummarizedExperiment()? #9

Closed PeteHaitch closed 6 years ago

PeteHaitch commented 6 years ago

Would it be possible to delay calling updateObject() in loadHDF5SummarizedExperiment() until after the paths of the assays have been updated? Currently, it's done immediately after the object is loaded from se.rds file. https://github.com/Bioconductor/HDF5Array/blob/81fea5bb8ab4046053ae462cc7e8902829798a0c/R/saveHDF5SummarizedExperiment.R#L115

Calling updateObject() seems to trigger the validity method. So if the object stored in se.rds has a validity method that checks the assays (e.g., I check the M and Cov assays in BSseq, https://github.com/hansenlab/bsseq/blob/refactor/R/BSseq-class.R#L12-L57), then this will fail because the assays don't yet have their paths updated to a valid location.

hpages commented 6 years ago

Delaying the call to updateObject() might introduce other problems e.g. if the serialized SummarizedExperiment object is very outdated/broken there is no guarantee that even the most basic manipulations can be performed on it. In particular I'm worried that the code inside the for loop might not always work . How about delaying validation instead? Or just disabling it?

PeteHaitch commented 6 years ago

How about delaying validation instead? Or just disabling it?

I guess that's what I really need, but I wasn't sure it was possible.

hpages commented 6 years ago

So which one, delayed validation or no validation at all? I kind of favor the latter.

FWIW I added the check argument to updateObject() in BiocGenerics a few month ago exactly for this kind of situation:

> updateObject
nonstandardGenericFunction for "updateObject" defined from package "BiocGenerics"

function (object, ..., verbose = FALSE) 
{
    result <- standardGeneric("updateObject")
    check <- list(...)$check
    if (is.null(check)) {
        check <- TRUE
    }
    else if (!isTRUEorFALSE(check)) {
        stop("'check' must be TRUE or FALSE")
    }
    if (check) 
        validObject(result)
    result
}
<bytecode: 0x375fc20>
<environment: 0x3748c88>
Methods may be defined for arguments: object
Use  showMethods("updateObject")  for currently available ones.

I didn't make it a visible argument because I was afraid that would break all the updateObject methods around but that's probably something we should do at some point.

PeteHaitch commented 6 years ago

So that's why I didn't see it when I did args(updateObject)! For my use case, delaying it until the paths of the assays have been updated is sufficient, but I can see benefits in (optionally?) not validating the object at all (faster load times, for one).

hpages commented 6 years ago

Done (commit bea6169baecc5783c497531e1d56de3246cb8262).

Note that S4Vectors provides S4Vectors:::disableValidity for disabling validation but it only turns off validity methods that were defined with setValidity2 (also provided by S4Vectors).

PeteHaitch commented 6 years ago

Thanks, Hervé! I use S4Vectors::setValidity2(), so should be safe (neat feature!)