Bioconductor / SummarizedExperiment

A container (S4 class) for matrix-like assays
https://bioconductor.org/packages/SummarizedExperiment

`RangedSummarizedExperiment` cannot be stored in a `SummarizedExperiment`-only slot #84

Open LTLA opened 3 weeks ago

LTLA commented 3 weeks ago

Consider the following:

```r
library(SummarizedExperiment)
setClass("FOO", slots=c(se="SummarizedExperiment"))

x <- SummarizedExperiment()
y <- new("FOO", se=x)
class(y@se)
## [1] "SummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"
```

So far, so good. But if you try to put in a RangedSummarizedExperiment instead:

```r
x2 <- SummarizedExperiment()
rowRanges(x2) <- GRanges()
class(x2)
## [1] "RangedSummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"

y2 <- new("FOO", se=x2)
class(y2@se) # ????
## [1] "SummarizedExperiment"
## attr(,"package")
## [1] "SummarizedExperiment"
```

You can see that the "Ranged-ness" is dropped from the object when it gets stored in the FOO instance. Interestingly enough, this doesn't happen with other subclasses, even subclasses of RangedSummarizedExperiment:

```r
library(SingleCellExperiment)
x3 <- SingleCellExperiment()
class(x3)
y3 <- new("FOO", se=x3)
class(y3@se)
## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"
```

Given that it doesn't happen with other subclasses, I assume that this problem is specific to the RSE-SE relationship. It seems that `initialize,FOO-method` doesn't believe that RSE extends SE, and therefore coerces the former to the latter.
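The same pattern can be reproduced without Bioconductor using a few toy S4 classes. This is a minimal sketch that rests on one assumption: what sets RSE apart from SCE is an explicit `setAs()` coercion from the subclass to its parent class, which is what the reply below points to. The class names `Base`, `Ranged`, `Plain` and `Holder` are hypothetical stand-ins for SE, RSE, SCE and FOO. If the assumption holds, the first `class()` call should report `"Base"` and the second `"Plain"`, mirroring the RSE and SCE results above.

```r
library(methods)

# Hypothetical stand-ins: Base ~ SummarizedExperiment, Ranged ~ RSE,
# Plain ~ a subclass like SCE, Holder ~ FOO.
setClass("Base", slots = c(x = "numeric"))
setClass("Ranged", contains = "Base", slots = c(ranges = "character"))
setClass("Plain", contains = "Base", slots = c(extra = "character"))
setClass("Holder", slots = c(obj = "Base"))

# Assumption: the only difference between Ranged and Plain is an explicit
# setAs() coercion to the parent class, as for RSE -> SE.
setAs("Ranged", "Base", function(from) new("Base", x = from@x))

r <- new("Ranged", x = 1, ranges = "chr1:1-10")
p <- new("Plain", x = 2, extra = "kept")

# Class of each object after storing it in the Base-typed slot.
class(new("Holder", obj = r)@obj)
class(new("Holder", obj = p)@obj)
```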

Anyway, this problem shows up in actual code: `altExp<-` in the SingleCellExperiment package relies on an internal class with a SummarizedExperiment slot, so assigning an RSE strips away the ranged parts.

Session information

```
R Under development (unstable) (2024-10-30 r87277)
Platform: aarch64-apple-darwin22.6.0
Running under: macOS Ventura 13.7

Matrix products: default
BLAS:   /Users/luna/Software/R/trunk/lib/libRblas.dylib
LAPACK: /Users/luna/Software/R/trunk/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] SingleCellExperiment_1.29.0 SummarizedExperiment_1.37.0
 [3] Biobase_2.67.0              GenomicRanges_1.59.0
 [5] GenomeInfoDb_1.43.0         IRanges_2.41.0
 [7] S4Vectors_0.45.0            BiocGenerics_0.53.0
 [9] MatrixGenerics_1.19.0       matrixStats_1.4.1

loaded via a namespace (and not attached):
 [1] R6_2.5.1                SparseArray_1.7.0       zlibbioc_1.53.0
 [4] Matrix_1.7-1            lattice_0.22-6          abind_1.4-8
 [7] GenomeInfoDbData_1.2.13 S4Arrays_1.7.0          XVector_0.47.0
[10] UCSC.utils_1.3.0        grid_4.5.0              DelayedArray_0.33.0
[13] compiler_4.5.0          httr_1.4.7              tools_4.5.0
[16] crayon_1.5.3            jsonlite_1.8.9
```
hpages commented 3 weeks ago

Yep, I had no idea either, but I was made aware of this very recently. See https://github.com/Bioconductor/Contributions/issues/3616#issuecomment-2430877384

As you can see there, the workaround is very easy if you can't wait for a proper fix in SummarizedExperiment.

As for what's going on exactly, keep reading.

It seems to be caused by this:

```r
rse <- SummarizedExperiment(matrix(1:12, nrow=4), rowRanges=GRanges("chr1", IRanges(1, 11:14)))
class(rse)
# [1] "RangedSummarizedExperiment"
# attr(,"package")
# [1] "SummarizedExperiment"

as(rse, "SummarizedExperiment", strict=FALSE)  # should be a no-op but it's not!
# class: SummarizedExperiment 
# dim: 4 3 
# metadata(0):
# assays(1): ''
# rownames: NULL
# rowData names(0):
# colnames: NULL
# colData names(0):
```

You would think that the first thing `as(x, "A", strict=FALSE)` does is something like `if (is(x, "A")) return(x)`, but it doesn't :disappointed:

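You would also think that `new("FOO", se=x2)` would just check that `x2` is a SummarizedExperiment (with `is(x2, "SummarizedExperiment")`) and be happy if it is one, but no such luck either :disappointed: :disappointed: Instead, the default `initialize()` method passes any slot value whose class is not exactly the declared slot class through `as(value, slotClass, strict=FALSE)`, which brings us right back to the problem above.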
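For illustration, here is the short-circuit described above written as a helper, next to the built-in `as(..., strict=FALSE)`. The helper, `as_nonstrict()`, is hypothetical and not part of the methods package. The toy classes `Parent` and `Child` are also made up. As an assumption, `Child` gets an explicit `setAs()` coercion to `Parent`, just as RSE has one to SE.

```r
library(methods)

# Hypothetical toy classes: Child extends Parent and, like RSE -> SE,
# has an explicit setAs() coercion back to its parent class.
setClass("Parent", slots = c(x = "numeric"))
setClass("Child", contains = "Parent", slots = c(y = "character"))
setAs("Child", "Parent", function(from) new("Parent", x = from@x))

# Hypothetical helper (not in the methods package): the check one might
# expect as(x, Class, strict = FALSE) to perform before anything else.
as_nonstrict <- function(object, Class) {
  if (is(object, Class)) return(object)  # already an instance: no-op
  as(object, Class, strict = FALSE)      # otherwise really coerce
}

ch <- new("Child", x = 1, y = "a")
class(as(ch, "Parent", strict = FALSE))  # dispatches to the setAs() method
class(as_nonstrict(ch, "Parent"))        # short-circuits on is()
```

The helper only adds the `is()` check up front. Everything that is not already an instance of `Class` still goes through the regular coercion machinery.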

At the root of the problem is that the implementation of `as()` assumes that we (the developers) implement coercion methods that explicitly handle the `strict=FALSE` case. But obviously nobody does that, because (1) we were told to use `setAs()` to define our `coerce()` methods, and (2) the `setAs()` interface doesn't let you handle the `strict=FALSE` case. And nobody should have to handle this case anyway, because it can and should be factored out into `as()`'s own logic in the first place!

Anyway, that's where we are. I think this can be worked around by using `setMethod(coerce, ...)` instead of `setAs(...)` to define the coercion from RangedSummarizedExperiment to SummarizedExperiment. It won't be the first time I've needed to do something like this, e.g. see https://github.com/Bioconductor/SparseArray/blob/57bcfbf501d8656b6b5fcd151e812d68aee5c4c2/R/SVT_SparseArray-class.R#L113-L124
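Below is a minimal sketch of that workaround on hypothetical toy classes. `BaseFix`, `RangedFix` and `HolderFix` are made up for illustration; this is not SummarizedExperiment's actual code or patch. The coercion is registered with `setMethod("coerce", ...)`, and the method is written as `function(from, to, strict = TRUE)`, so it can see `strict` and return `from` unchanged when `strict` is `FALSE`. That is the case `initialize()` hits during slot assignment.

```r
library(methods)

# Hypothetical toy classes standing in for SE (BaseFix) and RSE (RangedFix).
setClass("BaseFix", slots = c(x = "numeric"))
setClass("RangedFix", contains = "BaseFix", slots = c(ranges = "character"))
setClass("HolderFix", slots = c(obj = "BaseFix"))

# setMethod() instead of setAs(): the method receives the 'strict' argument.
setMethod("coerce", c("RangedFix", "BaseFix"),
  function(from, to, strict = TRUE) {
    # Non-strict coercion (e.g. from initialize()) keeps the subclass intact.
    if (!isTRUE(strict)) return(from)
    # Strict coercion still drops the subclass-specific parts.
    new("BaseFix", x = from@x)
  }
)

rf <- new("RangedFix", x = 1, ranges = "chr1:1-10")

class(new("HolderFix", obj = rf)@obj)  # slot assignment via new()
class(as(rf, "BaseFix"))               # explicit strict coercion
```

With this design, an explicit `as(rf, "BaseFix")` still returns a plain `BaseFix`. Only the non-strict path used for slot assignment becomes a no-op.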

H.