Bioconductor / SummarizedExperiment

A container (S4 class) for matrix-like assays
https://bioconductor.org/packages/SummarizedExperiment
33 stars 9 forks source link

Coercion from SummarizedExperiment to ExpressionSet creates an unlocked environment #43

Closed tomsing1 closed 4 years ago

tomsing1 commented 4 years ago

I ran into some unexpected pass-by-reference behavior with ExpressionSets that were coerced from SummarizedExperiments today.

I believe this can be traced to the SummarizedExperiment -> ExpressionSet coersion implemented in the setAs("RangedSummarizedExperiment", "ExpressionSet") method, which returns an ExpressionSet with an unlocked environment in the assayData slot, instead of the default lockedEnvironment storage mode.

(See page 7 of the BiobaseDevelopment vignette for a discussion of the storage modes.)

I believe the storageMode of the ExpressionSet object returned by the setAs("RangedSummarizedExperiment", "ExpressionSet") method needs to be locked explicitly here.

eset <- ExpressionSet(assayData,
                  phenoData = phenoData,
                  featureData = featureData,
                  experimentData = experimentData,
                  annotation = annotation,
                  protocolData = protocolData
                  )
storageMode(eset) <- "lockedEnvironment"
return(eset)

Reproducible example:

library(SummarizedExperiment)
library(airway)
data("airway")

eset <- as(airway, "ExpressionSet")
storageMode(eset)
# [1] "environment"

This can lead to unintended pass-by-reference behavior downstream, e.g.

exprs(eset)[1:3, 1:3]
                SRR1039508 SRR1039509 SRR1039512
ENSG00000000003        679        448        873
ENSG00000000005          0          0          0
ENSG00000000419        467        515        621

When I copy the ExpressionSet into a new object

eset2 <- eset
exprs(eset2)[1:3, 1:3]

the data is same as above, as expected:

                SRR1039508 SRR1039509 SRR1039512
ENSG00000000003        679        448        873
ENSG00000000005          0          0          0
ENSG00000000419        467        515        621

But when I modify the assayData in the new object:

exprs(eset2) <- log2(exprs(eset2) + 1)
exprs(eset2)[1:3, 1:3]
                SRR1039508 SRR1039509 SRR1039512
ENSG00000000003   9.409391   8.810572   9.771489
ENSG00000000005   0.000000   0.000000   0.000000
ENSG00000000419   8.870365   9.011227   9.280771

unexpectedly (at least to me), the assayData in the original ExpressionSet (eset) also gets modified:

exprs(eset)[1:3, 1:3]
                SRR1039508 SRR1039509 SRR1039512
ENSG00000000003   9.409391   8.810572   9.771489
ENSG00000000005   0.000000   0.000000   0.000000
ENSG00000000419   8.870365   9.011227   9.280771

because they both point to the same (unlocked) environment.

SessionInfo

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] airway_1.8.0                SummarizedExperiment_1.18.1 DelayedArray_0.14.0         matrixStats_0.56.0         
 [5] Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
 [9] S4Vectors_0.26.1            BiocGenerics_0.34.0         BiocManager_1.30.10         devtools_2.3.0             
[13] usethis_1.6.1              

loaded via a namespace (and not attached):
 [1] compiler_4.0.2         XVector_0.28.0         prettyunits_1.1.1      bitops_1.0-6          
 [5] remotes_2.1.1          tools_4.0.2            zlibbioc_1.34.0        packrat_0.5.0         
 [9] testthat_2.3.2         digest_0.6.25          pkgbuild_1.0.8         pkgload_1.1.0         
[13] lattice_0.20-41        memoise_1.1.0          rlang_0.4.6            Matrix_1.2-18         
[17] cli_2.0.2              rstudioapi_0.11        GenomeInfoDbData_1.2.3 withr_2.2.0           
[21] desc_1.2.0             fs_1.4.2               grid_4.0.2             rprojroot_1.3-2       
[25] glue_1.4.1             R6_2.4.1               processx_3.4.2         fansi_0.4.1           
[29] sessioninfo_1.1.1      callr_3.4.3            magrittr_1.5           backports_1.1.8       
[33] ps_1.3.3               fortunes_1.5-4         ellipsis_0.3.1         assertthat_0.2.1      
[37] RCurl_1.98-1.2         crayon_1.3.4
vjcitn commented 4 years ago

Hi, Thomas. This is an interesting observation, and I hope the situation was not too difficult to diagnose. I don't see anything in the ExpressionSet user-level documentation indicating that the default/expected setting for ExpressionSet storageMode is lockedEnvironment. The vignette you point to indicates the available settings. You can set the storageMode manually after coercion. I don't know that the coercion function should do this, but it is worth discussing.

tomsing1 commented 4 years ago

Thanks a lot for taking a look, Vince! Yes, I agree - an ExpressionSet with an unlocked environment is totally valid. I followed your advice and set the storageMode manually.

The default for a new assayData object is a lockedEnvironment (see Biobase::assayDataNew), that's why I probably never encountered an ExpressionSet with an unlocked assay in the wild before.

Tangentially related:

When I looked up the assayDataNew method, I noticed that the switch function was called without a right-hand side for the first option (lockedEnvironment):

assayDataNew <-
    function(storage.mode = c("lockedEnvironment", "environment", "list"),
             ...)
{
    storage.mode <- match.arg(storage.mode) ## defaults to "lockedEnvironment"
    assayData <- switch(storage.mode,
                        lockedEnvironment =,
                        environment = new.env(parent=emptyenv()),
                        list = list())

I didn't even know that's valid R code! Learning something new every day...

mtmorgan commented 4 years ago

I'll update the coercion method