SCE class allows assigning a SCE as assay?

drisso / SingleCellExperiment

Clone of the Bioconductor repository for the SingleCellExperiment package, see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.

65 stars 18 forks source link

SCE class allows assigning a SCE as assay? #50

Closed HelenaLC closed 4 years ago

HelenaLC commented 4 years ago

Not so much an issue, but some unexpected behavior I've come across by accident.

Minirepex:

library(SingleCellExperiment)
sce <- SingleCellExperiment(
    assays = list(a = diag(10)))
assay(sce, "b") <- sce

> lapply(assays(sce), class)
$a
[1] "matrix" "array" 

$b
[1] "SingleCellExperiment"
attr(,"package")
[1] "SingleCellExperiment"

Is this on purpose? If so, I apologies for the bother and wonder what would be the use case?

> sessionInfo()
R version 4.0.0 RC (2020-04-21 r78267)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0        
 [4] matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0       
 [7] GenomeInfoDb_1.24.0         IRanges_2.22.1              S4Vectors_0.26.0           
[10] BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] lattice_0.20-41        bitops_1.0-6           grid_4.0.0            
 [4] zlibbioc_1.34.0        XVector_0.28.0         Matrix_1.2-18         
 [7] tools_4.0.0            RCurl_1.98-1.2         compiler_4.0.0        
[10] GenomeInfoDbData_1.2.3

drisso commented 4 years ago

Hi @HelenaLC

It is by construction (sort of), in the sense that the assays slot contains a list of matrix-like objects, i.e., objects for which basic operations like subsetting and the like are implemented. Incidentally, that includes SummarizedExperiment and SingleCellExperiment objects.

Note that this behavior is inherited from SummarizedExperiment. Continuing your example, you can have a SCE in a SE in a SCE and so on :)

inception <- SingleCellExperiment(SummarizedExperiment(sce))

Not sure about a practically relevant use case, but the general ability of storing objects into objects is sometimes useful, e.g., for putting SummarizedExperiment objects as the altExp in SingleCellExperiment.

HelenaLC commented 4 years ago

Thank you Davide, I see your point. I guess my only concern is that: Anyone writing a function that expects a S(C)E and assay argument will check for the assay to exist in assayNames (at least I'd think this is what most people do...), this will pass. And then give very weird errors when function X is run on the SCE... Here's just one

library(scater)
library(SingleCellExperiment)

sce <- SingleCellExperiment(
    assays = list(a = diag(10)))
assay(sce, "b") <- sce

plotDots(sce, 
    features = 1:5,
    exprs_values = "b")

> Error in (x > detection_limit) + 0L : 
  non-numeric argument to binary operator

LTLA commented 4 years ago

Guess I'm late to the party here.

In general, there is no good way to check whether an instance of a S4 class is "matrix-like" or not. The only way I can imagine doing so is to try to extract a row and column and see whether you get a vector of the correct length out. I would not find this pleasant, mostly because it involves a read operation that might have a lot of overhead. For example, if I tried to extract a row and my data was in a column-chunked HDF5 file, this "little check" would require me to read in the entire dataset.

If we're going to worry about irregular stuff in the assays, a far greater concern is people putting data.frames in there. While slightly less worse (as column subsetting does, at least, yield a vector), it is much more common and equally breaking with respect to code that expects matrices.

HelenaLC commented 4 years ago

Agree with all that... Why though would anyone put a data.frame there anyways? – It's interesting to me though that, all this time, I thought there were strickt limitation on what can go in assays. I just never even tried putting "weird" stuff in there. Flexibility is one thing, but I don't see the benefit in this case over limiting what can be an assay and what not...

On another note, I find it curious why rowData can have a data.frame, but colData cannot... But maybe there's reasons beyond my understanding ¯\_(ツ)_/¯

x <- SingleCellExperiment(diag(10))
rowData(x) <- data.frame(1:10) # this works
colData(x) <- data.frame(1:10) # this doesn't
> Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘colData<-’ for signature ‘"SingleCellExperiment", "data.frame"’

LTLA commented 4 years ago

Flexibility is one thing, but I don't see the benefit in this case over limiting what can be an assay and what not...

Regardless of whether we want to or not, it's just that we can't. There's no way to know whether an object fulfills the "matrix-like" contract without actually trying to extract something from it, and doing so would (potentially) be an unacceptably expensive operation to put into assay()<-.

HelenaLC commented 4 years ago

Yes, I understand. The only option (I think?) would be to limit it to a set of object types- which is kind of what colData does. I mean, colData could also be anything really- as long as it has matching nrow and possibly rownames. But it can only be a DFrame... But then reducedDim: I just tried and it let's me put in another SCE as well :/ Aaanyways, I'm sure there's reasons it is how it is :)

LTLA commented 4 years ago

Ah - but you can put anything in colData. Consider:

library(SingleCellExperiment)
example(SingleCellExperiment, echo=FALSE)
colData(sce)$whee <- sce[1:100,]

The nested SCE is not an entirely happy camper, but it can be done.

Limiting the assay entries (or colData fields, or reducedDim entries) to a subset of acceptable classes would be an intolerable constraint. I can already think of three broken use cases:

Nested DataFrames within the colData in scater::addPerCellQC with flatten=FALSE.
LinearEmbeddingMatrix objects instead of ordinary matrixes in the reducedDims.
Any "non-approved" matrix represention in the assays, e.g., DelayedArray, matter.