Bioconductor / SummarizedExperiment

SummarizedExperiment container
https://bioconductor.org/packages/SummarizedExperiment

subset fails when subsetting a SummarizedExperiment object with no samples #58

Closed kpagacz closed 3 years ago

kpagacz commented 3 years ago

Environment: image

Code to reproduce:

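library(MultiAssayExperiment)
# `%in% c()` is FALSE for every sample, so the resulting MAE has no samples
no_samples <- MultiAssayExperiment::subsetByColData(miniACC, y = miniACC$gender %in% c())
# the extracted SummarizedExperiment has 0 columns; this call fails
subset(no_samples[["gistict"]])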

output: image

I expect it to behave the same way as subsetting an empty data.frame:

subset(data.frame())  


This is an issue for us because in our use case we don't want to check if there are no samples in the SummarizedExperiment object.
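For reference, the data.frame behaviour described above can be sketched in base R. This is a minimal illustration only, using the built-in `airquality` dataset as a stand-in; it does not involve SummarizedExperiment at all.

```r
# Base R only; `airquality` ships with the datasets package.

# subset() on a completely empty data.frame returns it unchanged.
empty_df <- subset(data.frame())
dim(empty_df)    # 0 rows, 0 columns

# A data.frame with columns but no rows can still be filtered:
# the condition evaluates to logical(0) and nothing is selected.
no_rows <- airquality[0, ]
filtered <- subset(no_rows, Temp > 80)
dim(filtered)    # 0 rows, 6 columns
```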

danielinteractive commented 3 years ago

Thanks @kpagacz for posting this! @ maintainers: please let us know if you need more information or if we can help. thanks!

LiNk-NY commented 3 years ago

Hi @kpagacz

We're not sure what the use case would be for this but when there is no method specified, it defaults to subset.default.

For example:

> subset(MultiAssayExperiment())
Error in subset.default(MultiAssayExperiment()) : 
  argument "subset" is missing, with no default
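The same fallback can be seen in base R alone. As a small sketch, calling subset() on a plain integer vector (a class with no dedicated subset method) dispatches to subset.default, which requires the `subset` argument:

```r
# An integer vector has no subset() method of its own, so the call
# falls through to subset.default, which needs `subset`.
msg <- tryCatch(
  subset(1:3),
  error = function(e) conditionMessage(e)
)
msg
```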

We can make a change to SummarizedExperiment to print the same. SummarizedExperiment is a Vector derivative and does not have a contract to behave like a data.frame().

showClass("SummarizedExperiment")

The change would likely occur in:

getMethod("subset", "Vector")

Hervé @hpages will provide his expert opinion.

Best regards, Marcel

danielinteractive commented 3 years ago

Thanks a lot Marcel @LiNk-NY for coming back to us so quickly!

> We're not sure what the use case would be for this

Just to clarify: this does not only occur when calling subset() without any other arguments. For example, with the object from the example above, subset(no_samples[["gistict"]], subset = Gene.Symbol == "DIRAS3") fails with the same error message.

The use case is an interactive (Shiny) application where the user can filter the MAE but also the experiments (SEs). Basically, we would like to make this robust: the user could unintentionally subset the MAE down to 0 samples, and subsetting the experiment afterwards should not break the app. Does that make sense?

LiNk-NY commented 3 years ago

subset should actually work on the columns so the syntax for the call does not seem intuitive.

From the ?subset examples, Temp here is a column in the airquality data.frame:

> head(subset(airquality, Temp > 80))
   Ozone Solar.R Wind Temp Month Day
29    45     252 14.9   81     5  29
35    NA     186  9.2   84     6   4
36    NA     220  8.6   85     6   5
38    29     127  9.7   82     6   7
39    NA     273  6.9   87     6   8
40    71     291 13.8   90     6   9

Perhaps what you mean to do is:

> aa <- no_samples[["gistict"]]
> aa[mcols(aa)$Gene.Symbol == "DIRAS3", ]
class: SummarizedExperiment 
dim: 1 0 
metadata(0):
assays(1): ''
rownames(1): DIRAS3
rowData names(3): Gene.Symbol Locus.ID Cytoband
colnames(0):
colData names(0):

which uses the standard SummarizedExperiment interface, i.e., bracket subsetting. But we agree that at least the error message should be more informative.
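A self-contained sketch of the same bracket-based approach, assuming the SummarizedExperiment package is installed. The toy matrix, the second gene name, and the sample names are made up for illustration and are not taken from miniACC:

```r
library(SummarizedExperiment)

# Toy data (hypothetical values): 2 genes x 2 samples.
counts <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("DIRAS3", "MAPK14"), c("s1", "s2"))
)
se <- SummarizedExperiment(
  assays = list(counts = counts),
  rowData = DataFrame(Gene.Symbol = c("DIRAS3", "MAPK14"))
)

# Drop every sample, mimicking a filter that matches nothing.
no_samples_se <- se[, integer(0)]

# Filter rows through rowData() and `[`, without calling subset().
keep <- rowData(no_samples_se)$Gene.Symbol == "DIRAS3"
res <- no_samples_se[keep, ]
dim(res)
```

Because the row condition is evaluated directly on rowData(), this route does not depend on how subset() evaluates its expression when there are no columns.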

Best, Marcel

danielinteractive commented 3 years ago

> subset should actually work on the columns so the syntax for the call does not seem intuitive.

Thanks Marcel. That is strange, and I am pretty sure that the syntax is correct, because on a non-empty MAE it works:

library(MultiAssayExperiment)
no_samples <- MultiAssayExperiment::subsetByColData(miniACC, y = miniACC$gender %in% c("male"))
subset(no_samples[["gistict"]], subset = Gene.Symbol == "DIRAS3")
# correctly selects the rows / genes that fulfill the subset condition.

kpagacz commented 3 years ago

According to this doc subsetting via subset is supported and the provided examples don't suggest a no columns exception.

hpages commented 3 years ago

Hi @kpagacz @LiNk-NY ,

This actually stems from a bug in S4Vectors:::evalqForSubset() which is used in various subset() methods, including the subset() method for SummarizedExperiment objects. I just reported the issue here.

Best, H.

danielinteractive commented 3 years ago

Great, thanks @hpages !

kpagacz commented 3 years ago

Yeah, thanks a lot for digging into this!

LiNk-NY commented 3 years ago

> > subset should actually work on the columns so the syntax for the call does not seem intuitive.
>
> Thanks Marcel. That is strange, and I am pretty sure that the syntax is correct, because on a non-empty MAE it works:
>
> library(MultiAssayExperiment)
> no_samples <- MultiAssayExperiment::subsetByColData(miniACC, y = miniACC$gender %in% c("male"))
> subset(no_samples[["gistict"]], subset = Gene.Symbol == "DIRAS3")
> # correctly selects the rows / genes that fulfill the subset condition.

I meant to point out that subset.data.frame and getMethod("subset", "Vector") do not behave the same. So I would not expect to get the same result as subset(data.frame()).

> the provided examples don't suggest a no columns exception.

It's actually using the rowData / mcols for that lookup. You can follow the issue in S4Vectors and close the one here. Thanks for reporting!

And thanks for looking into it Hervé!

kpagacz commented 3 years ago

Closing and hoping S4Vectors' team addresses this promptly.