Bioconductor / SummarizedExperiment

A container (S4 class) for matrix-like assays
https://bioconductor.org/packages/SummarizedExperiment
33 stars 9 forks source link

Remove all dimnames in `combine_assays_by` to avoid constructor check failures #65

Closed LTLA closed 2 years ago

LTLA commented 2 years ago

I can't remember why I decided to be so selective at:

https://github.com/Bioconductor/SummarizedExperiment/blob/71872cc03b7c0195fb80d1d09409243f049ebb3f/R/combine-methods.R#L143-L149

but it's probably better to just wipe both sets of dimnames in all scenarios, rather than just the dimnames along the non-combined dimension. There's no guarantee that either of the dimnames of the combined object will be the same as the corresponding dimnames of the SE, given that (i) we call the assays getter with withDimnames=FALSE and (ii) we may add any number of unnamed ConstantArrays into the mix.

My proposal is to just replace the code chunk above with:

    for (x in seq_along(combined)) {
        dimnames(combined[[x]]) <- NULL
    }
    combined

or anything that wipes the names off the combined assays. Local testing indicates that unit tests should be happy with this.

hpages commented 2 years ago

But why bother removing them at all now that the dimnames check can be skipped at construction time? (see issue #57). The only visible difference would be when someone does something like this:

m1 <- matrix(1:20, nrow=5, ncol=4, dimnames=list(LETTERS[1:5], letters[1:4]))
m2 <- matrix(1001:1030, nrow=6, ncol=5, dimnames=list(LETTERS[21:26], letters[2:6]))
se1 <- SummarizedExperiment(m1)
se2 <- SummarizedExperiment(m2)
se <- combineRows(se1, se2)
assay(se, withDimnames=FALSE)

Right now we get:

<11 x 6> matrix of class DelayedMatrix and type "integer":
  [,1] [,2] [,3] [,4] [,5] [,6]
A    1    6   11   16   NA   NA
B    2    7   12   17   NA   NA
C    3    8   13   18   NA   NA
D    4    9   14   19   NA   NA
E    5   10   15   20   NA   NA
U   NA 1001 1007 1013 1019 1025
V   NA 1002 1008 1014 1020 1026
W   NA 1003 1009 1015 1021 1027
X   NA 1004 1010 1016 1022 1028
Y   NA 1005 1011 1017 1023 1029
Z   NA 1006 1012 1018 1024 1030

With your proposed change above we would get:

<11 x 6> matrix of class DelayedMatrix and type "integer":
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    6   11   16   NA   NA
 [2,]    2    7   12   17   NA   NA
 [3,]    3    8   13   18   NA   NA
 [4,]    4    9   14   19   NA   NA
 [5,]    5   10   15   20   NA   NA
 [6,]   NA 1001 1007 1013 1019 1025
 [7,]   NA 1002 1008 1014 1020 1026
 [8,]   NA 1003 1009 1015 1021 1027
 [9,]   NA 1004 1010 1016 1022 1028
[10,]   NA 1005 1011 1017 1023 1029
[11,]   NA 1006 1012 1018 1024 1030

And if we didn't touch the dimnames at all we would get:

<11 x 6> matrix of class DelayedMatrix and type "integer":
     a    b    c    d          
A    1    6   11   16   NA   NA
B    2    7   12   17   NA   NA
C    3    8   13   18   NA   NA
D    4    9   14   19   NA   NA
E    5   10   15   20   NA   NA
U   NA 1001 1007 1013 1019 1025
V   NA 1002 1008 1014 1020 1026
W   NA 1003 1009 1015 1021 1027
X   NA 1004 1010 1016 1022 1028
Y   NA 1005 1011 1017 1023 1029
Z   NA 1006 1012 1018 1024 1030

BTW I was not sure whether combineRows()/combineCols() supported assays with more than 2 dimensions so I tried the following but could not really make sense of the error message (the assays to bind do have the same number of dimensions):

a1 <- array(1:60, 5:3, dimnames=list(LETTERS[1:5], letters[1:4], NULL))
a2 <- array(1001:1072, c(6,5,3), dimnames=list(LETTERS[21:26], letters[2:6], NULL))
se1 <- SummarizedExperiment(a1)
se2 <- SummarizedExperiment(a2)
combineRows(se1, se2)
# Error in validObject(.Object) : invalid class “DelayedAbind” object: 
#     all the objects to bind must have the same number of dimensions
LTLA commented 2 years ago

But why bother removing them at all now that the dimnames check can be skipped at construction time?

Yes, that would also be fine. In fact, it would be better, because it would preserve the assay dimnames as extracted by withDimnames=FALSE. I just went for the more conservative change in my proposal.

BTW I was not sure whether combineRows()/combineCols() supported assays with more than 2 dimensions so I tried the following but could not really make sense of the error message (the assays to bind do have the same number of dimensions):

I'm not sure either. My best guess would be that we'd need to switch the cbind/rbind calls with acbind and arbind.

hpages commented 2 years ago

Ok. Done in SummarizedExperiment 1.27.3