jmw86069 closed 2 years ago
I was thinking I could test a small R package that populates assayData as a DataFrame in the metadata slot. I would try to implement a 3-dimension subset method that would also keep the assayData in sync with the assays.
I’m not sure how to intercept adding an assay; I guess it would take a custom function like addSEassay(se, assaylist, assayData=NULL).
It’s tricky to add an empty row to assayData. If the user supplies assayData, the simplest approach would be to require it to contain the same colnames already in the se object.
Is there a similar utility to add a row to rowData or colData? I don’t remember seeing one.
I think it is a good idea to produce a small demonstration of what you are hoping for, with the extra information going into the metadata element. Define your operations as plain R functions and then we can evaluate what infrastructure changes and methods might be warranted.
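A minimal sketch of what such a plain R function might look like, assuming the per-assay table lives in metadata(se)$assayData with one row per assay. The addSEassay name comes from the comment above; the argument handling, and the choice to make assayData mandatory with matching column names, are assumptions for illustration and not existing SummarizedExperiment API:

```r
library(SummarizedExperiment)

# Hypothetical helper: add assays and keep metadata(se)$assayData in sync.
# Assumes metadata(se)$assayData already exists, with one row per assay.
addSEassay <- function(se, assaylist, assayData) {
    current <- metadata(se)$assayData
    stopifnot(
        !is.null(current),
        !is.null(names(assaylist)),
        nrow(assayData) == length(assaylist),
        # simplest rule: new rows must carry the columns already in use
        identical(colnames(assayData), colnames(current))
    )
    rownames(assayData) <- names(assaylist)
    for (nm in names(assaylist)) {
        assay(se, nm) <- assaylist[[nm]]
    }
    metadata(se)$assayData <- rbind(current, assayData)
    se
}

# Usage: start with one assay and its annotation row, then add a second
se <- SummarizedExperiment(list(A1 = matrix(1:12, ncol = 3)))
metadata(se)$assayData <- DataFrame(assayid = "id1", row.names = "A1")
se <- addSEassay(se,
                 list(A2 = matrix(101:112, ncol = 3)),
                 DataFrame(assayid = "id2"))
metadata(se)$assayData
```

Because metadata() is a free-form list, nothing here keeps the table in sync when assays are added or removed by other means; that gap is what any infrastructure change would have to address.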
I don't know how people expect to have a productive discussion, especially a technical one, on Twitter :roll_eyes:
@jmw86069 The assays() getter returns the assays in a SimpleList, which can hold metadata columns (like any other Vector derivative):
library(SummarizedExperiment)
se <- SummarizedExperiment(list(A1=matrix(1:12, ncol=3), A2=matrix(101:112, ncol=3)))
assays(se)
# List of length 2
# names(2): A1 A2
class(assays(se))
# [1] "SimpleList"
# attr(,"package")
# [1] "S4Vectors"
mcols(assays(se)) <- DataFrame(assayid=c("id1", "id2"), isnormalized=c(TRUE, FALSE), otherstuff=c("X", "Y"))
mcols(assays(se))
# DataFrame with 2 rows and 3 columns
# assayid isnormalized otherstuff
# <character> <logical> <character>
# A1 id1 TRUE X
# A2 id2 FALSE Y
Is this what you are after?
Wow that's actually very helpful, thank you! @hpages
You're right about tech discussions on Twitter, but it did (eventually) get enough visibility for a response! Also, I didn't know where to ask at first. I was hoping something existed already, and at least that part was correct.
I've been a longtime user of Bioc classes, and of SummarizedExperiment. It never occurred to me that List would also have metadata columns. That's my fault.
The only little issue is that adding to assays(se) <- does not update the mcols(assays(se)), so that has to be done in a second step. Not a big deal, I can work with that.
For my purposes, I don't have a driving reason to request any changes to the infrastructure, I'll close this issue.
The only little issue is that adding to assays(se) <- does not update the mcols(assays(se))
Not sure what you mean by "adding to assays(se) <-".
With assays(se) <-:
assays(se) <- c(assays(se), rev(assays(se)))
mcols(assays(se))
# DataFrame with 4 rows and 3 columns
# assayid isnormalized otherstuff
# <character> <logical> <character>
# A1 id1 TRUE X
# A2 id2 FALSE Y
# A2 id2 FALSE Y
# A1 id1 TRUE X
and with assay(se, i) <-:
assay(se, 5L) <- matrix(201:212, ncol=3)
mcols(assays(se))
# DataFrame with 5 rows and 3 columns
# assayid isnormalized otherstuff
# <character> <logical> <character>
# A1 id1 TRUE X
# A2 id2 FALSE Y
# A2 id2 FALSE Y
# A1 id1 TRUE X
# NA NA NA
Looks fine to me.
Please open a new issue and provide details if this doesn't work for you or if you were expecting something else.
H.
Yes, I should have clarified: I think the current behavior is working as expected, now that I understand about mcols here. :) All good.
The "only little issue" was practical for me: adding an assay as a numeric matrix directly (shown in your second example) creates NA values in the mcols DataFrame. Not a problem at all, just a thing for me to handle accordingly.
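One way to handle it, sketched under the assumption that the new assay is appended at the end: pull the mcols out, overwrite the NA row, and assign the table back. The "id3" and "Z" values are placeholders for illustration:

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(list(A1=matrix(1:12, ncol=3),
                                A2=matrix(101:112, ncol=3)))
mcols(assays(se)) <- DataFrame(assayid=c("id1", "id2"),
                               isnormalized=c(TRUE, FALSE),
                               otherstuff=c("X", "Y"))

# Append a bare matrix; its row in mcols(assays(se)) is all NA
assay(se, 3L) <- matrix(201:212, ncol=3)

# Second step: fill in the NA row, then assign the table back
md <- mcols(assays(se), use.names=FALSE)
i <- nrow(md)
md$assayid[i] <- "id3"
md$isnormalized[i] <- FALSE
md$otherstuff[i] <- "Z"
mcols(assays(se)) <- md
mcols(assays(se))
```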
I like your first example, which requires creating a SimpleList with the new assay. I understand this is how to add a new assay with metadata:
# A. start as before
se <- SummarizedExperiment(list(A1=matrix(1:12, ncol=3),
                                A2=matrix(101:112, ncol=3)))
mcols(assays(se)) <- DataFrame(assayid=c("id1", "id2"),
                               isnormalized=c(TRUE, FALSE),
                               otherstuff=c("X", "Y"))
# B. new assay matrix (A1 plus 10, matching the names used below)
new_matrix <- assays(se)[[1]] + 10
# new mcols for this assay matrix
new_mcols <- DataFrame(assayid="id1_plus10",
                       isnormalized=TRUE,
                       otherstuff="Z")
# make a SimpleList for the new assay
new_assays <- SimpleList(A1_plus10=new_matrix)
# add mcols in a second step
mcols(new_assays) <- new_mcols
# C. add the assay
assays(se) <- c(assays(se), new_assays)
mcols(assays(se))
# DataFrame with 3 rows and 3 columns
# assayid isnormalized otherstuff
# <character> <logical> <character>
# A1 id1 TRUE X
# A2 id2 FALSE Y
# A1_plus10 id1_plus10 TRUE Z
The SimpleList(...) step is interesting: there is no constructor that also includes the metadata, for example something like SimpleList(..., mcols=DataFrame()).
Do you recommend the two step process, or is there a fancier approach?
new_assays <- SimpleList(new_assay=new_matrix)
mcols(new_assays) <- new_mcols
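If the two steps get repetitive, they can be wrapped in a small function. This is only a sketch: SimpleListWithMcols is a hypothetical name, not part of S4Vectors, and the mcols argument mirrors the wished-for signature above rather than any existing constructor:

```r
library(S4Vectors)

# Hypothetical wrapper: build a SimpleList and attach its mcols in one call.
SimpleListWithMcols <- function(..., mcols = NULL) {
    x <- SimpleList(...)
    if (!is.null(mcols)) {
        # one metadata row per list element
        stopifnot(is(mcols, "DataFrame"), nrow(mcols) == length(x))
        mcols(x) <- mcols
    }
    x
}

new_assays <- SimpleListWithMcols(
    A1_plus10 = matrix(1:12, ncol = 3) + 10,
    mcols = DataFrame(assayid = "id1_plus10",
                      isnormalized = TRUE,
                      otherstuff = "Z")
)
mcols(new_assays)
```

The result can be combined with c(assays(se), new_assays) exactly as in step C above.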
I asked on Twitter then Bioc Stack, then @vjcitn suggested I ask here. :)
“using SummarizedExperiment: I want something like assayData() to hold tabular data about each matrix in assays(), one row per assay. When storing more than one assay matrix, I encode too much into the assay name.
Has this idea been discussed?”
I basically want an empty DataFrame with slot “assayData” with one row per entry in the assays slot. (I see your post asking about assay name constraints, that could be useful or necessary here as well.)
I can add some driving use cases in the next post.
Two basic utilities: