drisso / SingleCellExperiment

Clone of the Bioconductor repository for the SingleCellExperiment package, see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.

Respecting any mcols() passed to reducedDims<-() if a List is supplied as the value #57

Closed brgew closed 3 years ago

brgew commented 3 years ago

I request the ability to save arbitrarily structured data in the reducedDims attribute(?) using the mcols() method. Aaron directed me here for this purpose. The relevant discussion is at the Bioconductor forum.

I am grateful for the help. Thank you!

LTLA commented 3 years ago

I think I will add a reducedDimMetadata() function, to be used kind of like this:

# Imaginary code!
reducedDimMetadata(sce, "PCA") <- list(rotation=rot, sdev=sdev)
reducedDimMetadata(sce, "UMAP") <- list(blah=stuff)

# Extract with the corresponding getter.
reducedDimMetadata(sce, "PCA")

# Get and set all metadata en masse.
reducedDimsMetadata(sce) <- list(PCA=thing, UMAP=stuff)

This will still use the mcols() under the hood, so you wouldn't have to care about that.

LTLA commented 3 years ago

The immediate request is resolved with 5978671b77320fbe9d51fea69fef4a2a9cd0e5c2, where we respect any mcols and metadata provided in value.

However, as I was implementing this, I remembered why we put this stuff in the attributes in the first place. It is because operations like reducedDim(x, "PCA") <- value will only replace the PC matrix. It won't modify the mcols(<reducedDims>), because (i) that's a different part of the object and (ii) there's no way to know what the metadata is for an arbitrary object value. This leads to potential bugs where the PC matrix is replaced but the corresponding pieces of metadata are not. Storage of metadata in attributes avoids this problem by ensuring that replacement of value will replace both the coordinates and the associated metadata.
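The failure mode described above can be sketched in base R alone. This is an illustration, not code from the package: a plain list `store` stands in for the reducedDims, a second list `side_meta` stands in for metadata held separately (as with mcols()), and the helper `pca_with_attrs()` is hypothetical.

```r
set.seed(1)

# Hypothetical helper: run a PCA and attach its standard deviations
# to the coordinate matrix as an attribute.
pca_with_attrs <- function(mat) {
  fit <- prcomp(mat)
  coords <- fit$x
  attr(coords, "sdev") <- fit$sdev
  coords
}

first  <- pca_with_attrs(matrix(rnorm(40), nrow = 10))
second <- pca_with_attrs(matrix(rnorm(40), nrow = 10))

# Stand-ins (assumptions): 'store' plays the role of reducedDims,
# 'side_meta' plays the role of metadata kept in a separate slot.
store <- list(PCA = first)
side_meta <- list(PCA = list(sdev = attr(first, "sdev")))

# Replace only the coordinates, as reducedDim(x, "PCA") <- value would.
store$PCA <- second

# Metadata stored as attributes travels with the replacement value ...
attrs_in_sync <- identical(attr(store$PCA, "sdev"), attr(second, "sdev"))

# ... while the separately held metadata still describes the old PCA.
side_is_stale <- !identical(side_meta$PCA$sdev, attr(second, "sdev"))
```

After the replacement, `attrs_in_sync` and `side_is_stale` should both be TRUE: the attribute-based metadata was swapped along with the matrix, while the side list was left untouched.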

Similarly, there was no advantage in holding the reduced dimension metadata in the mcols(), because there is rarely any information that is consistent across different dimensionality reduction results. It's not like mcols(<GRanges>) or colData(<SE>) where you would expect to have the same field across multiple samples. Here, a PCA might have the rotation matrix and the variance explained, a t-SNE will have something different, etc. - there is nothing shared between entries, so it doesn't really matter if they're held together or separately, because you'll only be using them one at a time anyway.

brgew commented 3 years ago

Hi,

I appreciate your efforts to work through possible solutions to my question.

Thank you!

LTLA commented 3 years ago

To finish this discussion, I have added a reduced.dim.matrix class to help preserve interesting attributes. All you have to do is:

reducedDim(sce, "PCA") <- reduced.dim.matrix(x, sdev=1:10, rotation=rotation.matrix) # etc.

and the attributes will survive even if you subset sce. They might also survive if you cbind multiple sces, but only if the things being combined have identical attributes - otherwise those attributes are dropped with a warning.
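For context, here is a small sketch of why a dedicated class helps. Subsetting an ordinary base-R matrix drops custom attributes. The second half repeats the call above on a toy object and only runs if SingleCellExperiment is installed. The toy data and the names `coords`, `counts` and `attr_lost` are made up for illustration.

```r
set.seed(1)

# Toy coordinates for 10 cells with a custom attribute attached.
coords <- matrix(rnorm(20), nrow = 10)
attr(coords, "sdev") <- c(2, 1)

# Base-R subsetting keeps dim and dimnames but drops other attributes.
subset_coords <- coords[1:5, , drop = FALSE]
attr_lost <- is.null(attr(subset_coords, "sdev"))

# Optional: the same attribute stored via reduced.dim.matrix().
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  suppressPackageStartupMessages(library(SingleCellExperiment))

  counts <- matrix(rpois(200, lambda = 5), nrow = 20)  # 20 genes x 10 cells
  sce <- SingleCellExperiment(assays = list(counts = counts))
  reducedDim(sce, "PCA") <- reduced.dim.matrix(coords, sdev = c(2, 1))

  # Inspect the attribute after subsetting the cells.
  print(attr(reducedDim(sce[, 1:5], "PCA"), "sdev"))
}
```

In the base-R half, `attr_lost` is TRUE because the `sdev` attribute does not survive the subset. The guarded half shows where to look for the attribute once the matrix is wrapped in reduced.dim.matrix.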

@alanocallaghan this may be of interest to various scater functions.