drisso / SingleCellExperiment

Clone of the Bioconductor repository for the SingleCellExperiment package, see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.
63 stars 17 forks source link

Alternative experiment getters and setters #32

Closed LTLA closed 4 years ago

LTLA commented 5 years ago

See ?altExperiment for details.

codecov-io commented 5 years ago

Codecov Report

Merging #32 into master will decrease coverage by 1.18%. The diff coverage is 87.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #32      +/-   ##
==========================================
- Coverage   94.76%   93.57%   -1.19%     
==========================================
  Files          15       18       +3     
  Lines         439      529      +90     
==========================================
+ Hits          416      495      +79     
- Misses         23       34      +11
Impacted Files Coverage Δ
R/SCE-misc.R 55.55% <0%> (-6.95%) :arrow_down:
R/splitSCEByAlt.R 100% <100%> (ø)
R/SEBC-methods.R 83.33% <83.33%> (ø)
R/AllGenerics.R 92.15% <87.5%> (-2.13%) :arrow_down:
R/SCE-altExperiment.R 88.88% <88.88%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 786d118...2644f68. Read the comment docs.

LTLA commented 5 years ago

it will be incumbent on downstream packages to handle the altExp,

scater and scran can start flipping stuff over as soon as we agree that this is the way to go. Visualization is a bit more annoying, scater tries to be smart about how it interprets variables but this becomes a bit stressful. For example, take colour_by="A". Is A a colData field? Or a assay row? Or an altExp row? If so, which one?

I suggest some people take this for a spin and tell me all the pain points for high-priority resolution.

But this doesn't work without upcasting citeseq to a SingleCellExperiment. So I wonder if we shouldn't be storing (or upcasting with the accessor method by default) a SingleCellExperiment for the alternative experiments?

The SCE is the default if one uses splitSCEByAlt, so most users will get that for free anyway. Not sure we need to enforce that in the class definition, as we would lose the ability to hold other things that might be interesting (e.g., a RaggedExperiment). It would only protect against the case where a user (i) is manually adding altExp<-, (ii) does not add it as an SCE, and (iii) needs SCE functionality. But even if this occurs, it's not too bad, because at the point of (iii), they can just coerce into a SCE as needed.

amcdavid commented 5 years ago

But this doesn't work without upcasting citeseq to a SingleCellExperiment. So I wonder if we shouldn't be storing (or upcasting with the accessor method by default) a SingleCellExperiment for the alternative experiments?

The SCE is the default if one uses splitSCEByAlt, so most users will get that for free anyway. Not sure we need to enforce that in the class definition, as we would lose the ability to hold other things that might be interesting (e.g., a RaggedExperiment). It would only protect against the case where a user (i) is manually adding altExp<-, (ii) does not add it as an SCE, and (iii) needs SCE functionality. But even if this occurs, it's not too bad, because at the point of (iii), they can just coerce into a SCE as needed.

Ah, I understand now that SEBC can be a SummarizedExperiment or any derived class. I suspect that this will do the trick. It would be kinda nice to have some of the maintainers of packages that revdep SingleCellExperiment (besides scater and scran) see how it would fit in their workflow before we canonize it as API.

LTLA commented 5 years ago

Between you, me, @robertamezquita and anyone else who is watching, we can do some preliminary testing to iron out the obvious problems; then I'll post something on BioC-devel for further comments; and then we'll merge. After which I can start porting scater and scran over in earnest.

LTLA commented 5 years ago

scater refactoring for altExp has begun. I'll let you know when it has finished, but that shouldn't stop people from playing around with altExps in the meantime.

LTLA commented 4 years ago

scater refactoring is, by and large, complete, see davismcc/scater#93. The altExps concept is handled by different functions in different ways, depending on what is most appropriate:

The general philosophy is that a function should only have special arguments for handling alternative experiments if it intends to make use of the fact that the data comes from an alternative experiment. If it's just using the alternative experiment as a data store, it is better for the user to call altExp() directly. For example, if we wanted to perform t-SNE on an alternative experiment, but wanted to store those results in the alternative experiment's reducedDims (not the main reducedDims), we could just do:

altExp(sce) <- runTSNE(altExp(sce))

The purpose of the alt.exp= argument is to handle information transfer between the alternative and main experiments. So, if we set alt.exp= in the runTSNE call, it would be equivalent to:

reducedDim(sce, "TSNE") <- reducedDim(runTSNE(altExp(sce)))

which we can all agree is much wordier and probably warrants a special argument.

robertamezquita commented 4 years ago

Can you clarify:

The plotting functions, via retrieveCellInfo, can color/shape/size by expression values in the alternative experiments, using the same arguments for column metadata or main assay values.

with an example? Say I have an sce that has assays logcounts, with an altExp named features with an assay inside called counts, and I want to plotReducedDim(..., 'UMAP')

(I've tried different combos for plotExpression using the exprs_values and by_exprs_values args and no dice)

LTLA commented 4 years ago

Not sure what exactly you want to do. You want to colour a UMAP off the alternative experiment?

plotReducedDim(sce, "UMAP", colour_by="whatever your alt feature has as its row name",
    by_exprs_values="counts") # this should get propagated to retrieveCellInfo().

plotExpression's features argument doesn't look inside the alternative experiments yet. Todo.

amcdavid commented 4 years ago

The general philosophy is that a function should only have special arguments for handling alternative experiments if it intends to make use of the fact that the data comes from an alternative experiment. If it's just using the alternative experiment as a data store, it is better for the user to call altExp() directly.

I think that makes sense, good plan 👍