Bioconductor / BiocClassesWorkingGroup

Notes and Discussion concerning recommended classes for Bioconductor

altExp vs MultiAssayExperiment to manage multiple assays #9

Open lgatto opened 1 year ago

lgatto commented 1 year ago

A message from Vince on Slack:

> query on ch 12 that addresses CITEseq with ADT counts in altExp vs SingleCellMultiModal that uses MultiAssayExperiment to manage rna and adt together ... of interest to working group on classes @Laurent ?? (noted your comment on scMultiome) does classes working group have a slack channel?

lgatto commented 1 year ago

One major difference between these two approaches is that altExps refer to the same or related features/samples (I believe, at least), while MultiAssayExperiment can store and manage unrelated or (more often) only partially related dimensions in (possibly) different classes (SE, SCE, ExpressionSet, matrix).

I had a similar discussion/reflection when developing the QFeatures class.

PeteHaitch commented 1 year ago

Yes, altExps refer to data collected from the exact same samples as the 'main experiment' but of a different modality. From ?altExp:

> Typical examples would be for spike-in transcripts in plate-based experiments and antibody or CRISPR tags in CITE-seq experiments.

Personally, I wouldn't use an MAE for such data types and sampling because it seems like overkill to me. Plus, the Bioc single-cell tools (particularly those developed by @ltla) are very 'altExp-aware', and in my experience they tend to do the right thing or provide a way to use data from the altExp when required.

LTLA commented 1 year ago

MAE is pretty unwieldy. Too much to ask users to go from SCE to MAE when almost all tooling is built around SEs.

Also, there are performance degradations from MAE's insistence on harmonization. At best this introduces extra delayed layers on the counts/logcounts; at worst it makes a copy of the underlying matrices when it reindexes the columns.

federicomarini commented 1 year ago

Joining the discussion to ask about different sets of features from the same samples. Namely: often (almost always) we have genes. What about transcript-level information, "keeping it in" the same SCE object? And similarly: what about quantifications done e.g. at the pathway/functional level? These would also very much be coming from the exact same samples.

What would be the ideal way to proceed here? Could we/should we potentially enable these altExps to be "standard" ones?

drisso commented 1 year ago

FWIW there is functionality in SingleCellMultiModal that lets the user choose between an MAE representation and an SCE with altExp, but it's currently implemented only for CITE-seq via the DataClass argument: https://github.com/waldronlab/SingleCellMultiModal/blob/e020f9a6ba7791139fcef2260513346c7b1da7bb/vignettes/CITEseq.Rmd#L109

It could become a package-wide option, or at least an option for those protocols for which the set of cells is the same across all modalities (10x multiome, CITE-seq, ...).

I agree with @LTLA that MAE is overkill for these protocols, and SCE with altExp is typically how we analyze multiome in my group.

hpages commented 1 year ago

I've meant to ask this for a while: would it make sense to move the altExp functionality to SummarizedExperiment objects so that all SE derivatives benefit from it? Are there any non-single-cell analysis use cases that would benefit from this, or is it too single-cell oriented?

LTLA commented 1 year ago

> I meant to ask this for a while but would it make sense to move the altExp functionality to SummarizedExperiment objects so that all SE derivatives benefit from it?

Yes, I would like this very much.

PeteHaitch commented 1 year ago

> Are there any non single-cell analysis use cases that would benefit from this or is it too single-cell oriented?

Off the top of my head, an example is bisulfite sequencing, which may include a spike-in sequence from another organism that is fully unmethylated (usually lambda phage), so you align and quantify against a combined reference sequence (sample + lambda_phage). The lambda phage data can be used to estimate the bisulfite conversion efficiency, so you may want to keep the resulting count matrix, but its rows don't really belong as rows of the count matrix that is used for downstream analysis. An altExp seems a natural fit for it.

federicomarini commented 1 year ago

> I meant to ask this for a while but would it make sense to move the altExp functionality to SummarizedExperiment objects so that all SE derivatives benefit from it? Are there any non single-cell analysis use cases that would benefit from this or is it too single-cell oriented?

While we are thinking out loud about bringing SCE goodies into SE-land: why not do the same with the reducedDimension slot?

hpages commented 1 year ago

@federicomarini Is there a non-single-cell analysis use case that you have in mind that would benefit from this move? What do others think? @LTLA? @PeteHaitch?

Technically we can move other SCE goodies into SE-land, but I'd like to focus on those that have the highest ROI for now. This kind of move can be very disruptive and we want to minimize the disruption as much as we can. Doing it right is tricky and can be very time-consuming. It won't happen in BioC 3.17: it's too late for that and I'm too busy with other things anyway. It would preferably have to happen at the beginning of a new 6-month devel cycle, e.g. at the beginning of the BioC 3.18 devel cycle.

federicomarini commented 1 year ago

> @federicomarini Is there a non single-cell analysis use case that you have in mind that would benefit from this move? What do others think? @LTLA? @PeteHaitch?

Mainly having (also for bulk data) the beauty of all the info in one place (read: "one object"). We are probably spoiled now that reduced-dimensionality views are a constituent part of a single-cell dataset. I have found myself saying more than once, "oh well, even if this is bulk and there aren't that many samples, I'd like to have it all in here".

Agreed on the timing: it is not something to do lightly, and the 6 months of the next cycle could come in handy 👍

drighelli commented 11 months ago

Hi all,

I just read this issue, and it now seems to cover two main topics.

I'll try to give my unsolicited contribution. :)