UCLouvain-CBIO / scp

Single cell proteomics data processing
https://uclouvain-cbio.github.io/scp/index.html
19 stars 2 forks source link

Subsetting a `SingleCellExperiment` breaks the `ScpModel` object #58

Open cvanderaa opened 4 months ago

cvanderaa commented 4 months ago

The issues has been highlighted in: https://github.com/UCLouvain-CBIO/scp/issues/56#issuecomment-2010539916

The reason is that the subsetting functionality in QFeatures does not know it should also subset the ScpModel objects contained in metadata(qfObject).

A solution I see is to overwrite the subsetting functionality of QFeatures (with call to nextMethod()), but also including a substerring of the metadata.

cvanderaa commented 4 months ago

This also implies creating subsetting functionality for ScpModel objects, which should be rather straightforward.

lgatto commented 4 months ago

Wait, I'm confused... an ScpModel is computed on a SingleCellExperiment. Why is the QFeatures subsetting and issue?

Also, having a quick look at the issue, isn't one problem that the sce variable wasn't created with getWithColData()?

cvanderaa commented 4 months ago

Sorry, you're right! It's subsetting the SingleCellExperiment :sweat_smile:

lgatto commented 4 months ago

Ok, then sub-setting the ScpModel would be the best approach.

On the other hand, from a user's perspective, what about sub-setting features at the model results data.frames level? Sub-setting cells or cell types kind of invalidates the results, as the model was produced with all cells.

lgatto commented 4 months ago

Also regarding this

the model was produced with all cells

do we want to allow sub-setting columns/cells?

cvanderaa commented 4 months ago

It makes indeed no sense to subset an ScpModel by cell as the estimated results depend on all cells. So when sub-setting an SCE by column, all ScpModel objects in the metadata() should be removed.

what about sub-setting features at the model results data.frames level

The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish. But every ScpModel object depends on an SCE object, so we must ensure that the functionality does not break upon subsetting, or at least explicitly forbid it. In practice, I see 3 possible approach:

  1. Add a check (in scpVariance*, scpDifferential*, scpComponent*) to detect whether the SCE has been subset and throw an error, that is forbid subsetting
  2. Add a check (in scpVariance*, scpDifferential*, scpComponent*) to detect whether the SCE has been subset and throw an error that suggests using a function (eg scpModelUpdate()). scpModelUpdate() should detect changes in cells (if change, remove scpModel) and changes in features (if change, subset scpModel).
  3. Overwrite sub-setting method of SCE so that is uses scpModelUpdate(). However, I'm not convinced this is a good idea to mess up with SCE's methods.

What do you think?

lgatto commented 4 months ago

The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish.

Of course, it not our responsibility. My point was rather that sub-setting by feature can be done on that level, so do we really need support sub-setting the SCE?

But now that I think a bit more about it, doesn't sub-setting by features invalidates some of the ScpModel results? Are the component and variance analyses still relevant? Would these be preformed on the sub-setted modelled data?

lgatto commented 4 months ago

In other words, and following up on your suggestions, do we want to support sub-setting an SCE after modelling? Shouldn't we not simply warn that this will lead to broken/invalid ScpModel results, and suggest to manipulate the individual dataframe results of re-run a new model on a subset of the data?