Open cvanderaa opened 4 months ago
This also implies creating subsetting functionality for ScpModel
objects, which should be rather straightforward.
Wait, I'm confused... an ScpModel is computed on a SingleCellExperiment
. Why is the QFeatures subsetting and issue?
Also, having a quick look at the issue, isn't one problem that the sce variable wasn't created with getWithColData()
?
Sorry, you're right! It's subsetting the SingleCellExperiment
:sweat_smile:
Ok, then sub-setting the ScpModel
would be the best approach.
On the other hand, from a user's perspective, what about sub-setting features at the model results data.frames level? Sub-setting cells or cell types kind of invalidates the results, as the model was produced with all cells.
Also regarding this
the model was produced with all cells
do we want to allow sub-setting columns/cells?
It makes indeed no sense to subset an ScpModel
by cell as the estimated results depend on all cells. So when sub-setting an SCE
by column, all ScpModel
objects in the metadata()
should be removed.
what about sub-setting features at the model results data.frames level
The output tables are disconnected from the SCE
object, so it is not our responsibility and up to the user to subset the data tables as they wish. But every ScpModel
object depends on an SCE
object, so we must ensure that the functionality does not break upon subsetting, or at least explicitly forbid it. In practice, I see 3 possible approach:
scpVariance*
, scpDifferential*
, scpComponent*
) to detect whether the SCE
has been subset and throw an error, that is forbid subsettingscpVariance*
, scpDifferential*
, scpComponent*
) to detect whether the SCE
has been subset and throw an error that suggests using a function (eg scpModelUpdate()
). scpModelUpdate()
should detect changes in cells (if change, remove scpModel
) and changes in features (if change, subset scpModel
). SCE
so that is uses scpModelUpdate()
. However, I'm not convinced this is a good idea to mess up with SCE
's methods.What do you think?
The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish.
Of course, it not our responsibility. My point was rather that sub-setting by feature can be done on that level, so do we really need support sub-setting the SCE?
But now that I think a bit more about it, doesn't sub-setting by features invalidates some of the ScpModel results? Are the component and variance analyses still relevant? Would these be preformed on the sub-setted modelled data?
In other words, and following up on your suggestions, do we want to support sub-setting an SCE after modelling? Shouldn't we not simply warn that this will lead to broken/invalid ScpModel results, and suggest to manipulate the individual dataframe results of re-run a new model on a subset of the data?
The issues has been highlighted in: https://github.com/UCLouvain-CBIO/scp/issues/56#issuecomment-2010539916
The reason is that the subsetting functionality in
QFeatures
does not know it should also subset theScpModel
objects contained inmetadata(qfObject)
.A solution I see is to overwrite the subsetting functionality of
QFeatures
(with call tonextMethod()
), but also including a substerring of the metadata.