PaulJonasJost opened 3 months ago
Is there a downside to coupling it to PEtab? The upcoming PEtab Result format could cover ensemble predictions as simulation experiments. This could reduce this part and shift it into a nicer/simpler format:

> to create a prediction a "predictor" is currently needed, which in my experience is so individualised that I find it questionable how much work we actually save people
I agree: ensembles themselves can be completely decoupled from AMICI or predictions, and simply serve as a thin wrapper around a NumPy array of parameter vectors. Something useful for such a wrapper would be a nice way to specify prediction experiments, e.g. how to tell pyPESTO to "create a new ensemble from the current ensemble that predicts a knockout experiment, by setting this parameter to zero". I guess having the ensemble be a pandas.DataFrame could enable this, e.g.:
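```python
knockout_ensemble = ensemble.copy()  # ensemble: a pandas.DataFrame, one column per parameter
knockout_ensemble["knocked_out_parameter"] = 0  # set this parameter to zero in every vector
```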
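A minimal, self-contained sketch of this idea, assuming the ensemble is a pandas.DataFrame with one row per parameter vector and one column per parameter ID. The parameter IDs and values are made up for illustration, and this is not pyPESTO's current Ensemble API:

```python
import numpy as np
import pandas as pd

# Hypothetical ensemble: 3 parameter vectors (rows) over 3 parameters (columns).
parameter_ids = ["k_synthesis", "k_degradation", "knocked_out_parameter"]
vectors = np.array([
    [1.0, 0.1, 0.5],
    [1.2, 0.2, 0.4],
    [0.9, 0.15, 0.6],
])
ensemble = pd.DataFrame(vectors, columns=parameter_ids)
ensemble.index.name = "vector_id"

# Derive a knockout ensemble; the original ensemble stays unchanged.
knockout_ensemble = ensemble.copy()
knockout_ensemble["knocked_out_parameter"] = 0.0

# The plain NumPy array of parameter vectors is still available for simulators.
knockout_vectors = knockout_ensemble.to_numpy()
```

Since each prediction experiment is just a modified copy, several scenarios can be derived from one calibrated ensemble without touching it, and to_numpy() still exposes the thin array the ensemble wraps.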
Re: supported simulators: if PEtab is used to specify the ensemble predictions, then we could use the petab.simulate.PetabSimulator [1] as the base class for the simulator, such that any simulator that implements enough of the PetabSimulator interface can be used. This base class might need some work.
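A rough sketch of what "implements enough of the interface" could look like. EnsembleSimulator, simulate_vector, simulate_ensemble and ToySimulator are hypothetical names, not the actual petab.simulate API; the only assumption is that a simulator can turn one parameter vector into a long-format dataframe:

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class EnsembleSimulator(ABC):
    """Hypothetical minimal interface an ensemble needs from a simulator."""

    @abstractmethod
    def simulate_vector(self, parameters: pd.Series) -> pd.DataFrame:
        """Simulate one parameter vector, returning a long-format dataframe
        with at least the columns entity_id, time and value."""

    def simulate_ensemble(self, ensemble: pd.DataFrame) -> pd.DataFrame:
        # Simulate every row (parameter vector) and tag results with vector_id.
        frames = []
        for vector_id, parameters in ensemble.iterrows():
            frame = self.simulate_vector(parameters)
            frames.append(frame.assign(vector_id=vector_id))
        return pd.concat(frames, ignore_index=True)


class ToySimulator(EnsembleSimulator):
    """Exponential decay A(t) = A0 * exp(-k * t), a stand-in for a real model."""

    def __init__(self, timepoints):
        self.timepoints = np.asarray(timepoints, dtype=float)

    def simulate_vector(self, parameters: pd.Series) -> pd.DataFrame:
        values = parameters["A0"] * np.exp(-parameters["k"] * self.timepoints)
        return pd.DataFrame(
            {"entity_id": "species_A", "time": self.timepoints, "value": values}
        )


ensemble = pd.DataFrame({"A0": [5.0, 4.0], "k": [0.1, 0.2]})
predictions = ToySimulator([0.0, 1.0, 2.0]).simulate_ensemble(ensemble)
```

The ensemble-level loop lives in the base class, so a new simulator would only need to implement the single-vector method.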
Thanks for starting this discussion. To reduce complexity, I would suggest first tackling the higher-level questions. What is generated from a parameter ensemble or model ensemble? Is there a common structure that can/should be represented in pyPESTO? Is the current EnsemblePrediction, PredictionResult, PredictionConditionResult what we want? What will be done with that? The question of support for different types of models and simulators, and where which functionality should be implemented, would come further down the road for me.
> Currently, an Ensemble is only considered an accumulation of vectors, implicitly assuming that general model structure is always the same.

I think this covers the main use case in pyPESTO already, but once some concept of a model exists in pyPESTO, it shouldn't be hard to support the more general case. In case of a bigger refactoring, I would preventively rename Ensemble to ParameterEnsemble, so a ModelEnsemble can be introduced once required.
> Is there a downside to coupling it to PEtab?

It wouldn't be usable for any non-PEtab applications. Nevertheless, it might be better to have some easy-to-use functionality coupled to PEtab than a practically unusable general concept. In any case, it should be made clear that it is (supposed to be) tied to PEtab.
> What is generated from a parameter ensemble or model ensemble?

I'd be happy to hear more about the use cases for a model ensemble first. If it's the calibrated models from model selection, it might make more sense to move some of this to PEtab Select, e.g. such that a PEtab Select model ensemble can be represented by a collection of pyPESTO ParameterEnsembles.
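One possible reading of this, sketched with hypothetical names (ParameterEnsemble as a plain DataFrame alias; the model IDs M1 and M2 are made up): a model ensemble maps each model ID, e.g. of a PEtab Select model, to the parameter ensemble calibrated for it. Because the models differ in structure, their parameter columns may differ too:

```python
import pandas as pd

# Hypothetical ParameterEnsemble representation: rows are parameter vectors,
# columns are the parameter IDs of one fixed model structure.
ParameterEnsemble = pd.DataFrame

# Hypothetical ModelEnsemble: model ID -> ParameterEnsemble for that model.
model_ensemble: dict[str, ParameterEnsemble] = {
    "M1": pd.DataFrame({"k1": [1.0, 1.1], "k2": [0.5, 0.4]}),
    "M2": pd.DataFrame({"k1": [0.9, 1.0, 1.2]}),  # M2 has no parameter k2
}

# Flatten into one table if needed; parameters absent from a model become NaN.
flat = pd.concat(model_ensemble, names=["model_id", "vector_id"])
```

Keeping the per-model ensembles separate avoids forcing a shared parameter set onto structurally different models; the flattened view is only a convenience.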
> I would preventively rename Ensemble to ParameterEnsemble, so a ModelEnsemble can be introduced once required.

:+1:
> I'd be happy to hear more about the use cases for a model ensemble first.

The PEtab Select case was what I mainly had in mind. I think moving it (or parts of it) to PEtab Select makes sense, but we should then clarify what we mean by Ensemble, as Daniel mentioned:

> I would preventively rename Ensemble to ParameterEnsemble, so a ModelEnsemble can be introduced once required.

But in PEtab Select we would probably also need some way to create them? 🤔
> What is generated from a parameter ensemble or model ensemble? Is there a common structure that can/should be represented in pyPESTO?

I really think that a very large portion of predictions boils down to the values of an "sbml_id" at given timepoints and under specific conditions, which might not coincide with those of the measurements. And I do think there can/should be a structure to represent this in pyPESTO.
> Is the current EnsemblePrediction, PredictionResult, PredictionConditionResult what we want?

Looking at it, I feel like the PredictionResult as a light wrapper is perfectly fine as is; one might be able to condense it by just having this as a dict {condition_id: PredictionConditionResult}. Regarding the PredictionConditionResult: This is currently heavily tailored to AMICI with the sensitivities, so I am not entirely sure whether we would even want all the things there.
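A sketch of the condensed form, assuming a stripped-down stand-in for PredictionConditionResult. The class ConditionOutput and its fields are hypothetical and deliberately leave out the AMICI-specific sensitivities:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ConditionOutput:
    """Hypothetical stand-in for PredictionConditionResult without sensitivities."""

    timepoints: np.ndarray  # shape (n_timepoints,)
    output_ids: list[str]  # one ID per output column
    output: np.ndarray  # shape (n_timepoints, n_outputs)


# Condensed PredictionResult: condition ID -> per-condition result.
prediction_result: dict[str, ConditionOutput] = {
    "cond1": ConditionOutput(
        timepoints=np.array([0.0, 1.0]),
        output_ids=["obs_A", "obs_B"],
        output=np.array([[1.0, 2.0], [1.5, 2.5]]),
    ),
}

# Look up one output trajectory for one condition.
cond = prediction_result["cond1"]
obs_b = cond.output[:, cond.output_ids.index("obs_B")]
```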
> Looking at it, I feel like the PredictionResult as a light wrapper is perfectly fine as is; one might be able to condense it by just having this as a dict {condition_id: PredictionConditionResult}. Regarding the PredictionConditionResult: This is currently heavily tailored to AMICI with the sensitivities, so I am not entirely sure whether we would even want all the things there.

Is a prediction result as a (PEtab measurements table)-like dataframe sufficient? This would make handling the predictions for e.g. plotting easier than the current implementation, at least. Then extra AMICI/PEtab-specific things can be optional columns.

| entity_id | value | [optional] time | [optional] condition_id | [optional] \* |
|---|---|---|---|---|
| species_A | 5 | 2 | cond1 | data1 |

\* Model/problem-specific things provided by the simulator, e.g. PEtab dataset ID.
Then an ensemble prediction is one big dataframe with an additional vector_id column, or a list of dataframes.
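The two representations are interchangeable. A minimal pandas sketch with toy values, using the vector_id column described above (the "row" index level is only a temporary helper):

```python
import pandas as pd

# Per-vector predictions in the measurement-table-like format (toy values).
per_vector = [
    pd.DataFrame({"entity_id": ["species_A"] * 2, "time": [0.0, 2.0],
                  "condition_id": ["cond1"] * 2, "value": [5.0, 4.1]}),
    pd.DataFrame({"entity_id": ["species_A"] * 2, "time": [0.0, 2.0],
                  "condition_id": ["cond1"] * 2, "value": [5.0, 3.7]}),
]

# List of dataframes -> one big dataframe with a vector_id column.
big = pd.concat(per_vector, keys=list(range(len(per_vector))), names=["vector_id", "row"])
big = big.reset_index(level="vector_id").reset_index(drop=True)

# And back: one dataframe per vector_id.
recovered = [group.drop(columns="vector_id") for _, group in big.groupby("vector_id")]
```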
This would make handling the predictions for e.g. plotting much easier than the current implementation. Currently, all data for a specific observable and a specific experiment are retrieved like this (summary is EnsemblePrediction.prediction_summary):
https://github.com/ICB-DCM/pyPESTO/blob/34e89b3bc88d2052ca808da43c11c57da75bec04/pypesto/visualize/sampling.py#L176-L181
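For comparison, with a long-format dataframe the same selection reduces to a boolean mask. A sketch with made-up data and the column names from the table above; it does not reproduce the linked pyPESTO code:

```python
import pandas as pd

# Toy ensemble prediction in long format (two vectors, two observables).
predictions = pd.DataFrame({
    "vector_id": [0, 0, 1, 1, 0, 1],
    "entity_id": ["obs_A", "obs_A", "obs_A", "obs_A", "obs_B", "obs_B"],
    "condition_id": ["cond1"] * 6,
    "time": [0.0, 2.0, 0.0, 2.0, 2.0, 2.0],
    "value": [5.0, 4.1, 5.0, 3.7, 1.0, 1.2],
})

# All ensemble predictions of one observable under one experimental condition.
mask = (predictions["entity_id"] == "obs_A") & (predictions["condition_id"] == "cond1")
selected = predictions[mask]

# Wide view for plotting: one row per timepoint, one column per parameter vector.
trajectories = selected.pivot(index="time", columns="vector_id", values="value")
```

The pivoted frame (one column per parameter vector) can be handed directly to plotting functions.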
> Looking at it, I feel like the PredictionResult as a light wrapper is perfectly fine as is; one might be able to condense it by just having this as a dict {condition_id: PredictionConditionResult}. Regarding the PredictionConditionResult: This is currently heavily tailored to AMICI with the sensitivities, so I am not entirely sure whether we would even want all the things there.

I am not sure if there is much added value in any of those. So far, the main thing is: 1) creating a parameter ensemble, 2) running simulations and collecting some outputs, and 3) computing and visualizing some statistics. The last step is probably most easily done directly with pandas/seaborn once everything is in a properly organized dataframe. (This shouldn't exclude the option of extending the PEtab visualization functionality to allow plotting things like confidence bands based on some PEtab visualization file.)
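As an illustration of step 3, a sketch that computes a median and a 95% percentile band per observable, condition and timepoint directly with pandas. The toy decay model, the random rates and the column names are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Toy ensemble prediction: 100 parameter vectors, one observable, 3 timepoints.
rng = np.random.default_rng(0)
n_vectors, timepoints = 100, [0.0, 1.0, 2.0]
predictions = pd.DataFrame({
    "vector_id": np.repeat(np.arange(n_vectors), len(timepoints)),
    "entity_id": "obs_A",
    "condition_id": "cond1",
    "time": np.tile(timepoints, n_vectors),
})
# Exponential decay with a vector-specific rate as a stand-in for model output.
rates = rng.uniform(0.1, 0.3, size=n_vectors)
predictions["value"] = 5.0 * np.exp(-rates[predictions["vector_id"]] * predictions["time"])

# Median and 95% percentile band per observable, condition and timepoint.
summary = (
    predictions.groupby(["entity_id", "condition_id", "time"])["value"]
    .quantile([0.025, 0.5, 0.975])
    .unstack()
)
```

The quantile columns could then be drawn as a band, e.g. with matplotlib's Axes.fill_between.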
> Is a prediction result as a (PEtab measurements table)-like dataframe sufficient?

I'd say so.
> This would make handling the predictions for e.g. plotting much easier than the current implementation.

Yes.
> Is a prediction result as a (PEtab measurements table)-like dataframe sufficient?

We would obviously need some way to allow more than just observables to be put there. Apart from that, I think you are right: it would make handling visualization much easier.
There are a multitude of issues currently open regarding ensembles (for examples and further context, see #1357, #1349, #1296, #1294, #1291). We should use this to have a general discussion on the purpose and reasonability of Ensembles, and use this issue to discuss topics regarding Ensembles to pool future improvements. Here are aspects I think we should consider (based on my own opinion, open PRs and discussions), which is certainly not a complete list:

General purpose questions for the Ensemble class

Ensembles in general, but implicitly need an AMICI model. We do support other simulators as well. It would be completely fine to only support AMICI ensembles, but then we should make that clear. A consideration here would be how general we want ensembles to be, or whether we should perhaps change it into AmiciEnsemble (which does not necessarily need a completely separate module?).

General purpose questions for the "Prediction/Predictor" class

formula f(parameters, states, observables) instead of just them. This functionality, however, is not only used in ensembles, but also, for example, to check model fits or to explore new conditions/"interventions". I think it might make sense to think about a Predictor class, as it would streamline a lot of visualization tasks and facilitate model exploration (there is the AmiciPredictor class, but I excluded it for the moment, as I think a more general discussion might be helpful). Things to consider here include:

What use should an Ensemble class serve?

as a follow-up consideration.

Any thoughts on this are very welcome, as are any further questions/considerations.
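The post is truncated above, but the fragment suggests a predictor that can return a formula f(parameters, states, observables) in addition to the raw outputs. A sketch of such a simulator-agnostic Predictor; all names (Predictor, Formula, toy_simulate, the "flux" output) are hypothetical, and this is not pyPESTO's AmiciPredictor:

```python
from typing import Callable

import numpy as np
import pandas as pd

# Hypothetical signature: (parameters, states, observables) -> derived quantity.
Formula = Callable[[pd.Series, pd.DataFrame, pd.DataFrame], np.ndarray]


class Predictor:
    """Hypothetical simulator-agnostic predictor.

    `simulate` maps a parameter vector to (states, observables), both indexed
    by time; `formulas` add derived outputs on top of the raw simulation.
    """

    def __init__(self, simulate, formulas: dict[str, Formula]):
        self.simulate = simulate
        self.formulas = formulas

    def __call__(self, parameters: pd.Series) -> pd.DataFrame:
        states, observables = self.simulate(parameters)
        outputs = observables.copy()
        for output_id, formula in self.formulas.items():
            outputs[output_id] = formula(parameters, states, observables)
        return outputs


def toy_simulate(parameters):
    # Stand-in model: A(t) = A0 * exp(-k * t), observed as scale * A(t).
    t = np.array([0.0, 1.0, 2.0])
    states = pd.DataFrame({"A": parameters["A0"] * np.exp(-parameters["k"] * t)}, index=t)
    observables = pd.DataFrame({"obs_A": parameters["scale"] * states["A"]}, index=t)
    return states, observables


predictor = Predictor(
    toy_simulate,
    # Example derived output: degradation flux k * A(t).
    formulas={"flux": lambda p, x, y: p["k"] * x["A"].to_numpy()},
)
result = predictor(pd.Series({"A0": 5.0, "k": 0.1, "scale": 2.0}))
```

Applied row by row to a parameter ensemble, the same predictor would yield model fits, intervention scenarios, and ensemble predictions alike.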