ICB-DCM / pyPESTO

python Parameter EStimation TOolbox
https://pypesto.readthedocs.io
BSD 3-Clause "New" or "Revised" License

General Discussion regarding Ensemble #1358

Open PaulJonasJost opened 3 months ago

PaulJonasJost commented 3 months ago

There are currently many open issues regarding ensembles (for examples and further context, see #1357, #1349, #1296, #1294, #1291). We should use this issue to have a general discussion on the purpose and reasonableness of ensembles, and to pool ideas for future improvements. Here are some aspects I think we should consider (based on my own opinion, open PRs and discussion); this is certainly not a complete list:

Any thoughts on this are very welcome and also any further questions/considerations.

dilpath commented 3 months ago

Is there a downside to coupling it to PEtab? The upcoming PEtab Result format could cover ensemble predictions as simulation experiments. This could reduce and shift this part

to create a prediction a "predictor" is currently needed, which in my experience is so individualised that I find it questionable how much work we actually save people

into a nicer/simpler format.

I agree, ensembles themselves can be completely decoupled from AMICI or predictions, and simply serve as a thin wrapper around a NumPy array of parameter vectors. Something useful for such a wrapper would be a nice way to specify prediction experiments, e.g. how to tell pyPESTO to "create a new ensemble from the current ensemble that predicts a knockout experiment, by setting this parameter to zero". I guess having the ensemble be a pandas.DataFrame could enable this, e.g.

knockout_ensemble = ensemble.copy()
knockout_ensemble["knocked_out_parameter"] = 0
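
A self-contained version of the snippet above, as a minimal sketch. It assumes the ensemble is a plain pandas.DataFrame with one row per parameter vector and one column per parameter; the parameter names and values are made up for illustration, and this is not how pyPESTO's current Ensemble stores its vectors.

```python
import pandas as pd

# Hypothetical ensemble: one row per parameter vector, one column per parameter.
ensemble = pd.DataFrame(
    {
        "k_on": [0.9, 1.1, 1.0],
        "k_off": [0.2, 0.3, 0.25],
        "knocked_out_parameter": [5.0, 4.8, 5.3],
    }
)

# Copy first so that the original ensemble is left untouched.
knockout_ensemble = ensemble.copy()
# Assigning a scalar to a column sets it for every parameter vector.
knockout_ensemble["knocked_out_parameter"] = 0.0
```

Because the knockout is an ordinary column assignment on a copy, the calibrated ensemble and the knockout ensemble can be simulated side by side.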

re: supported simulators, if PEtab is used to specify the ensemble predictions, then we could use the petab.simulate.PetabSimulator [1] as the base class for the simulator, such that any simulator that implements enough of the PetabSimulator interface can be used. This base class might need some work.

[1] https://github.com/PEtab-dev/libpetab-python/blob/90379c41611ea941b9865ba8dd724b406b7a31ef/petab/simulate.py

dweindl commented 3 months ago

Thanks for starting this discussion. To reduce complexity, I would suggest to first tackle the higher level questions. What is generated from a parameter ensemble or model ensemble? Is there a common structure that can/should be represented in pypesto? Is the current EnsemblePrediction, PredictionResult, PredictionConditionResult what we want? What will be done with that? The question of support for different types of models and simulators, and where which functionality should be implemented would come further down the road for me.

Currently, an Ensemble is only considered an accumulation of vectors, implicitly assuming that general model structure is always the same.

I think this covers the main use case in pypesto already, but once there exists some concept of model in pypesto, it shouldn't be hard to support the more general case. In case of a bigger refactoring, I would preventively rename Ensemble to ParameterEnsemble, so a ModelEnsemble can be introduced once required.
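
To make the "thin wrapper around parameter vectors" idea concrete, here is a rough sketch of what a ParameterEnsemble could look like. The class layout, the `with_fixed` helper and the row-per-vector convention are all assumptions made for illustration; none of this mirrors the existing pyPESTO Ensemble API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ParameterEnsemble:
    """Hypothetical thin wrapper: one parameter vector per row of `x_vectors`."""

    x_vectors: np.ndarray  # shape (n_vectors, n_parameters)
    x_names: tuple  # parameter IDs, one per column

    def __post_init__(self):
        if self.x_vectors.ndim != 2:
            raise ValueError("x_vectors must be 2-dimensional")
        if self.x_vectors.shape[1] != len(self.x_names):
            raise ValueError("need exactly one name per parameter column")

    def with_fixed(self, name, value):
        """Return a new ensemble with parameter `name` set to `value` everywhere."""
        x_new = self.x_vectors.copy()
        x_new[:, self.x_names.index(name)] = value
        return ParameterEnsemble(x_new, self.x_names)


ensemble = ParameterEnsemble(
    x_vectors=np.array([[0.9, 0.2], [1.1, 0.3]]),
    x_names=("k_on", "k_off"),
)
knockout = ensemble.with_fixed("k_off", 0.0)
```

Nothing in this sketch knows about models or simulators, so a later ModelEnsemble could be built as a collection of such objects.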

Is there a downside to coupling it to PEtab?

It wouldn't be usable for any non-PEtab applications. Nevertheless, it might be better to have some easy-to-use functionality coupled to PEtab, than having some practically unusable general concept. In any case, it should be made clear that it is (supposed to be) tied to PEtab.

dilpath commented 3 months ago

What is generated from a parameter ensemble or model ensemble?

I'd be happy to hear more about the use cases for a model ensemble first. If it's the calibrated models from model selection, it might make more sense to move some of this to PEtab Select, e.g. s.t. a PEtab Select model ensemble can be represented by a collection of pyPESTO ParameterEnsembles.

I would preventively rename Ensemble to ParameterEnsemble, so a ModelEnsemble can be introduced once required.

:+1:

PaulJonasJost commented 3 months ago

I'd be happy to hear more about the use cases for a model ensemble first.

The PEtab Select case was what I mainly thought about. I think that moving it (or parts of it) to PEtab Select makes sense, but we should then clarify what we understand by Ensemble, as Daniel mentioned

I would preventively rename Ensemble to ParameterEnsemble, so a ModelEnsemble can be introduced once required.

But in PEtab Select we would probably also need some way to create them? 🤔

What is generated from a parameter ensemble or model ensemble? Is there a common structure that can/should be represented in pypesto?

I really think that a very large portion of predictions boils down to some "sbml_id" at given timepoints, which need not coincide with the measurement timepoints, under specific conditions. And I do think there can/should be a structure to represent this in pypesto.

Is the current EnsemblePrediction, PredictionResult, PredictionConditionResult what we want?

Looking at it, I feel like the PredictionResult as a light wrapper is perfectly fine as is; one might be able to condense it by just having this as a dict {condition_id: PredictionConditionResult}. Regarding the PredictionConditionResult: this is currently heavily tailored to AMICI with the sensitivities, so I am not entirely sure whether we would even want all the things there.

dilpath commented 3 months ago

Looking at it, I feel like the PredictionResult as a light wrapper is perfectly fine as is; one might be able to condense it by just having this as a dict {condition_id: PredictionConditionResult}. Regarding the PredictionConditionResult: this is currently heavily tailored to AMICI with the sensitivities, so I am not entirely sure whether we would even want all the things there.

Is a prediction result as a (PEtab measurements table)-like dataframe sufficient? Extra AMICI/PEtab-specific things can then be optional columns.

entity_id | value | time [optional] | condition_id [optional] | * [optional]
species_A | 5     | 2               | cond1                   | data1

* model/problem-specific things provided by the simulator, e.g. PEtab dataset ID.

Then an ensemble prediction is one big dataframe with an additional vector_id column, or a list of dataframes.
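
As a rough illustration of the "one big dataframe" variant, the sketch below stacks two per-vector prediction tables and adds a vector_id column. The column names follow the table above; the values and the two input dataframes are invented, and no simulator is involved.

```python
import pandas as pd

# Hypothetical per-vector predictions, e.g. as returned by some simulator.
prediction_vector_0 = pd.DataFrame(
    {
        "entity_id": ["species_A", "species_A"],
        "value": [5.0, 7.0],
        "time": [2.0, 4.0],
        "condition_id": ["cond1", "cond1"],
    }
)
prediction_vector_1 = prediction_vector_0.assign(value=[5.5, 6.5])

# One big dataframe; the dict keys become the `vector_id` column.
ensemble_prediction = (
    pd.concat({0: prediction_vector_0, 1: prediction_vector_1}, names=["vector_id"])
    .reset_index(level="vector_id")
    .reset_index(drop=True)
)
```

The list-of-dataframes variant is just the input to `pd.concat` here, so converting between the two representations is cheap.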

This would make handling the predictions for e.g. plotting much easier than the current implementation. Currently, all data given a specific observable and a specific experiment is retrieved like (summary is EnsemblePrediction.prediction_summary): https://github.com/ICB-DCM/pyPESTO/blob/34e89b3bc88d2052ca808da43c11c57da75bec04/pypesto/visualize/sampling.py#L176-L181

dweindl commented 3 months ago

Looking at it, I feel like the PredictionResult as a light wrapper is perfectly fine as is; one might be able to condense it by just having this as a dict {condition_id: PredictionConditionResult}. Regarding the PredictionConditionResult: this is currently heavily tailored to AMICI with the sensitivities, so I am not entirely sure whether we would even want all the things there.

I am not sure if there is much added value in any of those. So far, the main thing is: 1) creating a parameter ensemble, 2) running simulations and collecting some outputs, and 3) computing and visualizing some statistics. The last step is probably most easily done directly with pandas/seaborn once everything is in a properly organized dataframe. (This shouldn't exclude the option of extending the PEtab visualization functionality to allow plotting things like confidence bands based on some PEtab visualization file.)
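
Step 3 can be sketched with plain pandas, assuming the predictions are already in a long-format dataframe with a vector_id column as proposed earlier in the thread. The data below are made up, and median/min/max are only stand-ins for whichever statistics are actually wanted.

```python
import pandas as pd

# Hypothetical ensemble prediction in long format (values are made up).
ensemble_prediction = pd.DataFrame(
    {
        "vector_id": [0, 1, 2, 0, 1, 2],
        "entity_id": ["species_A"] * 6,
        "condition_id": ["cond1"] * 6,
        "time": [2.0, 2.0, 2.0, 4.0, 4.0, 4.0],
        "value": [5.0, 5.5, 6.0, 7.0, 6.5, 8.0],
    }
)

# Summarise across parameter vectors for each entity, condition and timepoint.
summary = (
    ensemble_prediction
    .groupby(["entity_id", "condition_id", "time"])["value"]
    .agg(["median", "min", "max"])
    .reset_index()
)
```

The resulting `summary` has one row per entity, condition and timepoint, which is the shape most plotting libraries expect for drawing a central line with a band around it.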

Is a prediction result as a (PEtab measurements table)-like dataframe sufficient?

I'd say so.

This would make handling the predictions for e.g. plotting much easier than the current implementation.

Yes.

PaulJonasJost commented 2 months ago

Is a prediction result as a (PEtab measurements table)-like dataframe sufficient?

We would obviously need to allow more than just observables to be put there. Otherwise, I think you are right: this would make handling visualization much easier.