PEtab-dev / PEtab

PEtab - an SBML and TSV based data format for parameter estimation problems in systems biology
https://petab.readthedocs.io
MIT License

Clarify meaning of `initializationPrior*` #587

Open dweindl opened 2 months ago

dweindl commented 2 months ago

Initially, initializationPriorType and initializationPriorParameters were introduced to provide prior distributions for sampling initial points in a multi-start optimization setting. For other global optimization schemes it is less clear how this should be incorporated. Not at all? For the initial population? Whenever a new point is sampled? ...?
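
For readers less familiar with the multi-start use case described above, the following is a minimal sketch of how start points might be drawn from the initialization prior columns. It is not taken from any PEtab tool. The function names `sample_from_prior` and `sample_start_points` are hypothetical, only the `uniform` and `normal` prior types are handled, `parameterScale` is ignored, and clipping samples to the bounds is an assumption made for the sketch.

```python
import random


def sample_from_prior(prior_type, prior_parameters, rng):
    """Draw one value from a prior given as PEtab-style strings, e.g. "0.1;10"."""
    a, b = (float(x) for x in prior_parameters.split(";"))
    if prior_type == "uniform":
        return rng.uniform(a, b)  # a = lower end, b = upper end
    if prior_type == "normal":
        return rng.gauss(a, b)  # a = mean, b = standard deviation
    raise ValueError(f"prior type not handled in this sketch: {prior_type}")


def sample_start_points(parameters, n_starts, seed=0):
    """Return n_starts dicts mapping parameterId to a sampled start value."""
    rng = random.Random(seed)
    starts = []
    for _ in range(n_starts):
        point = {}
        for p in parameters:
            value = sample_from_prior(
                p["initializationPriorType"],
                p["initializationPriorParameters"],
                rng,
            )
            # Assumption of this sketch: keep start points inside the bounds.
            value = min(max(value, p["lowerBound"]), p["upperBound"])
            point[p["parameterId"]] = value
        starts.append(point)
    return starts


# Two illustrative rows of a parameter table (values are made up).
parameters = [
    {"parameterId": "k1", "lowerBound": 1e-3, "upperBound": 1e3,
     "initializationPriorType": "uniform",
     "initializationPriorParameters": "0.1;10"},
    {"parameterId": "k2", "lowerBound": 1e-3, "upperBound": 1e3,
     "initializationPriorType": "normal",
     "initializationPriorParameters": "1;0.5"},
]

starts = sample_start_points(parameters, n_starts=5)
for start in starts:
    print(start)
```

Each dict in `starts` would then be handed to one local optimization run. The open question in this issue is what the analogous step is for optimizers that do not work this way.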

paulflang commented 2 months ago

Reminds me that I once found this a little bit confusing. In my mind, it would make sense to treat the following explicitly separately:

  • definition of the objective function
  • hints for optimizer
  • hints for plotting routines

dweindl commented 2 months ago
>   • definition of the objective function

This is already separated. Those are the objectivePriorType and objectivePriorParameters fields.

>   • hints for optimizer

This would be initializationPriorType and initializationPriorParameters, which I think need further clarification. Is this more like an optional hint, or at which stages do those have to be respected?

>   • hints for plotting routines

Everything for plotting is the visualization table, but this is currently independent of any prior distributions.

paulflang commented 2 months ago

> Those are the objectivePriorType, objectivePriorParameters fields.

I was not talking about those columns specifically, more about my experience when I was new to optimization and first came across PEtab. I was not quite sure how to cast the data I had into an objective function (should I just use least squares?), but after reading the format specification, thinking about it and reading it again, it all started to make sense, except for the two initializationPrior* columns. I could not figure out how they affect the objective function. At some point I concluded that they are probably just there for reasons that don't affect me (remember, I was using eSS), so I just ignored them. Of course, there were also datasetId and replicateId, but for those it was more obvious that they are just for visualization purposes. Still, I'm not sure what (if anything) to do here. The only thing that came to my mind is prefixing optimization hint columns with oh:, and plotting routine columns (outside the visualization table) with vis:.

dilpath commented 1 month ago

> Initially, initializationPriorType and initializationPriorParameters were introduced to provide prior distributions for sampling initial points in a multi-start optimization setting. For other global optimization schemes it is less clear how this should be incorporated. Not at all? For the initial population? Whenever a new point is sampled? ...?

For me, whenever a new point is sampled in an uninformed way, then the initializationPrior* should be used. Otherwise, it's unclear why the user can help the optimizer avoid non-evaluable regions at the start of optimization, but not during it. So, "Whenever a new point is sampled?" sounds good to me.

I also agree with @paulflang that the columns that define the objective function should be obvious, so then *Prior* in initializationPrior* is suboptimal. Alternative: optimizerSampling*.

This information is useful but could also be shifted to the PEtab Result format. Currently, there are no (draft) guidelines for whether certain optimizer information is better suited in optional columns in PEtab, or as values in the PEtab Result.

dweindl commented 1 month ago

> For me, whenever a new point is sampled in an uninformed way, then the initializationPrior* should be used. Otherwise, it's unclear why the user can help the optimizer avoid non-evaluable regions at the start of optimization, but not during it. So, "Whenever a new point is sampled?" sounds good to me.

This sounds reasonable, in principle. However, my problem is that for certain global optimizers this will be difficult to achieve. They usually just take some box constraints and then sample randomly inside the box. Since what is specified in the parameter table is generally considered an integral part of the optimization problem definition that can't be ignored, I am wondering whether initializationPrior* would then rule out those optimizers. Either way is fine for me, but I think it would be good to clarify that.
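
To make the mismatch concrete, here is a small sketch of what a box-only sampler does when the initialization prior covers only part of the box. The numbers and the function name `fraction_inside_prior` are made up for illustration. A uniform initialization prior on [0.1, 10] inside bounds of [1e-3, 1e3] is assumed, and the sampler is a stand-in, not the behaviour of any specific optimizer.

```python
import random


def fraction_inside_prior(lower, upper, prior_lower, prior_upper,
                          n_samples=10_000, seed=0):
    """Estimate how often a box-only sampler lands inside the prior's range."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # A box-only optimizer only knows the bounds, not the prior.
        x = rng.uniform(lower, upper)
        if prior_lower <= x <= prior_upper:
            inside += 1
    return inside / n_samples


fraction = fraction_inside_prior(1e-3, 1e3, 0.1, 10.0)
print(f"{fraction:.1%} of box samples fall inside the initialization prior range")
```

With these assumed numbers, only a small share of the box samples lands where the user asked for start points, which is why a strict reading of initializationPrior* would be hard for such optimizers to satisfy.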

dilpath commented 1 month ago

> Since what is specified in the parameter table is generally considered an integral part of the optimization problem definition that can't be ignored

This means we would need multiple PEtab problems, one per optimizer type (local/global). It also means one would need to use the same optimizer type to reproduce a result with the original PEtab problem -- otherwise, manual changes would be needed to have a valid PEtab problem. I guess from the perspective of PEtab users, it might be more useful to be able to specify information that an optimizer can use, without requiring it. Or, we move/copy this to the PEtab Result.

dweindl commented 1 month ago

> This means we would need multiple PEtab problems

:-/

> I guess from the perspective of PEtab users, it might be more useful to be able to specify information that an optimizer can use, without requiring it.

Agreed, but then it should be made clear in the documentation that it's just some hint that optimizers may or may not use. Possibly also in the column name.

> Or, we move/copy this to the PEtab Result.

That sounds wrong, since it clearly is input, not output. Whether it was used to obtain the given result could be included there.
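
As a rough illustration of the "optional hint" reading, combined with recording whether the hint was used, a tool might do something like the following. Everything here is hypothetical: `StartPoints`, `draw_start_points` and the `honors_hint` flag are invented for this sketch, and the prior is reduced to a plain range for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class StartPoints:
    points: list
    used_initialization_prior: bool  # could be reported alongside the result


def draw_start_points(lower, upper, n, prior_range=None,
                      honors_hint=True, seed=0):
    """Sample n start values, using the prior range only if the tool honors it."""
    rng = random.Random(seed)
    if prior_range is not None and honors_hint:
        # Use the hint, restricted to the bounds of the problem.
        lo = max(lower, prior_range[0])
        hi = min(upper, prior_range[1])
        used = True
    else:
        # Fall back to the box constraints; the problem stays valid.
        lo, hi = lower, upper
        used = False
    return StartPoints([rng.uniform(lo, hi) for _ in range(n)], used)


with_hint = draw_start_points(1e-3, 1e3, n=5, prior_range=(0.1, 10.0))
without_hint = draw_start_points(1e-3, 1e3, n=5, prior_range=(0.1, 10.0),
                                 honors_hint=False)
print(with_hint.used_initialization_prior, without_hint.used_initialization_prior)
```

Under this reading the same PEtab problem stays valid for both kinds of optimizer, and the flag records which behaviour produced a given result.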

dilpath commented 1 month ago

> > Or, we move/copy this to the PEtab Result.
>
> That sounds wrong, since it clearly is input, not output. Whether it was used to obtain the given result could be included there.

Agreed re: input vs. output. My message was coming from the perspective "the PEtab Result aims to store sufficient information for reproducibility of a result", rather than "the PEtab Result should only contain the result". i.e. it's currently planned that the PEtab Result contains inputs like optimizer hyperparameters and other tool-specific settings. But fine to leave out of this discussion.