Open dweindl opened 2 months ago
Reminds me that I once found this a little bit confusing. In my mind, it would make sense to treat the following as explicitly separate:
- definition of the objective function
This is already separated. Those are the objectivePriorType and objectivePriorParameters fields.
- hints for optimizer
This would be initializationPriorType and initializationPriorParameters, which I think need further clarification. Is this more like an optional hint, or at which stages do those have to be respected?
- hints for plotting routines
Everything for plotting is the visualization table, but this is currently independent of any prior distributions.
Those are the objectivePriorType, objectivePriorParameters fields.
I was not talking about those columns specifically, more about my experience when I was new to optimization and first came across PEtab. I was not quite sure how to cast the data I had into an objective function (should I just use least squares?), but after reading the format specification, thinking about it and reading it again, it all started to make sense, except for the two initializationPrior* columns. I could not figure out how they affect the objective function. At some point I concluded that they are probably just there for reasons that don't affect me (remember, I was using eSS), so I just ignored them. Of course, there were also datasetId and replicateId, but for those it was more obvious that they are just for visualization purposes. Still, I'm not sure what (if anything) to do here. The only thing that came to my mind is prefixing optimization hint columns with oh: and plotting routine columns (outside the visualization table) with vis:.
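To make the prefix idea concrete, here is a minimal sketch of how a tool could sort columns into groups. The oh: and vis: prefixes are only the suggestion from this comment, not part of PEtab, and the function name and the prefixed column names in the example are made up for illustration.

```python
# Hypothetical prefixes from the suggestion above; neither is part of PEtab.
OPTIMIZER_HINT_PREFIX = "oh:"
VISUALIZATION_PREFIX = "vis:"


def group_columns(columns):
    """Sort column names into problem definition, optimizer hints, and plotting hints."""
    groups = {"problem": [], "optimizer_hint": [], "visualization": []}
    for column in columns:
        if column.startswith(OPTIMIZER_HINT_PREFIX):
            groups["optimizer_hint"].append(column)
        elif column.startswith(VISUALIZATION_PREFIX):
            groups["visualization"].append(column)
        else:
            # Unprefixed columns are treated as part of the problem definition.
            groups["problem"].append(column)
    return groups


# Made-up column names that apply the proposed prefixes.
example = group_columns(
    ["parameterId", "objectivePriorType", "oh:initializationPriorType", "vis:datasetId"]
)
```

With such a convention, a tool that only needs the objective function could skip everything outside the "problem" group without having to know the individual column names.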
Initially, initializationPriorType and initializationPriorParameters were introduced to provide prior distributions for sampling initial points in a multi-start optimization setting. For other global optimization schemes it is less clear how this should be incorporated. Not at all? For the initial population? Whenever a new point is sampled? ...?
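For illustration, the multi-start use case could look roughly like the sketch below. It is not the PEtab library API: the dictionary layout, the function names, and the choice to reject draws outside the bounds are all assumptions made for this example, and only two prior types are handled.

```python
import random

# Hypothetical, simplified stand-in for rows of a parameter table.
# "prior_pars" is (lower, upper) for "uniform" and (mean, std) for "normal".
PARAMETERS = {
    "k1": {"lb": 0.0, "ub": 10.0, "prior_type": "uniform", "prior_pars": (1.0, 2.0)},
    "k2": {"lb": 0.0, "ub": 10.0, "prior_type": "normal", "prior_pars": (5.0, 1.0)},
}


def sample_start_value(spec, rng, max_tries=1000):
    """Draw one start value from the initialization prior.

    Assumption: draws outside [lb, ub] are rejected and redrawn.
    """
    a, b = spec["prior_pars"]
    for _ in range(max_tries):
        if spec["prior_type"] == "uniform":
            value = rng.uniform(a, b)
        elif spec["prior_type"] == "normal":
            value = rng.gauss(a, b)
        else:
            raise ValueError(f"unsupported prior type: {spec['prior_type']}")
        if spec["lb"] <= value <= spec["ub"]:
            return value
    raise RuntimeError("no draw within bounds; prior and bounds may not overlap")


def sample_start_points(parameters, n_starts, seed=0):
    """Return n_starts start points, one dict of parameter values per start."""
    rng = random.Random(seed)
    return [
        {pid: sample_start_value(spec, rng) for pid, spec in parameters.items()}
        for _ in range(n_starts)
    ]


starts = sample_start_points(PARAMETERS, n_starts=20)
```

Each entry of starts would then be handed to one local optimizer run. The open question in this thread is whether anything beyond this initial sampling step should also respect the prior.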
For me, whenever a new point is sampled in an uninformed way, then the initializationPrior* should be used. Otherwise, it's unclear why the user can help the optimizer avoid non-evaluable regions at the start of optimization, but not during it. So, "Whenever a new point is sampled?" sounds good to me.
I also agree with @paulflang that the columns that define the objective function should be obvious, so then *Prior* in initializationPrior* is suboptimal. Alternative: optimizerSampling*.
This information is useful but could also be shifted to the PEtab Result format. Currently, there are no (draft) guidelines for whether certain optimizer information is better suited in optional columns in PEtab, or as values in the PEtab Result.
For me, whenever a new point is sampled in an uninformed way, then the initializationPrior* should be used. Otherwise, it's unclear why the user can help the optimizer avoid non-evaluable regions at the start of optimization, but not during it. So, "Whenever a new point is sampled?" sounds good to me.
This sounds reasonable in principle. However, my problem is that this will be difficult to achieve for certain global optimizers. They usually just take some box constraints and then sample randomly inside the box.
Since what is specified in the parameter table is generally considered an integral part of the optimization problem definition that can't be ignored, I am wondering whether initializationPrior* would then rule out those optimizers. Either way is fine for me, but I think it would be good to clarify that.
Since what is specified in the parameter table is generally considered an integral part of the optimization problem definition that can't be ignored
This means we would need multiple PEtab problems, one per optimizer type (local/global). It also means one would need to use the same optimizer type to reproduce a result with the original PEtab problem -- otherwise, manual changes would be needed to have a valid PEtab problem. I guess from the perspective of PEtab users, it might be more useful to be able to specify information that an optimizer can use, without requiring it. Or, we move/copy this to the PEtab Result.
This means we would need multiple PEtab problems
:-/
I guess from the perspective of PEtab users, it might be more useful to be able to specify information that an optimizer can use, without requiring it.
Agreed, but then it should be made clear in the documentation that it's just some hint that optimizers may or may not use. Possibly also in the column name.
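A rough sketch of what "a hint that optimizers may or may not use" could mean in practice: a tool uses the initialization sampler if it can, and otherwise falls back to plain uniform sampling inside the box constraints. The function name, its arguments, and the supports_hints flag are hypothetical and only serve this example.

```python
import random


def draw_point(bounds, hint_sampler=None, supports_hints=True, rng=None):
    """Draw one candidate point.

    bounds: dict mapping parameter ID to (lower, upper).
    hint_sampler: optional callable taking an rng and returning a dict of values.
    supports_hints: whether the (hypothetical) optimizer can use the hint at all.
    """
    rng = rng or random.Random()
    if hint_sampler is not None and supports_hints:
        # The optimizer is able to honor the hint, so it is used.
        return hint_sampler(rng)
    # Box-only optimizer: the hint is ignored, the problem stays valid.
    return {pid: rng.uniform(lb, ub) for pid, (lb, ub) in bounds.items()}


bounds = {"k1": (0.0, 10.0)}


def hint(rng):
    """Example hint: start k1 somewhere in [1, 2]."""
    return {"k1": rng.uniform(1.0, 2.0)}


with_hint = draw_point(bounds, hint, supports_hints=True, rng=random.Random(0))
without_hint = draw_point(bounds, hint, supports_hints=False, rng=random.Random(0))
```

Under this reading, the same PEtab problem stays valid for both kinds of optimizer. Whether the hint was actually used would be something to record alongside the result.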
Or, we move/copy this to the PEtab Result.
That sounds wrong, since it clearly is input, not output. Whether it was used to obtain the given result could be included there.
Or, we move/copy this to the PEtab Result.
That sounds wrong, since it clearly is input, not output. Whether it was used to obtain the given result could be included there.
Agreed re: input vs. output. My message was coming from the perspective "the PEtab Result aims to store sufficient information for reproducibility of a result", rather than "the PEtab Result should only contain the result". That is, it's currently planned that the PEtab Result contains inputs like optimizer hyperparameters and other tool-specific settings. But it's fine to leave that out of this discussion.
Initially, initializationPriorType and initializationPriorParameters were introduced to provide prior distributions for sampling initial points in a multi-start optimization setting. For other global optimization schemes it is less clear how this should be incorporated. Not at all? For the initial population? Whenever a new point is sampled? ...?