ICB-DCM / parPE

Parameter estimation for dynamical models using high-performance computing, batch and mini-batch optimizers, and dynamic load balancing.

Implement parameter priors #108

Open dweindl opened 5 years ago

dweindl commented 5 years ago

Implement parameter priors

(How to prevent over-weighting in mini-batch optimization?)

paulstapor commented 5 years ago

I think the least-biased way is weighting them with w = #(data points in current minibatch) / #(data points in full dataset) before adding the prior. That just gives the original prior when using the full dataset...
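A minimal sketch of that weighting in Python (all names hypothetical; assuming a Gaussian prior contribution added to a negative log-likelihood objective):

```python
import numpy as np

def minibatch_objective(nllh_minibatch, parameters,
                        prior_mean, prior_std,
                        n_minibatch, n_total):
    """Add a down-weighted Gaussian prior to a mini-batch negative
    log-likelihood.

    The prior is scaled by w = n_minibatch / n_total, so that one full
    pass over the dataset (summing over all mini-batches of an epoch)
    contributes the prior exactly once, with its original weight.
    """
    w = n_minibatch / n_total
    neg_log_prior = 0.5 * np.sum(((parameters - prior_mean) / prior_std) ** 2)
    return nllh_minibatch + w * neg_log_prior
```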

dweindl commented 5 years ago

That was also suggested by Jan. Just needs some more code changes because in the current implementation of minibatch cost functions there is no notion of a "full dataset"...

paulstapor commented 5 years ago

From my understanding, an implementation of a sampling prior should go somewhere in the function write_starting_points, ideally starting at line 1146. Any opinion on that?

paulstapor commented 5 years ago

Okay, I'm just looking into how to implement the optimization priors. From my understanding, parPE handles optimization inputs as follows: PEtab files --> HDF5 file --> optimization. For the parameter sampling, everything is done by taking care of the first arrow (see post above). For the optimization, things are a bit more involved. I list here the things which I think have to be done (please extend this list if it is incomplete, or correct me if I am wrong):

1. Read-in of priors and writing them somehow into the HDF5 file. I think this might work somewhat similarly to the routine write_bounds in the HDF5DataGenerator.
2. Read-out of the prior information from the HDF5 file. Suggestion: adding this readout somewhere in optimizationOptions.cpp, in a way which works similarly to getStartingPoint.
3. Adding this additional information from the HDF5 file to the optimizationProblem or another object. (First, it is read in as an optimizationOptions object, but I would suggest to then transform this into some additional information on the optimizationProblem object... Any strong opinion here?)
4. Computing the update for the cost function and the gradient based on this additional information. First idea: adding it in the routine OptimizationReporter::afterCostFunctionCall. But it does not really feel right... This stuff is actually not supposed to be there, and especially for mini-batching, a lot of additional information might have to be passed here somehow...

dweindl commented 5 years ago

> From my understanding, an implementation of a sampling prior should go somewhere in the function write_starting_points, ideally starting at line 1146. Any opinion on that?

Yes, the starting points should be written there. The function for the actual sampling based on the PEtab parameter file would better fit into PEtab though (also usable by pyPESTO).
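For illustration, a sketch of what such a PEtab-level sampling function could look like. This assumes uniform sampling between lowerBound and upperBound on the parameterScale of the PEtab parameter table; the initializationPrior* columns would be handled analogously with the corresponding distributions:

```python
import numpy as np
import pandas as pd

def to_scale(values, scales):
    """Transform parameter values to their PEtab parameterScale."""
    return np.array([
        np.log10(v) if s == "log10" else np.log(v) if s == "log" else v
        for v, s in zip(values, scales)
    ])

def sample_startpoints(parameter_df: pd.DataFrame, n_starts: int, seed=None):
    """Sample optimizer starting points for all estimated parameters,
    uniformly between the bounds on the parameter scale."""
    rng = np.random.default_rng(seed)
    df = parameter_df[parameter_df["estimate"] == 1]
    lb = to_scale(df["lowerBound"].values, df["parameterScale"].values)
    ub = to_scale(df["upperBound"].values, df["parameterScale"].values)
    # one row per start, one column per estimated parameter
    return rng.uniform(lb, ub, size=(n_starts, len(df)))
```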

dweindl commented 5 years ago

> Read-in of priors and writing them somehow into the HDF5 file. I think this might work somewhat similarly to the routine write_bounds in the HDF5DataGenerator.

Yes. Again, we would need the prior distribution + its parameters.
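For illustration, one possible HDF5 layout, written with h5py; the group path, dataset names, and type codes are just placeholders, analogous to what write_bounds produces for the parameter bounds:

```python
import h5py
import numpy as np

# Hypothetical layout for storing per-parameter prior information in the
# parPE HDF5 input file, alongside the existing parameter bounds.
with h5py.File("data.h5", "a") as f:
    g = f.require_group("/optimizationOptions/priors")
    # one prior type code per optimization parameter
    # (e.g. 0 = none, 1 = normal, 2 = laplace)
    g.create_dataset("priorType", data=np.array([1, 1, 0], dtype=np.int32))
    # prior parameters, one row per optimization parameter
    # (e.g. location and scale for normal/laplace priors)
    g.create_dataset("priorParameters",
                     data=np.array([[0.0, 1.0], [1.0, 2.0], [np.nan, np.nan]]))
```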

> Read-out of the prior information from the HDF5 file. Suggestion: adding this readout somewhere in optimizationOptions.cpp, in a way which works similarly to getStartingPoint.

Given the current state, it would fit best there, but it mixes too many responsibilities. Maybe start by putting it there until we have a better place (see MultiConditionDataProviderHDF5 below).

> Adding this additional information from the HDF5 file to the optimizationProblem or another object. (First, it is read in as an optimizationOptions object, but I would suggest to then transform this into some additional information on the optimizationProblem object... Any strong opinion here?)

The objective function class will not have access to OptimizationOptions, so we need to set it on the objective function instance some other way. I would suggest MultiConditionDataProviderHDF5.

> Computing the update for the cost function and the gradient based on this additional information. First idea: adding it in the routine OptimizationReporter::afterCostFunctionCall. But it does not really feel right... This stuff is actually not supposed to be there, and especially for mini-batching, a lot of additional information might have to be passed here somehow...

I would think this should be part of, or called within, AmiciSummedGradientFunction::evaluate.
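As a rough sketch of what that evaluation step would do (in Python for brevity; the actual implementation would be C++, and all prior-handling names are hypothetical, again assuming a Gaussian prior):

```python
import numpy as np

def add_prior_contribution(fval, gradient, parameters,
                           prior_mean, prior_std, minibatch_weight=1.0):
    """Add a (possibly down-weighted) Gaussian prior contribution to an
    objective value and its gradient, after the likelihood evaluation.

    minibatch_weight = n_minibatch / n_total for mini-batch optimization,
    1.0 for full-batch optimization (see the weighting discussion above).
    """
    residual = (parameters - prior_mean) / prior_std
    # negative log-density of the Gaussian prior (up to a constant)
    fval += minibatch_weight * 0.5 * np.sum(residual ** 2)
    # its gradient w.r.t. the parameters: (p - mu) / sigma^2
    gradient += minibatch_weight * residual / prior_std
    return fval, gradient
```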

paulstapor commented 5 years ago

> Yes, the starting points should be written there. The function for the actual sampling based on the PEtab parameter file would better fit into PEtab though (also usable by pyPESTO).

Well... This is another thing. Do we want PEtab to be more than a data format plus visualization? Should it be a toolbox? I'm not entirely sure... So, I would vote for keeping this out of PEtab, if possible...

dweindl commented 5 years ago

> Well... This is another thing. Do we want PEtab to be more than a data format plus visualization? Should it be a toolbox? I'm not entirely sure... So, I would vote for keeping this out of PEtab, if possible...

PEtab-related functions that are likely usable by other tools are already included in the PEtab library. Sampling parameters based on the specifications of the PEtab parameter table is one of those in my opinion. Otherwise an exact copy of the code will show up in pyPESTO in a week :).

paulstapor commented 5 years ago

> PEtab-related functions that are likely usable by other tools are already included in the PEtab library. Sampling parameters based on the specifications of the PEtab parameter table is one of those in my opinion. Otherwise an exact copy of the code will show up in pyPESTO in a week :).

True enough, but there is already code for this in pyPESTO... ;) And it is set up in a way that it may, and should, become far more powerful than whatever we might implement in PEtab could ever be... Moreover, parPE also already has a routine to create starting points...

paulstapor commented 5 years ago

Don't get me wrong: I totally agree with implementing reusable things as far down in the dependency tree as possible... I'm just saying we should be aware that we already have a duplicated structure (parPE, pyPESTO)... and that we might now be creating a third copy (PEtab).

dweindl commented 5 years ago

If it's preferred to have that directly in pyPESTO, then rewrite it for parPE.

> I'm just saying we should be aware that we already have a duplicated structure (parPE, pyPESTO)... and that we might now be creating a third copy (PEtab).

The reason we created the PEtab library was exactly to avoid replicating that code. This is why, e.g., the parameter mapping is done there and used by both parPE and pyPESTO. I can't follow this argument.

paulstapor commented 5 years ago

Okay. As I said, I totally see your point; I'm just saying it's not clear a priori.

I'll implement it in PEtab then. This will make it necessary at some point to restructure some things in the startpoint subpackage in pyPESTO. Maybe it's less than I think right now...