Data2Dynamics / d2d

a modeling environment tailored to parameter estimation in dynamical systems
https://github.com/Data2Dynamics/d2d

Hierarchical optimization in D2D #147

Open gdudziuk opened 4 years ago

gdudziuk commented 4 years ago

I have implemented a simple form of hierarchical optimization in D2D and I would like to propose to pull this feature upstream.

The method of hierarchical optimization is described in detail in (Weber et al., 2011) and (Loos et al., 2018). Let me, however, illustrate the method with the following simplified parameter estimation problem:

(Eq. 1) \min_{p} J_1(p) = \sum_{j=1}^{J} (Y_j - y(p,t_j))^2/{std}_j^2

where Y_j and {std}_j represent experimental measurements and associated standard deviations, given on an arbitrary scale, and y(p,t_j) is the model observable corresponding to the measurements Y_j. Assume that the observable y is of form y(p,t)=s*x(r,t), where x(r,t) denotes a model species (or some derived variable) corresponding to the measurements Y_j, s is a scale parameter to be estimated along with the dynamic parameters r, and p=(r,s). Then the above optimization problem becomes

(Eq. 2) \min_{r,s} J_2(r,s) = \sum_{j=1}^{J} (Y_j - s*x(r,t_j))^2/{std}_j^2

The core observation is that the function J_2 is quadratic and positive definite w.r.t. the scale s, assuming that at least some of x(r,t_j) are nonzero. Thus, given parameters r, the optimal scale s=s(r) is uniquely determined and the optimization problem (Eq. 2) is equivalent to

(Eq. 3) \min_{r} J_3(r) = \sum_{j=1}^{J} (Y_j - s(r)*x(r,t_j))^2/{std}_j^2
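
Indeed, writing w_j = 1/{std}_j^2 and x_j = x(r,t_j) for brevity (my shorthand, not notation from the cited papers), J_2 expands in s as

J_2(r,s) = s^2 \sum_{j} w_j x_j^2 - 2s \sum_{j} w_j x_j Y_j + \sum_{j} w_j Y_j^2

i.e. a parabola in s with a positive leading coefficient whenever some x_j is nonzero, hence with a unique minimizer.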

The optimal scale s(r) may be expressed explicitly. To do so, it is enough to solve the condition \grad_{s} J_2(r,s)=0 for s, which gives:

(Eq. 4) s(r) = \frac{\sum_{j} x(r,t_j) Y_j / {std}_j^2}{\sum_{j} x(r,t_j)^2 / {std}_j^2}
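
For illustration, here is a minimal MATLAB sketch of (Eq. 4) for a single observable; optimalScale is a hypothetical helper invented for this example, not the actual code of this pull request. The vectors x, Y and stds hold x(r,t_j), Y_j and {std}_j over the time points t_j:

function s = optimalScale(x, Y, stds)
    % Optimal scale of (Eq. 4): weighted least squares fit of s*x to Y
    w = 1 ./ stds.^2;                      % weights w_j = 1/std_j^2
    s = sum(w .* x .* Y) / sum(w .* x.^2); % well-defined if some x_j ~= 0
end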

Having this, we may also explicitly compute the gradient of J_3 w.r.t. r (assuming that we can compute the sensitivities of x w.r.t. r) and use it to feed any gradient-based optimization algorithm.
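
Explicitly (this derivation is my illustration of the standard argument): since \grad_{s} J_2(r,s(r)) = 0 by the definition of s(r), the chain rule applied to J_3(r) = J_2(r,s(r)) reduces to the partial derivative w.r.t. r alone, giving

\grad_{r} J_3(r) = -2 \sum_{j=1}^{J} (Y_j - s(r)*x(r,t_j)) * s(r) * \grad_{r} x(r,t_j) / {std}_j^2

so only the sensitivities \grad_{r} x enter, and no derivatives of s(r) are needed.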

To sum up, thanks to the specific form of the observable y, namely y=s*x, we can replace the optimization problem (Eq. 2) with the equivalent problem (Eq. 3), which has fewer parameters to optimize. This is the method of hierarchical optimization. From my experience (and as one could expect), this can significantly improve the optimization speed. The trade-off is that we need to assume a specific form of the observable y. Nevertheless, I think that the case of y=s*x is common enough to implement hierarchical optimization for it.

Theoretically, the method outlined above may be generalized to more complex observables, e.g. y=s*x+b, where b denotes an offset parameter. Such observables perhaps cover the vast majority of practical applications. Having y=s*x+b, we proceed analogously as above; namely, instead of (Eq. 2) we have

(Eq. 2a) \min_{r,s,b} J_2(r,s,b) = \sum_{j=1}^{J} (Y_j - s*x(r,t_j) - b)^2/{std}_j^2

and we need to compute both the optimal scale s=s(r) and the optimal offset b=b(r). The problem with such a generalization is very pragmatic: the formulas for s(r) and b(r) become much more complex than in the simple case of y=s*x discussed above. For this reason I have implemented hierarchical optimization just for the latter, simple case, and frankly I do not expect to implement the generalization to y=s*x+b soon.
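
For reference (my illustration, not part of the implementation), setting \grad_{s} J_2 = 0 and \grad_{b} J_2 = 0 in (Eq. 2a) gives, again with w_j = 1/{std}_j^2 and x_j = x(r,t_j), the 2x2 linear system

s \sum_{j} w_j x_j^2 + b \sum_{j} w_j x_j = \sum_{j} w_j x_j Y_j
s \sum_{j} w_j x_j + b \sum_{j} w_j = \sum_{j} w_j Y_j

which is still solvable in closed form, but the resulting expressions for s(r) and b(r), and in turn for the gradient of the reduced objective, are noticeably messier than (Eq. 4).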

Usage

There is a function arInitHierarchical which detects all observables of form y=s*x and enables the "hierarchical mode" of optimization. To make it work, just put arInitHierarchical before arFit (or arFitLHS or anything like that). There is also a function arDisableHierarchical which switches back to the "normal mode". So an example setup with hierarchical optimization may look like this:

arInit
arLoadModel(.....)
arLoadData(.....) % Assume that at least some data has observables of form y=s*x
arCompileAll

arFit % Here we optimize in the "normal mode", i.e. we optimize J_2(r,s) as in (Eq. 2)
arInitHierarchical % Here we enter the "hierarchical mode"...
arFit  % ... and fine-tune the fit by optimizing J_3(r) as in (Eq. 3)
arDisableHierarchical % Switch back to the "normal mode"

The optimal scales s(r) are computed by the function arCalcHierarchical, which is called from arCalcRes and arCalcSimu. As a result, any D2D function that internally calls arSimu can benefit from the "hierarchical mode". For example, in the following snippet

arInitHierarchical
arPlot

the scale parameters (if any) will be automatically fine-tuned prior to generating the plots.

Limitations

The implementation in this pull request strongly relies on the specific form (Eq. 2) of the optimization problem. Consequently, any D2D feature that modifies J_2 as in (Eq. 2) into something else cannot be used with hierarchical optimization, at least until suitable generalizations are implemented. As far as I know, these interfering features are:

With these features enabled, the current implementation of hierarchical optimization works incorrectly, since the optimal scales s(r) then differ from (Eq. 4). Function arCalcHierarchical raises an error if any of these features is in use. I have done my best to make the above list complete. However, I am a newcomer to D2D, so there may be other features that I do not know about yet and that may interfere with hierarchical optimization by modifying the merit function J_2. So if anybody finds more features like that in D2D, please let me know.

Further development

The possible directions of generalization of the method are:

  1. Add support for the hierarchical optimization for observables with offsets, y=s*x+b, possibly with some restricting assumptions.
  2. Add support for the hierarchical optimization in combination with the features listed in the preceding section. In this respect, some features are more cumbersome than others. Specifically, support for priors should be simple to implement.
gdudziuk commented 4 years ago

An update. Since this implementation relies on the form (Eq. 2) of the target function, it is important that the {std}_j depend neither on the dynamic parameters r nor on the scale parameter s, neither explicitly nor implicitly, e.g. via model observables; otherwise \grad_{s} J_2 picks up extra terms and (Eq. 4) no longer gives the minimizer. This excludes error models with relative measurement noise, e.g. std(q1,q2,s,r) = sqrt(q1^2 + q2^2*y(s,r)^2), even if the error parameters q1 and q2 are constant.

So I need to write another commit to fix this. I will enforce the use of experimental errors if hierarchical optimization is enabled. Such a solution may seem restrictive, but:

  1. Error models that do not depend on s or r and whose error parameters are constant can always be represented as concrete error values for particular replicates, i.e. as experimental errors. So in fact this solution is no more restrictive than necessary.
  2. It is also simpler to implement than detecting error definitions that have any dependence on s or r.

I will commit in a moment. (Edit: Done, cf. commit 1b7fb04c)

clemenskreutz commented 4 years ago

Dear gdudziuk, thanks for your contribution. At first glance, it looks very valuable. I also deeply appreciate your efforts in documentation. We will check the code during the next days.

clemenskreutz commented 4 years ago

Dear gdudziuk, sorry for my delayed response. I started testing and found the following problem: you (sometimes) do not consider that there might be several models, i.e. ar.model needs an index ar.model(m). So you always need a loop: for m=1:length(ar.model), ar.model(m), ...

Example, lines 24-25 in arInitHierarchical.m:

xSyms = cellfun(@sym, ar.model.x, 'UniformOutput', false);
zSyms = cellfun(@sym, ar.model.z, 'UniformOutput', false);
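
That is, something along these lines (a sketch of the intended pattern, assuming the usual global ar struct):

for m = 1:length(ar.model)
    % Collect the symbols per model m instead of assuming a single model
    xSyms = cellfun(@sym, ar.model(m).x, 'UniformOutput', false);
    zSyms = cellfun(@sym, ar.model(m).z, 'UniformOutput', false);
    % ... further per-model processing ...
end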

Could you please update the code accordingly? Thanks in advance. Best, Clemens.

clemenskreutz commented 4 years ago

Another question: I did the following:

arCopyBenchmarkModels('Swa')
cd Swameye_PNAS2003\
Setup
arInitHierarchical
arDisableHierarchical % Switch back to the "normal mode"

There is a warning that I don't understand and obviously no scaling parameters were found. Why?

arInitHierarchical
Warning: Unable to prove 'offset_tSTAT == 0'.
  In symengine
  In sym/isAlways (line 42)
  In arInitHierarchical (line 80)
Warning: Unable to prove 'offset_pSTAT == 0'.
  In symengine
  In sym/isAlways (line 42)
  In arInitHierarchical (line 80)
Warning: No scale parameters found for hierarchical optimization.
  In arInitHierarchical (line 224)

arDisableHierarchical % Switch back to the "normal mode"
Error using arDisableHierarchical (line 18)
Hierarchical optimization is not enabled. You have not called arInitHierarchical or there are no scale parameters suitable for hierarchical optimization in your observables.

gdudziuk commented 4 years ago

You (sometimes) do not consider that there might be several models, i.e. ar.model needs an index ar.model(m).

Right. Fixed in commit a35790f1. Thank you for pointing this out.

In addition, I have found another small issue, which I will fix this week. I will also take a look at the model of Swameye and let you know what is going on there.

gdudziuk commented 4 years ago

I have played a bit more with the example models shipped with D2D and discovered some issues in my code. They are already fixed. From the model "Becker_Science2010" I have learned that D2D allows repeated measurements in the data sheets (which I should have figured out earlier; see commit 7feb387f), and from the model "Swameye_PNAS2003" I have learned that there may also be missing measurements in the data sheets (see commit 14319ec5). I hope that hierarchical optimization now works correctly with all possible data.

Concerning the warning "Unable to prove 'some_parameter == 0'.", this is a warning raised by the symbolic function isAlways, which is used in arInitHierarchical. Since this warning was occurring during normal operation of arInitHierarchical, I have decided to switch it off (see commit 050f1146).

Concerning the warning "No scale parameters found for hierarchical optimization.", this is correct, since there are no such parameters in the model "Swameye_PNAS2003". At the moment, my implementation supports hierarchical optimization only for observables of form scale*xz, and the model of Swameye has observables of form scale*xz + offset, which is a different case.

I suspect that this may be a problem for you. I can see that many D2D examples involve not only scales but also offsets, so you would probably like to have hierarchical optimization generalized to cover setups involving offsets. But if it is acceptable for you to have the simple implementation as it is now, I can provide a toy model which demonstrates the use of hierarchical optimization in the present state of development.

Of course, for testing purposes you can get hierarchical optimization to work with the model of Swameye by manually removing the offset parameters from that model.

gdudziuk commented 4 years ago

So, what do you think, @clemenskreutz?