TTitscher commented 2 years ago

Problem

This picture compares a sampled posterior (blue) with the VB posterior (red) and I assume the sampled results to be correct. E_beam and E_conn are MVN model parameters, Lambda is the Gamma noise precision.

The individual parameters are not normally distributed.
By definition, the MVN posterior cannot capture the curved covariance.

This boils down to the fact that a for a nonlinear model the prior and the posterior are not conjugate anymore (right?!). But even though our model is indeed nonlinear, the linearization around a narrow posterior distribution may still by quite linear. So I would like to provide some (optional) linearity checks that warn the user if the linearization assumption is bad at the posterior mean. This matters, if you want to evaluate the model at the tails of the posterior distribution, as the posterior MVN may significantly differ from the true posterior (see figure).

Goal

Define and quantify good and bad in the paragraph above.
Very optional: Derive suggestions for parameter transformations like The linearization of 'E_beam' of has a linearity of 0.2 (bad!), the parametrization '1/E_beam' would have a linearity of 0.96 (good).

Possible solutions

The basis could be a comparison of our model error function M at some point with the posterior-linearized M_lin. For a scalar model error with one parameter theta, we could take a set of points (e.g. theta_i = mean + i * sd for i [-4, -3, ..., 4]) and compare M(theta_i) [as a data set] and M_lin(theta_i) [as our prediction] using R².

How do deal with multiple parameters?
- Doing it for each parameter individually could be sufficient.
The model error is vector valued. How do deal with that?
- We could compute R² for entry individually. If each entry has a meaning (e.g. sensor 4 at time step 17) that could be indeed interesting to see. Or we return some mean(R²).
- Is there something like R² for higher dimensional data. Maybe this?

Bonus: The possible transformations could be explored by a least square fit of M(theat_i) to a set of predefined nonlinear models (log(theta), exp(theta), 1/theta, ...).

Discussion

Would you find that useful?

Any ideas on how to define the linearity?

ajafarihub commented 2 years ago

I am not sure if I could fully understand all challenges you mentioned. Only some thoughts of mine:

1) First a general thing that I am not 100% sure about, but as far as I learnt in a recent course at the Uni, a sampling procedure is valid as long as NO significant correlation among samples of unknown parameters exists. This requirement is, however, violated in the above example if I see it correctly. The samples of parameters E_beam and E_conn are quite highly correlated !

2) I think, the question of how good/bad a function is linearized around a specific point can be answered based on the second derivative of that function. Reason: the accuracy level of a linearization directly goes to that of a Taylor expansion which has been truncated from the second derivative onward, e.g. if the second derivative of a function is quite high, a linearization introduces quite too much inaccuracy. So, IMO, a logical approach is to find a good indicator for how large the second derivative is. For that purpose, we can compare the first derivative at "an appropriate set of different points in the neighborhood of the mean point". Now the question is, what would be "an appropriate set of different points in the neighborhood of the mean point" in a high-dimensional space of parameters ? ... hmm ... interesting to think about it.

joergfunger commented 2 years ago

You could try to compute the diagonal entries of the second derivates (so for a single parameter) at the MAP and try to estimate the error resulting from that onto the posterior (sum_i dTheta_post_mean/dH2 dH2) und analog dTheta_post_std/dH2 dH2.

TTitscher commented 2 years ago

90 Tries to provide a solution.

BAMresearch / bayem

Linearity check #67

Problem

Goal

Possible solutions

Discussion

90 Tries to provide a solution.