Prescribed correlation (in noise model)

ajafarihub commented 2 years ago

I think, the VB can be improved by adding a known nonuniform covariance matrix to the noise model according to eq. (5) of this document.

The idea is to have a known covariance that can be scaled by the unknown \Phi . An example of application of this is when we model the noise in the force residuals of a structure, which should be somehow proportional to the stiffness matrix. So, the stiffness matrix can take part as a basis matrix for the covariance of the noise, which will be scaled by the identified \Phi.

Also at the same time, the terms cancelled out in eq. (34) can be dropped from the VB implementation, as well.

ajafarihub commented 2 years ago

I think I missed to assign this issue to you so that you can see it.

TTitscher commented 2 years ago

So you propose to change/generalize our "univariate" noise

e ~ MVN(0, phi I) (1)

to a non-identity covariance

e ~ MVN(0, phi C) (2)

right? I can imagine that we use some simplifications based on the identity matrix in (1) in our derivation and that the final formulas may change. Have you checked that? Maybe @aradermacher ? :confused:

ajafarihub commented 2 years ago

Yes, this is my idea. I followed the same procedure as what Annika did for the current setup of VB, and that is written in details in this document. I also add Annika to assigners so that she can also put her opinion. And I did not check it, I thought first to share it with you and see if this modification is worthy to do , or ... .

aradermacher commented 2 years ago

I just briefly looked at it. It seems possible. But you should take our newest derivation of vb with the constant term!. Adding the covariance will influence the constant terms as well. And we already saw that we need the constant terms to get the right evidence.

ajafarihub commented 2 years ago

Thanks @aradermacher for your comments. I'll start adding this option to the VB as soon as it becomes urgent for the case study of my particular problem (FEMU-F), and that is quite likely. The reason as I mentioned in above is that, I have two noise groups regarding model errors in:

(1) measured displacements
(2) force residuals and these two are connected to each other via the stiffness matrix of my FE model. That means, assuming a uniform noise model for (1), we would need to have a nonuniform noise model for (2): through C = Stiffness Matrix or maybe its inverse (I am not sure right now, which one).

Maybe a follow-up question is then: If we select a nonuniform noise model for (1) let say via the matrix C_u coming from a spatial correlation matrix, how should it affect the nonuniform noise model of (2) ? I guess something like: *C_f = C_u Stiffness_Matrix** , or ... ?

TTitscher commented 2 years ago

Imagine we implemented that C, how would a test case look like that would fail for C = I, but succeed for specified C=C_test != I? I assume some correlated test data coming from data = model(theta_true) + sigma_true*MVN(0, C_test). How to proceed then?

Our previous idea to deal with correlation was to find a transformation matrix (NxM) based on some eigenvalue composition of C (NxN) that reduces the (correlated) model error of length N into M uncorrelated entries -- a transformation in an uncorrelated space(?) If your idea works out, it could be a really convenient way to deal with correlated data within our algorithm.

ajafarihub commented 2 years ago

About the test case, your suggestion seems to me logical. Generally, we use synthetic data (for which we know the ground truth) and compare VB results with results of a sampling method. However, this modification of VB is - at the end of the day - only some changes in our noise model and what assumption we took for it, so IMO, the effect of this modification can change the results to some extend such that we might be yet unable to consider one case (e.g. without C) as totally correct and the other as totally False. We might get quite reasonable results from both models based on synthetic data generated from a non-uniform noise.

About your second paragraph, I am not sure if I get your idea fully. To my understanding, the reduction of correlation matrix is useful when our goal is to identify the correlation in terms of several weighting factors (contribution of each eigen-correlation). I think of it as a matter of number of DOFs (identifiable parameters) in our inference setup. The current VB has only ONE DOF (identifiable noise parameter), and what I understand from your suggested method is trying to have several DOFs, each representing the contribution of an eigen-value. If this is your target, my idea (C matrix) does not work, since it still has ONE DOF which contributes to all eigen-values of correlation matrix in the same way.

TTitscher commented 2 years ago

we might be yet unable to consider one case (e.g. without C) as totally correct and the other as totally False. We might get quite reasonable results from both models based on synthetic data generated from a non-uniform noise.

Hmm, but imagine you have to make sure you implemented the C thing correctly. How exactly would you do that?

ajafarihub commented 2 years ago

I would say, let's hope that your suggested test will make sense. For example with a 2*1 output signal in one noise group, we set:

data from a MVN with target_cov.=[1 5; 5 10] and try each of:
(1) current VB: C = I = [1 0; 0 1]
(2) nonuniform VB: C = [2 10; 10 20] . I would expect that \Phi1 will become something btw. 1 and 10 that is in neither case representing the target covariance matrix, while (2) should be very close to 0.5 , resulting in the target covariance .

joergfunger commented 2 years ago

I'm not fully sure, why having a correlation in the noise for the displacements (at the interior) results in a correlation of the forces (reaction forces at the boundary). That certainly depends on how you interpret the noise, but if it is really noise (and not model bias), then those could be totally uncorrelated.

ajafarihub commented 2 years ago

In my setup of FEMU-F, we have another noise group for the residuals at free DOFs (not at reaction DOFs), and my idea is about this noise group, where we have a direct effect from noisy measured displacements. Thus, I would expect that the noise model of residuals at free DOFs is somehow influenced by that of measured displacemnets, meaning that if we have correlation in the latter we observe similar pattern/correlation in the former.

This is also somewhat related to another doubt that I got several months before: IMO, in the VB, we cannot identify both noise models of residuals and displacements , since the VB method is based on the assumption that the noises must be independent, which is - I think - not the case for residuals and displacements. So, this was my main motivation to think about fixing/prescribing one of the noises (in my case, residuals) and identifying the only other (displacements). And this idea is already implemented. Another possible improvement is still to consider a link between an identifiable correlation in one group and a fixed correlation in another one; in my setup: the identifiable correlation in displacements influences the prescribed correlation in residuals (of free DOFs).

Apart from this aspect (of interaction between two noise models), still the inclusion of a basis correlation matrix (e.g. a spatial one for displcements) into the noise model and trying to identify the scale of that basis matrix (via \Phi) seems to me already an improvement. Of course, verifying it with minimal example(s) is required, though.

joergfunger commented 2 years ago

But you identify the real displacements, thus the noise in the displacements does not change the residual of the reaction forces on the boundary - as long as the identified displacements are actually the correct ones?

TTitscher commented 2 years ago

All valid points and necessary discussion, but I think we can separate the two ongoing discussions. I like to continue the technical, bayem-related discussion of having e~N(0,phi C) here and ask you to discuss the specific application to FEMU-F internally. Thanks!

Or, @joergfunger do you imply that the C-feature may not solve the problem of @ajafarihub and we should not implement it?

ajafarihub commented 2 years ago

I think , Thomas' remark is very relevant. I also have the impression that we deal with two rather separate things: 1) including C matrix into the VB setup. 2) fitting this feature to any application like FEMU-F; e.g. justifying whether this feature can be further extended to connect two noise groups in terms of their correlation matrices.

So, I would say, maybe we devote this issue #56 to (1) and shift (2) in a right time to another issue. :)

TTitscher commented 2 years ago

Nice, and maybe (2) can be discussed outside of this repository.

@ajafarihub could you open a branch for that and, for now, only update the derivation for us to double-check? After that, we can start implementing it. My first idea would be a bayes.vb.Gamma.C = ...

Edit: Maybe you can introduce another color for your changes.

joergfunger commented 2 years ago

I justed wanted to say that this C matrix with two separate C matrices for individual sensor groups makes sense (so in that setting you would then need two independent C matrices that are scaled by individual noise terms). And this is important for the derivation and implementation.

TTitscher commented 2 years ago

Brief note: If we run into performance problems (I assume within J.T x C x J), we can have a look at the tripy package of Ioannis, e.g. the methods here to utilize special structures of C.

ajafarihub commented 2 years ago

I provided the document in a branch here. It however probably needs to be pulled and compiled via doit command on your PC , so that an html file is generated. It seems that this link is based on master branch only.

joergfunger commented 2 years ago

I think this is not fully correct, the determinant of C seems to me not being properly propagated (at least I would expect in this N/2 log Phi term something that relates to det C-1.

ajafarihub commented 2 years ago

If I got the point correctly , you refer to the log-likelihood (eq. 12). Good point! I think, I did not consider the log of |C| because it is - at the end - a constant term. So, it is treated the same way as the term (2*pi)^N is treated in the likelihood formula. So, it should appear in the constant term (green colored formula after eq. 17). And as far as I understood, this matters only when we wish to compute evidence with the help of the calculated free energy. And this aspect I have not yet taken cared of in my derivations.

joergfunger commented 2 years ago

But we wanted to have all the terms to compute the model evidence.

ajafarihub commented 2 years ago

Ok, I'll adjust the free energy to suite for the computation of evidence. Up to now, I think the update equations should be correct, though.

aradermacher commented 2 years ago

I looked at the equations. Nice work. I agree with Jörg, that the constant terms are not correct right now. In eq 12: const should be -n/2log(2pi)-n/2log(Ce) and the second terms is then missing in the green equation under (17). For the derivation of the update equation this should not be a problem, because the constant terms are not considered here BUT for the free energy equation. Furthermore, I'm not sure if the naming in eq (7) is fully correct. This is the likelihood, right? The superscript N of phi is in the general MVN definition not present (see also your eq 13) but comes from the likelihood definition. And I liked that you add the prior definitions, we should add that in the docu of the master branch too!

ajafarihub commented 2 years ago

IMO, eq. (7) is a certain choice for the noise model (so, just a definition/assumption). But it also introduces - at the same time - the likelihood (via eq. 11). That briefly means: Likelihood =: P(y|w) = P(e|Phi) I will update the formula of the free energy as soon as possible and give you here a notice for pulling agian.

ajafarihub commented 2 years ago

I updated the branch. I checked the revised formulation (with C) with the one that Annika did before (without C and on master branch). Everything in the former case is simplified to the latter case after putting C=identity matrix .

Edit ttitscher: remove non-bayem related discussion, let's discuss internally

ajafarihub commented 2 years ago

I deviated a new branch from the most updated master branch and there I pushed the minimal example that we discussed about on Friday. Further developments (e.g. implementation of C matrix in VB-library) will be proceeded in that branch.

ajafarihub commented 2 years ago

Here is a script where we compare the VB method (with correlation matrix) with a minimal analytical example. The example is to identify the mean of some numbers, knowing their covariance matrix as prescribed. In the script, we get:

an exact match btw. the VB posterior and the analytical posterior,
a very good match btw. the converged free energy of VB and the log-evidence numerically computed from its definition.

Question: I can now be confident that the formulation with C matrix is correct and tested quite well, nevertheless, I did the derivation for only one noise group in the VB. Although the extended formulas in the case of several noise groups could be easily predicted, it seems that a similar proof/derivation for the case of several noise groups is also required. How would you think about it?

TTitscher commented 2 years ago

Just to inform everyone about the state of this issue and subsequent internal discussions:

The proof of concept is done in a dummy implementation with only one noise group. As we need to invert C, we currently restrict ourselves to C coming from a exponential kernel where a closed form inversion is available.
In fact, the implementation does not need C, but only its inverse, so the API will likely, as for the MVN for the parameters, based on the precision matrix.
bayem.correlation will provide the precision for the exponential kernel in case of 1D correlations, e.g. correlations in time or within one spatial dimension. Extending this (in an efficient way) will likely require the matrix manipulation/inversion methods of tripy
A planned extension involves separating the noise as e ~ MVN(0, phiC + D) where D could be restricted to a diagonal matrix representing known sensor noise. Or it could also be a correlation matrix...

BAMresearch / bayem

Prescribed correlation (in noise model) #56