Closed TTitscher closed 2 years ago
If you apply the transformation Z = Q sqrt(e) X, where COV = Q e Q^T is the eigendecomposition, your likelihood term becomes (Q sqrt(e) X)^T COV^(-1) (Q sqrt(e) X) = X^T X (also zero mean, unit std and independent).
So the rotation with Q makes the variables independent (not required if they are already independent) and the scaling with the sqrt of the eigenvalues makes them unit std.
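A quick numpy sketch of this (my own notation, not code from the implementation): build T = Q sqrt(e) from the eigendecomposition of a correlated COV, then check that applying T^(-1) to N(0, COV) samples yields unit-std, independent variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# A correlated covariance matrix and its eigendecomposition COV = Q diag(e) Q^T
COV = np.array([[4.0, 1.5],
                [1.5, 2.0]])
e, Q = np.linalg.eigh(COV)

# T = Q sqrt(e): samples X ~ N(0, I) mapped through T have covariance
# T T^T = Q diag(e) Q^T = COV, so T^(-1) whitens N(0, COV) samples.
T = Q @ np.diag(np.sqrt(e))

samples = rng.multivariate_normal(np.zeros(2), COV, size=200_000)
white = samples @ np.linalg.inv(T).T

print(np.cov(white.T))  # close to the identity: unit std and independent
```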
To deal with numerically large parameters, the current VB implementation scales the provided prior by its mean and infers these scaled parameters. The main benefit is that the entries of the Jacobians and of the precision are much closer to one.
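A minimal sketch of that mean-scaling (hypothetical names, just to illustrate the idea, not the actual API):

```python
import numpy as np

# Hypothetical prior with numerically large / small entries
prior_mean = np.array([2.0e8, 5.0e-3])
prior_std = np.array([2.0e7, 1.0e-3])

def to_scaled(theta):
    # parameters the inference actually sees
    return theta / prior_mean

def from_scaled(theta_scaled):
    # map the inferred result back to the physical scale
    return theta_scaled * prior_mean

# The scaled prior has mean one and an O(1) relative standard deviation,
# so Jacobian and precision entries stay close to one.
print(to_scaled(prior_mean))   # [1. 1.]
print(prior_std / prior_mean)  # [0.1 0.2]
```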
This is not done for numerically small parameters, as there has to be some kind of `eps` to avoid division by zero (for zero-mean parameters). Most of my problems are then solved for `eps = 1.e-20`, but the proper way would be to use some kind of scaling of the precision. For normal distributions, there is the zero-mean unit-variance transformation. How would that work for an MVN? I read something here where an eigenvalue decomposition of the precision/covariance is used,
COV = Q . EVs . Q^T, where the scaling with Q is performed. However, in our case COV is diagonal, so Q is the identity and there will be no scaling...?
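For what it's worth, a small numpy check of the diagonal case (my own sketch): `eigh` of a diagonal COV does return Q as (a permutation of) the identity, so the rotation part does nothing, but the sqrt-eigenvalue factor is still a nontrivial per-parameter scaling.

```python
import numpy as np

COV = np.diag([4.0, 0.25, 1.0e-8])  # diagonal: already independent parameters
e, Q = np.linalg.eigh(COV)

print(np.abs(Q))   # a (signed) permutation of the identity: no rotation
print(np.sqrt(e))  # the per-parameter standard deviations: still a scaling
```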