b-remy / gems

GEnerative Morphology for Shear

MAP and VI #9

Open b-remy opened 2 years ago

b-remy commented 2 years ago

This issue is to track the development of a VI procedure, which is composed of two steps:

  1. Find the MAP
  2. Fit a surrogate posterior

To validate the procedure, I will use the same model both to generate the simulations and to perform inference. For the results, I will compare against both the true values used to generate the sims and the posterior contours from an HMC procedure.

1. MAP

25 Gaussian light profile galaxies, constant shear.

Params:

I aimed to obtain the MAP by running gradient descent on the negative log joint probability -log p(hlr, e, gamma, obs), which equals the negative log posterior up to an additive constant.

Since this objective is non-Gaussian, obtaining the global minimum seems non-trivial, even with Adam. So my first strategy was to run 100 chains in parallel, with 100 iterations of Adam(lr=0.1) followed by 100 iterations of Adam(lr=0.01).
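For concreteness, here is a minimal sketch of this multi-start search (not the notebook code): it assumes a hypothetical `neg_log_prob(params)` that evaluates -log p(hlr, e, gamma, obs) for a batch of parameter vectors, one per chain, and a placeholder latent dimension `n_params`.

```python
import tensorflow as tf

n_chains, n_params = 100, 4  # n_params is a placeholder for the actual latent dimension

# one random starting point per chain
params = tf.Variable(tf.random.normal([n_chains, n_params]))

for lr, n_steps in [(0.1, 100), (0.01, 100)]:
  opt = tf.keras.optimizers.Adam(learning_rate=lr)
  for _ in range(n_steps):
    with tf.GradientTape() as tape:
      # chains are independent, so summing keeps their gradients separate
      loss = tf.reduce_sum(neg_log_prob(params))
    opt.apply_gradients([(tape.gradient(loss, params), params)])

# keep the chain that reached the lowest negative log posterior as the MAP
z_MAP = tf.gather(params, tf.argmin(neg_log_prob(params)))
```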

[image: hlr ground simulations]

[image: hlr MAP]

Increasing the number of iterations didn't change how often the global optimum was reached...

Overall, the hlr parameters are always well fitted. However, since the shear and ellipticities are degenerate, convergence to the expected values is hard to obtain (see the two examples at the end of this post). Maybe this is not a big issue, since we can continue to fit the mean during the VI step. Any other ideas on this @EiffL?

2. Variational Inference

Once the MAP was obtained, I tried to fit the mean and covariance matrix of a multivariate normal distribution to the posterior (using tfd.MultivariateNormalTriL).

Optimization was done by maximizing the ELBO, i.e. E_q[log p(x, z)] - E_q[log q(z)], where q is my surrogate posterior. Here is how I wrote the ELBO in practice:

```python
batch_size = 64

def ELBO(batch_size, joint_log_prob, loc, scale):
  # build the surrogate q(z) and draw a batch of samples
  q = surrogate_posterior(loc, scale)
  samples = q.sample(batch_size)  # samples = [hlr, e, gamma]

  # evaluate the joint log p(x, z) = log p(hlr, e, gamma, obs)
  p_x_z = joint_log_prob(samples)

  # evaluate log q(z; lambda)
  q_z = q.log_prob(samples)

  # Monte Carlo estimate of the ELBO
  return tf.reduce_mean(p_x_z - q_z)
```

Initialization: loc = z_MAP, scale = 0.1 * I_d (cov = scale @ scale.T, so the diagonal of scale plays the role of the standard deviation).
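As a minimal sketch (not the exact notebook code) of how this surrogate and its fit can be set up: `joint_log_prob` and `z_MAP` are carried over from the steps above, and `tfp.util.TransformedVariable` is one way to keep the scale lower-triangular with a positive diagonal.

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

d = z_MAP.shape[-1]       # total number of latent parameters
loc = tf.Variable(z_MAP)  # mean initialized at the MAP
scale = tfp.util.TransformedVariable(
    0.1 * tf.eye(d), bijector=tfb.FillScaleTriL())  # scale initialized at 0.1 * I_d

def surrogate_posterior(loc, scale):
  return tfd.MultivariateNormalTriL(loc=loc, scale_tril=scale)

# maximize the ELBO by minimizing its negative
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
trainable = [loc] + list(scale.trainable_variables)
for _ in range(200):
  with tf.GradientTape() as tape:
    loss = -ELBO(batch_size, joint_log_prob, loc, scale)
  opt.apply_gradients(zip(tape.gradient(loss, trainable), trainable))
```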

Optimizing with Adam(lr=1e-3) for 200 iterations, I get:

[image: VI posterior contours]

But what I find weird is the size of the shear contours around the MAP: the error bars are one order of magnitude too high.

However, the scale matrix for the shear seems one order of magnitude too low:

```
[[0.002638  , 0.        ],
 [0.00020927, 0.0023413 ]]
```

while the posterior mean for the shear is [0.06719618, 0.01234444].
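As a quick sanity check on these numbers, the marginal standard deviations implied by a TriL scale follow from cov = scale @ scale.T; a minimal NumPy check:

```python
import numpy as np

# shear block of the scale matrix reported above
L = np.array([[0.002638,   0.        ],
              [0.00020927, 0.0023413 ]])

cov = L @ L.T                  # covariance implied by the TriL parameterization
print(np.sqrt(np.diag(cov)))   # ~[0.0026, 0.0024]
```

So the surrogate itself encodes shear standard deviations of about 0.002, which makes the much wider plotted contours all the more puzzling.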


Here is another run of the MAP and VI computation, with a larger error on the MAP:

[image: MAP and VI contours, second run]

So I don't understand why I obtain such wide contours given this scale matrix, nor why the scale matrix seems to be underestimated.

Is the multivariate normal distribution inappropriate? The HMC contours seemed well suited to a Gaussian approximation.

Or am I making a mistake in the VI optimization?

Here is a full notebook for more details.

b-remy commented 2 years ago

Good news: I reproduced the Toy model 1 experiment from Schneider et al. (2015), and both the MAP and Variational Inference are working :-) The only difference is that I use a fixed variance for the ellipticity prior, while they use a hierarchical prior. I will investigate this later.

Using a MultivariateNormalTriL for 100 Gaussian galaxies, with constant shear = [0.05, -0.05] and sigma_pix = 0.003, here are some contours:

[image: posterior contours]

Note: log10(hlr) is represented here.

Shear marginals:

[image: shear marginals]

To further validate this result, I would like to compare to HMC contours.
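A possible starting point for that comparison, sketched with TFP's MCMC tools (step size, chain lengths, and the choice of starting state are assumptions to be tuned; `joint_log_prob` and `z_MAP` are the same objects as above):

```python
import tensorflow_probability as tfp

# HMC kernel targeting the same joint log-probability used for MAP and VI
kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=joint_log_prob,
    step_size=1e-2,          # placeholder value
    num_leapfrog_steps=3)
kernel = tfp.mcmc.SimpleStepSizeAdaptation(kernel, num_adaptation_steps=400)

samples, is_accepted = tfp.mcmc.sample_chain(
    num_results=1000,
    num_burnin_steps=500,
    current_state=z_MAP,     # start the chain at the MAP
    kernel=kernel,
    trace_fn=lambda _, pkr: pkr.inner_results.is_accepted)
```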