b-remy / gems

GEnerative Morphology for Shear

MAP and VI #9

Open b-remy opened 2 years ago

b-remy commented 2 years ago

This issue is to track the development of a VI procedure, which is composed of two steps:

  1. Find the MAP
  2. Fit a surrogate posterior

To validate the procedure, I will use the same model both to generate the simulations and to perform inference. For the results, I will compare against both the true values used to generate the sims and the posterior contours from an HMC procedure.

1. MAP

25 Gaussian light profile galaxies, constant shear.

Params:

I aimed to obtain the MAP by running gradient descent on the negative log joint probability -log p(hlr, e, gamma, obs), which equals the negative log posterior up to an additive constant.

Since this objective is non-Gaussian, obtaining the global minimum seems non-trivial, even with Adam. So my first strategy was to run 100 chains in parallel, with 100 iterations of Adam(lr=0.1) followed by 100 iterations of Adam(lr=0.01).
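For concreteness, here is a minimal sketch of this multi-start search (not the notebook code): it assumes a hypothetical `neg_log_prob(params)` that evaluates -log p(hlr, e, gamma, obs) for a batch of parameter vectors, one per chain, and a placeholder latent dimension `n_params`.

```python
import tensorflow as tf

n_chains, n_params = 100, 4  # n_params is a placeholder for the actual latent dimension

# one random starting point per chain
params = tf.Variable(tf.random.normal([n_chains, n_params]))

for lr, n_steps in [(0.1, 100), (0.01, 100)]:
  opt = tf.keras.optimizers.Adam(learning_rate=lr)
  for _ in range(n_steps):
    with tf.GradientTape() as tape:
      # chains are independent, so summing keeps their gradients separate
      loss = tf.reduce_sum(neg_log_prob(params))
    opt.apply_gradients([(tape.gradient(loss, params), params)])

# keep the chain that reached the lowest negative log posterior as the MAP
z_MAP = tf.gather(params, tf.argmin(neg_log_prob(params)))
```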

[image: hlr ground simulations]

[image: hlr MAP]

Increasing the number of iterations didn't change how often the global optimum was reached...

Overall, the hlr parameters are always well fitted. However, since the shear and ellipticities are degenerate, convergence to the expected values is hard to obtain (see the two examples at the end of this post). Maybe this is not a big issue, since we can continue to fit the mean during the VI step. Any other ideas on this @EiffL?

2. Variational Inference

Once the MAP was obtained, I tried to fit the mean and covariance matrix of a multivariate normal distribution to the posterior (using tfd.MultivariateNormalTriL).

Optimization was done by maximizing the ELBO, i.e. E_q[log p(x, z)] - E_q[log q(z)], where q is my surrogate posterior. Here is how I wrote the ELBO in practice:

```python
batch_size = 64

def ELBO(batch_size, joint_log_prob, loc, scale):
  # build the surrogate q(z) and draw a batch of samples
  q = surrogate_posterior(loc, scale)
  samples = q.sample(batch_size)  # samples = [hlr, e, gamma]

  # evaluate the joint log p(x, z) = log p(hlr, e, gamma, obs)
  p_x_z = joint_log_prob(samples)

  # evaluate log q(z; lambda)
  q_z = q.log_prob(samples)

  # Monte Carlo estimate of the ELBO
  return tf.reduce_mean(p_x_z - q_z)
```

Initialization: loc = z_MAP, scale = 0.1 * I_d (cov = scale @ scale.T, so the diagonal of scale plays the role of the standard deviation).
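As a minimal sketch (not the exact notebook code) of how this surrogate and its fit can be set up: `joint_log_prob` and `z_MAP` are carried over from the steps above, and `tfp.util.TransformedVariable` is one way to keep the scale lower-triangular with a positive diagonal.

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

d = z_MAP.shape[-1]       # total number of latent parameters
loc = tf.Variable(z_MAP)  # mean initialized at the MAP
scale = tfp.util.TransformedVariable(
    0.1 * tf.eye(d), bijector=tfb.FillScaleTriL())  # scale initialized at 0.1 * I_d

def surrogate_posterior(loc, scale):
  return tfd.MultivariateNormalTriL(loc=loc, scale_tril=scale)

# maximize the ELBO by minimizing its negative
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
trainable = [loc] + list(scale.trainable_variables)
for _ in range(200):
  with tf.GradientTape() as tape:
    loss = -ELBO(batch_size, joint_log_prob, loc, scale)
  opt.apply_gradients(zip(tape.gradient(loss, trainable), trainable))
```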

Optimizing with Adam(lr=1e-3) for 200 iterations, I get:

[image: VI posterior contours]

But what I find weird is the size of the shear contours around the MAP: the error bars are one order of magnitude too high.

However, the scale matrix for the shear seems one order of magnitude too low:

```
[[0.002638  , 0.        ],
 [0.00020927, 0.0023413 ]]
```

while the posterior mean for the shear is [0.06719618, 0.01234444].
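As a quick sanity check on these numbers, the marginal standard deviations implied by a TriL scale follow from cov = scale @ scale.T; a minimal NumPy check:

```python
import numpy as np

# shear block of the scale matrix reported above
L = np.array([[0.002638,   0.        ],
              [0.00020927, 0.0023413 ]])

cov = L @ L.T                  # covariance implied by the TriL parameterization
print(np.sqrt(np.diag(cov)))   # ~[0.0026, 0.0024]
```

So the surrogate itself encodes shear standard deviations of about 0.002, which makes the much wider plotted contours all the more puzzling.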


Here is another run of the MAP and VI computation, with a larger error on the MAP:

[image: MAP and VI contours, second run]

So I don't understand why I obtain such wide contours given this scale matrix, nor why the scale matrix seems to be underestimated.

Is the multivariate normal distribution inappropriate? The HMC contours seemed well suited to a Gaussian approximation.

Or am I making a mistake in the VI optimization?

Here is a full notebook for more details.

b-remy commented 2 years ago

Good news: I reproduced the Toy model 1 experiment from Schneider et al. (2015), and both the MAP and Variational Inference are working :-) The only difference is that I use a fixed variance for the ellipticity prior, while they use a hierarchical prior. I will investigate this later.

Using a MultivariateNormalTriL for 100 Gaussian galaxies, with constant shear = [0.05, -0.05] and sigma_pix = 0.003, here are some contours:

[image: posterior contours]

Note: log10(hlr) is represented here.

Shear marginals:

[image: shear marginals]

To further validate this result, I would like to compare to HMC contours.
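A possible starting point for that comparison, sketched with TFP's MCMC tools (step size, chain lengths, and the choice of starting state are assumptions to be tuned; `joint_log_prob` and `z_MAP` are the same objects as above):

```python
import tensorflow_probability as tfp

# HMC kernel targeting the same joint log-probability used for MAP and VI
kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=joint_log_prob,
    step_size=1e-2,          # placeholder value
    num_leapfrog_steps=3)
kernel = tfp.mcmc.SimpleStepSizeAdaptation(kernel, num_adaptation_steps=400)

samples, is_accepted = tfp.mcmc.sample_chain(
    num_results=1000,
    num_burnin_steps=500,
    current_state=z_MAP,     # start the chain at the MAP
    kernel=kernel,
    trace_fn=lambda _, pkr: pkr.inner_results.is_accepted)
```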