CosmoStat / jax-lensing

A JAX package for weak gravitational lensing
MIT License

Learn a prior from simulations #3

Closed: EiffL closed this issue 3 years ago

EiffL commented 4 years ago

A third goal will be to replace the simple ad-hoc priors we have used so far with a prior learned from simulations.

For the training data, we can use either the MassiveNuS simulations from Jia (https://arxiv.org/abs/1711.10524), available here: http://www.columbialensing.org/

or the simulations from @NiallJeffrey that he used for the DeepMass paper.

Then the question is how we learn the prior, and that's where things will start to get fun :-) We can use a PixelCNN or another generative model, as we did in https://arxiv.org/abs/1912.03980. Or..... we can try a different approach ;-)
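Whichever model we pick, the learned prior would enter the reconstruction the same way: it supplies a `log_prior` term for MAP (or sampling-based) inference. A minimal sketch of that plumbing, assuming hypothetical `log_prior` and `forward_op` callables and Gaussian pixel noise (not the package's actual API):

```python
import jax
import jax.numpy as jnp

def log_likelihood(kappa, data, forward_op, sigma_noise):
    # Gaussian noise model: data = forward_op(kappa) + n, n ~ N(0, sigma^2)
    residual = data - forward_op(kappa)
    return -0.5 * jnp.sum((residual / sigma_noise) ** 2)

def map_estimate(data, forward_op, log_prior, sigma_noise,
                 n_steps=500, step_size=1e-2):
    # Maximise log-likelihood + learned log-prior by plain gradient descent
    # on the negative log-posterior.
    def neg_log_posterior(kappa):
        return -(log_likelihood(kappa, data, forward_op, sigma_noise)
                 + log_prior(kappa))

    grad_fn = jax.grad(neg_log_posterior)
    kappa = jnp.zeros_like(data)
    for _ in range(n_steps):
        kappa = kappa - step_size * grad_fn(kappa)
    return kappa
```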

EiffL commented 4 years ago

So here are some papers we should talk about when moving on to learn the data prior distribution:

Not the particular paper I had in mind ^^' but I can't find it. I think the idea there is the same though.

EiffL commented 4 years ago

And here is the particular paper I had in mind: https://arxiv.org/abs/1708.08487 This little paper shows that you can use a denoising autoencoder to learn the gradient of the log of a (smoothed) data distribution, i.e. its score.
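Concretely, the identity (due to Alain & Bengio, and used in that paper) is that the optimal Gaussian denoiser satisfies r*(x) - x = σ² ∇ₓ log p_σ(x), so the rescaled denoising residual estimates the score of the noise-smoothed data distribution. A minimal JAX sketch, with `denoiser` standing in for any network (a hypothetical placeholder, not code from this repo):

```python
import jax
import jax.numpy as jnp

SIGMA = 0.1  # Gaussian corruption scale; also sets the smoothing of the prior

def denoising_loss(params, denoiser, x_clean, key):
    # Standard DAE objective: reconstruct clean samples from noisy ones.
    noise = SIGMA * jax.random.normal(key, x_clean.shape)
    x_denoised = denoiser(params, x_clean + noise)
    return jnp.mean((x_denoised - x_clean) ** 2)

def score(params, denoiser, x):
    # Alain & Bengio identity: r*(x) - x = sigma^2 * grad_x log p_sigma(x),
    # so the rescaled residual estimates the smoothed score.
    return (denoiser(params, x) - x) / SIGMA**2
```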

Also an interesting, somewhat related paper: noise-contrastive estimation, http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf
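For reference, the core of noise-contrastive estimation is a logistic classification between data and samples from a known noise density; the optimal classifier's logit recovers the log-density ratio. A minimal sketch, with `log_model` a hypothetical unnormalised log-density network and a standard normal as the noise distribution:

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

def nce_loss(params, log_model, x_data, x_noise):
    # Logit of the optimal classifier is log p_model(x) - log p_noise(x).
    def logit(x):
        log_noise = jnp.sum(norm.logpdf(x), axis=-1)  # standard normal noise
        return log_model(params, x) - log_noise

    # Logistic loss: push data towards label 1, noise samples towards 0.
    loss_data = jnp.mean(jax.nn.softplus(-logit(x_data)))
    loss_noise = jnp.mean(jax.nn.softplus(logit(x_noise)))
    return loss_data + loss_noise
```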

EiffL commented 4 years ago

Alright, I gave it a try on a toy dataset: https://colab.research.google.com/drive/1hHji8BK-IQmKwTtFPlthAkvVM3pyPCCZ?usp=sharing

The cool thing is that it kind of works, but clearly the denoising autoencoder I'm using isn't correctly regularized, so the learned gradients are super weird in some places. I suspect it will require some additional love and care ^^'

Here is what the learned gradients look like for the denoising autoencoder:
[image: gradient field learned by the denoising autoencoder]
and here is what they should look like:
[image: ground-truth gradient field]

This makes me think of this other paper: https://arxiv.org/pdf/1905.10710.pdf which argues that autoencoders are poorly suited to this kind of task, and instead shows that a Lipschitz-regularised discriminator is provably more robust. But then we end up close to the likelihood-ratio approach. Also worth keeping in mind if we go in that direction is Joeri's paper: https://arxiv.org/pdf/1903.04057.pdf
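One standard way to impose a Lipschitz constraint on a discriminator is a gradient penalty in the WGAN-GP style (the paper's exact regulariser may differ; this is just an illustration). A sketch, with `discriminator` a hypothetical network applied to batches:

```python
import jax
import jax.numpy as jnp

def gradient_penalty(params, discriminator, x_real, x_fake, key):
    # Penalise the discriminator's input-gradient norm away from 1,
    # evaluated at random interpolates between real and fake batches.
    eps_shape = (x_real.shape[0],) + (1,) * (x_real.ndim - 1)
    eps = jax.random.uniform(key, eps_shape)
    x_interp = eps * x_real + (1.0 - eps) * x_fake

    def d_single(x):
        # Discriminator output for a single (unbatched) example.
        return discriminator(params, x[None])[0]

    grads = jax.vmap(jax.grad(d_single))(x_interp)
    grad_norms = jnp.sqrt(
        jnp.sum(grads.reshape(grads.shape[0], -1) ** 2, axis=-1) + 1e-12)
    return jnp.mean((grad_norms - 1.0) ** 2)
```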