AstroJacobLi / popsed

Population-Level Inference for Galaxy Properties from Broadband Photometry with Neural Density Estimation
MIT License

Transforming parameter space #5

Open AstroJacobLi opened 2 years ago

AstroJacobLi commented 2 years ago

We have discussed for a while using a variable-transformation trick to avoid penalty functions for unphysical regions. I ran a few tests in this direction.

Assume we have a random variable $X$ with support $[a, b]$. If we believe that our prior on $X$ is a tophat, then we can transform $X$ to a new variable $Y = \Phi^{-1}\big((X-a)/(b-a)\big)$, where $\Phi$ is the CDF of the standard normal distribution, and $Y$ follows a standard normal distribution.

That said, all SPS parameters in our case have supports of the form $[a,b]$, and we want our prior to be a tophat. So we can initialize Gaussians in the $Y$ space (we can do this because the normalizing flow starts from Gaussians), then transform back to $X$ and get a flat prior (see the sketch below). The range $[a, b]$ is also relatively easy to determine: setting it to the emulator range is fine.
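A minimal sketch of this reparameterization, assuming PyTorch; the function names and the parameter range are illustrative, not the actual popsed code:

```python
import torch
from torch.distributions import Normal

_std_normal = Normal(0.0, 1.0)

def to_unconstrained(x, a, b):
    """Map X in [a, b] (tophat prior) to Y = Phi^{-1}((X - a) / (b - a)), which is standard normal."""
    u = (x - a) / (b - a)
    return _std_normal.icdf(u)

def to_physical(y, a, b):
    """Inverse map X = a + (b - a) * Phi(Y): a standard normal in Y becomes a tophat on [a, b]."""
    return a + (b - a) * _std_normal.cdf(y)

# Example: one SPS parameter with an (illustrative) emulator range [0, 2].
a, b = 0.0, 2.0
y = torch.randn(10_000)      # initialize in the Gaussian (Y) space
x = to_physical(y, a, b)     # transformed samples are approximately uniform on [a, b]
```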

This figure shows the initialization scheme as described above. The initial distribution is flat (shown in noisy blue contours).

Before transformation (just standard normal distributions): [image]

After transformation (standard normals become tophats): [image]

AstroJacobLi commented 2 years ago

It seems this initialization is more like a flat "prior" in the Bayesian sense. I then trained the NDE without the penalty (we don't need it anymore). The training looks okay. After combining 20 NDEs, this is what I get (shitty):

[image]

In short, very poor constraints on all parameters except dust2, redshift, and stellar mass. If we believe this is true, then the good results I showed last week are largely due to the implicit non-flat prior that comes from the way we initialized the NDEs.

pmelchior commented 2 years ago

Let me guess: when you sample from these posteriors, the photometry still matches well with observations?

AstroJacobLi commented 2 years ago

> Let me guess: when you sample from these posteriors, the photometry still matches well with observations?

True, it matches the observations quite well.

pmelchior commented 2 years ago

Is the loss any higher than with the previous flow initialization?

AstroJacobLi commented 2 years ago

I think I found a way to make the NDE training better. I tuned various hyper-parameters and found that the most important one is the blur used in calculating the Wasserstein loss. If I understand correctly, this parameter controls how sensitive the Wasserstein distance is to fine structure in the data: a small blur (e.g., 1e-3) makes the loss very sensitive to the detailed structure of the two distributions. In the past I set blur=0.1, which apparently did not give the optimizer enough gradient. Now I train the NDEs with blur=1e-3; the results are shown below. Compared with the previous results, we get better constraints on stellar mass, redshift, metallicity, dust optical depth, and dust attenuation slope. Without a strong prior, the photometric data still lack constraining power on the SFH.

[image]
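For reference, a minimal sketch of how the blur parameter enters a Sinkhorn-type Wasserstein loss, assuming the geomloss package (the actual popsed training code may differ; the point clouds here are toy data):

```python
import torch
from geomloss import SamplesLoss

# Sinkhorn divergence between two point clouds; `blur` sets the scale below which
# differences between the distributions are smoothed away and no longer produce gradient.
sharp_loss = SamplesLoss(loss="sinkhorn", p=2, blur=1e-3)   # sensitive to fine structure
coarse_loss = SamplesLoss(loss="sinkhorn", p=2, blur=0.1)   # only sees coarse structure

x = torch.randn(5000, 2)                   # e.g., samples drawn from the NDE
y = torch.randn(5000, 2) * 1.05 + 0.02     # e.g., target samples, slightly offset

print(sharp_loss(x, y).item(), coarse_loss(x, y).item())
```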

pmelchior commented 2 years ago

nice, this does look a lot better.

AstroJacobLi commented 2 years ago

Another good sign: with blur=1e-3, the results do not show a strong dependence on the neural-net architecture.