eelregit / pmwd

Differentiable Cosmological Forward Model
BSD 3-Clause "New" or "Revised" License

A_s vs sigma_8 parameterization #8

Closed by modichirag 1 year ago

modichirag commented 1 year ago

This is tangentially related to #6.

We decided to parameterize cosmology in terms of $A_s$ instead of $\sigma_8$ since we were of the opinion that $A_s$ is more independent of the other cosmological parameters than $\sigma_8$, and hence sampling the former might be easier during inference. However, in my experiments I am finding this not to be the case.

I set up a toy problem of sampling $A_s$, $\Omega_m$, and the initial conditions for mock dark matter density field data: L=100, N=32, a 3-step PM forward model, and shot noise.

Here are the posteriors in the $A_s$-$\Omega_m$ plane, and the corresponding posterior in the $\sigma_8$-$\Omega_m$ plane, where I estimated $\sigma_8$ for the posterior samples. This is 8000 samples without thinning. (Figure: posterior contours in the $A_s$-$\Omega_m$ and $\sigma_8$-$\Omega_m$ planes.)

The posterior seems to be much harder to sample in the $A_s$-$\Omega_m$ parameterization. In fact, most of my chains are not even burning in.

The covariance matrix for $A_s$-$\Omega_m$ is

[[ 0.00113 -0.01357]
 [-0.01357  0.171  ]]

with condition number ~2600, while the covariance matrix for $\sigma_8$-$\Omega_m$ is

[[ 0.00113 -0.00046]
 [-0.00046  0.00032]]

with condition number ~12.
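
For reference, this is roughly how the numbers above can be computed; a minimal sketch, assuming the chain is stored as a hypothetical `(n_samples, 2)` array `samples` with columns ($A_s$, $\Omega_m$) or ($\sigma_8$, $\Omega_m$):

    import numpy as np

    def cov_and_cond(samples):
        """Sample covariance of a chain and its condition number.

        samples: hypothetical (n_samples, 2) array of posterior draws.
        """
        cov = np.cov(samples, rowvar=False)  # 2x2 sample covariance
        cond = np.linalg.cond(cov)           # ratio of largest to smallest eigenvalue
        return cov, cond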

Now one can maybe scale down $A_s$ by a factor of 10 to make it the same order as $\Omega_m$. That might help a little. But even then, the posterior shows that there is a multiplicative degeneracy (banana shape), which makes sense if you think about it and look at the code:

    Plin = (
        0.32 * cosmo.A_s * cosmo.k_pivot * _safe_power(k / cosmo.k_pivot, cosmo.n_s)
        * (jnp.pi * (conf.c / conf.H_0)**2 / cosmo.Omega_m * T)**2
        * D**2
    )
i.e. $P_{lin} \propto A_s / \Omega_m$. It will not completely go away with the $\sigma_8$ parameterization, but the posterior suggests it might be better constrained.

Finally, though this was a dark matter example, I suspect the same will hold for all LSS (galaxies); weak lensing will have a different form, but I don't know yet which parameterization is better there.

But for sampling with galaxy clustering, I now suspect the $\sigma_8$ parameterization might be better. Is there any way we can keep both and let the user decide? Thoughts, opinions, comments? Does this make sense?

eelregit commented 1 year ago

Do you know why the cond num is 12 for sigma8? The plot looks better than that.

Not a fan of sigma8 myself; the main reason I picked As is that it's physical. And in physical coordinates it's easier to interpret the meaning of each gradient component.

I wonder if there's a simpler and quicker way, following your degeneracy reasoning, to fit the banana and implement that reparameterization. I wonder if even $\sqrt{A_s}/\Omega_m$-$\Omega_m$ would work. Could you run a chain for that too?
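
To make that concrete, here is a minimal sketch of such a reparameterization for the sampler; the function names and the exponent `alpha` are assumptions for illustration, not pmwd API. The sampler would work in $(\theta, \Omega_m)$ with $\theta = \sqrt{A_s}\,\Omega_m^\alpha$ (so $\alpha = -1$ gives $\sqrt{A_s}/\Omega_m$), and the forward model would receive the physical $(A_s, \Omega_m)$:

    import jax.numpy as jnp

    def to_sampled(A_s, Omega_m, alpha=-1.0):
        """Physical (A_s, Omega_m) -> sampled (theta, Omega_m), theta = sqrt(A_s) * Omega_m**alpha."""
        return jnp.sqrt(A_s) * Omega_m**alpha, Omega_m

    def to_physical(theta, Omega_m, alpha=-1.0):
        """Inverse map: recover A_s before calling the forward model."""
        return (theta * Omega_m**(-alpha))**2, Omega_m

If the priors are placed directly on $(\theta, \Omega_m)$ no Jacobian is needed; keeping the priors on $(A_s, \Omega_m)$ would require the log-determinant of this transform.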

On the other hand, shouldn't VBS do this automatically?

eelregit commented 1 year ago

If we also consider how $S_8$ is parameterized, I think $\sqrt{A_s} \Omega_m^\alpha$ is probably a more general way to parameterize the amplitude for joint probe analyses.

However, we can provide an s8_to_As function for real fans of $\sigma_8$ :grinning:
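
A sketch of what that could look like, exploiting the fact that $P_{lin} \propto A_s$ at fixed other parameters and hence $\sigma_8^2 \propto A_s$; `sigma8_of_As` below is a hypothetical helper (e.g. integrating the linear power with a top-hat window at $8\,\mathrm{Mpc}/h$), not an existing pmwd function:

    def s8_to_As(sigma8_target, A_s_fid, sigma8_of_As):
        """Invert sigma_8(A_s) using the fact that sigma_8**2 scales linearly with A_s.

        sigma8_of_As: hypothetical callable returning sigma_8 for amplitude A_s
        with all other cosmological parameters held fixed.
        """
        sigma8_fid = sigma8_of_As(A_s_fid)
        return A_s_fid * (sigma8_target / sigma8_fid) ** 2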

modichirag commented 1 year ago

I am not sure why the condition number is 12 for the sigma8 plot; maybe it is because the chains haven't really converged, as you can see from their jagged/striped nature.

Regarding reparameterization - that is a rabbit hole I am not sure either of us wants to go very deep into :) I agree some reparameterization will make the posterior better conditioned, but naively figuring it out without any model for it is a bad combinatorial problem. We can maybe run the chain for $A_s$-$\Omega_m$ and then just convert it to different reparameterizations and see which works best, but we will still be playing a guessing game. (Lol, maybe symbolic regression here will help :P)
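
One cheap version of that guessing game reuses the existing $A_s$-$\Omega_m$ samples: transform them into each candidate parameterization and compare the condition numbers of the resulting sample covariances. A sketch, where `chain` is a hypothetical `(n_samples, 2)` array of $(A_s, \Omega_m)$ draws and the candidate list is purely illustrative:

    import numpy as np

    candidates = {
        "A_s, Om":          lambda A, Om: (A, Om),
        "sqrt(A_s)/Om, Om": lambda A, Om: (np.sqrt(A) / Om, Om),
        "sqrt(A_s)*Om, Om": lambda A, Om: (np.sqrt(A) * Om, Om),
    }

    def rank_parameterizations(chain, candidates):
        """Condition number of the 2x2 sample covariance in each candidate basis."""
        A, Om = chain[:, 0], chain[:, 1]
        return {name: np.linalg.cond(np.cov(np.stack(f(A, Om))))
                for name, f in candidates.items()}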

However, all of this will depend on the data (this was for dark matter; it will be different for galaxies, lensing, combinations, etc.) and also on the signal-to-noise of the data.

VBS can maybe handle this, but we still need to burn in first and get some sense of the posterior to train the initial NF. VBS can help jump around once we have some sense of the posterior. I haven't explored it for parameters yet; I was planning on it and so decided to first run HMC to get correct numbers, and got stuck here^

eelregit commented 1 year ago

See #6.