Hey @ark93-cosmo, thank you for your interest in CosmoPower-JAX!
Just to clarify, would you need to run CosmoPower-JAX (i.e. the JAX version), or just CosmoPower (which runs on TensorFlow)? Also, are you planning to use any GPUs? I guess you'd like to take the cobaya implementation and replace CLASS or CAMB with a faster emulator?
Thank you for your reply! I need the JAX version. For the moment I want to use CPUs, but I might use GPUs later. Exactly, I want to replace the theory part of the cobaya dictionary, which is usually dedicated to either CLASS or CAMB, with a faster emulator.
Right, and I guess you already have a trained model to load into CosmoPower-JAX for the CMB spectra in the Planck likelihood?
I think so. I took the cmb_TT, TE, EE pickle files from the CosmoPower trained models and converted them into npz files. I also have the cmb_lcdm_spt pickle files, which I can convert to npz files as well.
Cool, thanks for confirming.
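For reference, once converted to .npz the networks should load into CosmoPower-JAX with something along these lines (writing from memory, so double-check against the README; the file name and the parameter ordering are just placeholders):

```python
import numpy as np
from cosmopower_jax.cosmopower_jax import CosmoPowerJAX as CPJ

# 'custom_log' is for networks emulating log10 of the spectra;
# the .npz file name is a placeholder for your converted model
emulator_tt = CPJ(probe='custom_log', filepath='cmb_TT_NN.npz')

# parameters are passed as an array, in the same order used at training time
theta = np.array([0.0224, 0.120, 0.674, 0.055, 0.965, 3.05])
cl_tt = emulator_tt.predict(theta)
```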
This seems more of a cobaya issue, since it will depend on the implementation you are using, but indeed I have integrated CosmoPower-JAX in cobaya at some point in the past. I would first recommend reading here for some possible help (https://github.com/CobayaSampler/cobaya/issues/254), and then, while I search for the code I used, it could be helpful if you let me know which cobaya script you are using.
Thanks a lot! What do you mean by cobaya script? As in a yaml file or a python script?
I'd say both, but in general you need to find in which part of the likelihood the call to CLASS or CAMB is made. The likelihood is probably shipped with cobaya in your case -- which likelihood are you using, and how are you calling it?
I have a python script that looks something like this:
```python
from cobaya.run import run
import cobaya

data_dir = "chains"  # placeholder for my output directory

info = {
    "params": {
        "omega_b": {"prior": {"min": 0.005, "max": 0.1}, "proposal": 0.0001,
                    "ref": {"dist": "norm", "loc": 0.0224, "scale": 0.0001},
                    "latex": r"\Omega_\mathrm{b} h^2"},
        "omega_cdm": {"prior": {"min": 0.001, "max": 0.99}, "proposal": 0.0005,
                      "ref": {"dist": "norm", "loc": 0.12, "scale": 0.001},
                      "latex": r"\Omega_\mathrm{c} h^2"},
        "theta_s_1e2": {"prior": {"min": 0.5, "max": 10}, "proposal": 0.0002, "drop": True,
                        "ref": {"dist": "norm", "loc": 1.0416, "scale": 0.0004},
                        "latex": r"100\theta_\mathrm{s}"},
        "100*theta_s": {"derived": False, "value": "lambda theta_s_1e2: theta_s_1e2"},
        "tau_reio": {"prior": {"min": 0.01, "max": 0.8}, "proposal": 0.003,
                     "ref": {"dist": "norm", "loc": 0.055, "scale": 0.006},
                     "latex": r"\tau_\mathrm{reio}"},
        "tau": {"derived": False, "value": "lambda tau_reio: tau_reio",
                "latex": r"\tau_\mathrm{reio}"},
        "n_s": {"prior": {"min": 0.8, "max": 1.2}, "proposal": 0.002,
                "ref": {"dist": "norm", "loc": 0.965, "scale": 0.004},
                "latex": r"n_\mathrm{s}"},
        "logA": {"prior": {"min": 1.61, "max": 3.91}, "proposal": 0.001, "drop": True,
                 "ref": {"dist": "norm", "loc": 3.05, "scale": 0.001},
                 "latex": r"\log(10^{10} A_\mathrm{s})"},
        "A_s": {"value": "lambda logA: 1e-10*np.exp(logA)", "latex": r"A_\mathrm{s}"},
        "clamp": {"derived": "lambda A_s, tau_reio: 1e9*A_s*np.exp(-2*tau_reio)",
                  "latex": r"10^9 A_\mathrm{s} e^{-2\tau}"},
        "A": {"derived": "lambda A_s: 1e9*A_s", "latex": r"10^9 A_\mathrm{s}"},
        "Omega_m": {"latex": r"\Omega_\mathrm{m}"},
        "Omega_Lambda": {"latex": r"\Omega_\Lambda"},
        "age": {"latex": r"{\rm{Age}}/\mathrm{Gyr}"},
        "s8h5": {"derived": "lambda sigma8, H0: sigma8*(H0*1e-2)**(-0.5)",
                 "latex": r"\sigma_8/h^{0.5}"},
        "sigma8": {"latex": r"\sigma_8"},
        "z_reio": {"latex": r"z_\mathrm{re}"},
        "rs_drag": {"latex": r"r_\mathrm{drag}"},
        "s8omegamp25": {"derived": "lambda sigma8, Omega_m: sigma8*Omega_m**0.25",
                        "latex": r"\sigma_8 \Omega_\mathrm{m}^{0.25}"},
        "H0": {"latex": r"H_0"},
        "YHe": {"latex": r"Y_\mathrm{P}"},
        "s8omegamp5": {"derived": "lambda sigma8, Omega_m: sigma8*Omega_m**0.5",
                       "latex": r"\sigma_8 \Omega_\mathrm{m}^{0.5}"},
        "omegamh2": {"derived": "lambda Omega_m, H0: Omega_m*(H0/100)**2",
                     "latex": r"\Omega_\mathrm{m} h^2"},
    },
    "theory": {
        "classy": {
            "extra_args": {"m_ncdm": 0.02, "N_ncdm": 1, "N_ur": 0.00441,
                           "deg_ncdm": 3, "non_linear": "none"},
            "ignore_obsolete": True,
        }
    },
    "likelihood": {
        "planck_2018_lowl.TT": {},
        "planck_2018_lowl.EE": {},
        "planck_2018_highl_plik.TTTEEE": {},
    },
    "sampler": {
        "mcmc": {
            "covmat": "LCDM.covmat",
            "drag": True,
            "oversample_power": 0.4,
            "proposal_scale": 1,
            "learn_proposal": True,
            "learn_proposal_Rminus1_max": 100,
            "measure_speeds": True,
            "output_every": 1,
            "max_tries": 1.0e4,
            "Rminus1_stop": 0.02,
            "Rminus1_cl_stop": 0.2,
        }
    },
    "output": data_dir + "/CLASS",
    "force": False,
    "resume": True,
    "debug": False,
    "verbosity": 2,
    "timing": True,
}

updated_info, sampler = run(info)
```

In the end I want to use both lensing reconstruction and BAO data, but I guess once I learn how to do it for the primary CMB, I can apply the same thing for the PP and mPk files, right?
Thank you for the file @ark93-cosmo. I think that you do need to find where e.g. planck_2018_lowl.TT is called. @alessiospuriomancini should be more helpful here.
I'm also flagging this repository, in case you missed it from my link above. Note that the code in that link does not use CosmoPower-JAX.
Hi @ark93-cosmo, thanks for your message and interest in CosmoPower!
My understanding from reading your message is that you do not need CosmoPower-JAX for your specific application within Cobaya. Cobaya cannot run on GPUs, and does not allow for gradient-based sampling, so you don't need to use either TensorFlow or JAX.
For you, it is enough to use the Numpy-based implementation of CosmoPower, which you can access through the functions ending in the suffix _np (e.g. predictions_np and ten_to_predictions_np).
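For instance, restoring one of the released TT networks and calling it with Numpy only would look roughly like this (a quick, untested sketch; the model path and parameter values are placeholders, and the parameter names must match those the network was trained with):

```python
import numpy as np
from cosmopower import cosmopower_NN

# restore a pre-trained network (placeholder path/name, without the .pkl extension)
cp_tt = cosmopower_NN(restore=True, restore_filename='cmb_TT_NN')

# one value per parameter; the keys must match the names used at training time
params = {'omega_b':      np.array([0.0224]),
          'omega_cdm':    np.array([0.120]),
          'h':            np.array([0.674]),
          'tau_reio':     np.array([0.055]),
          'n_s':          np.array([0.965]),
          'ln10^{10}A_s': np.array([3.05])}

# the CMB networks are trained on log10(C_ell), so this returns the spectra themselves
cl_tt = cp_tt.ten_to_predictions_np(params)  # shape (1, number_of_multipoles)
```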
The logic behind this is the following:
- for training, you should always use the TensorFlow implementation of CosmoPower (just follow the tutorials provided in the repository);
- for implementation within an inference pipeline, whether to use TensorFlow/JAX/Numpy really depends on your needs, and specifically on what your likelihood is written in. Since you are using Cobaya, you are using standard Python libraries like Numpy that do not run on GPUs, nor do they provide autodiff capabilities. Hence, you are better off "loading" your pre-trained weights and then accessing their predictions through a series of Numpy-based operations -- this is when you should use the functions ending in _np;
- if on the other hand, you are running your inference pipeline using likelihoods written e.g. in TensorFlow (see for example here) or JAX (see e.g. these papers: 2305.06347, 2401.13433, 2405.12965, 2410.10603) because you want to run on GPUs and/or use gradient-based sampling methods, you have to use either the TensorFlow functions of CosmoPower (ending in _tf) or CosmoPower-JAX. So, concretely for your application, if instead of running with Cobaya you want to run candl, then you have to use CosmoPower-JAX.
I hope this clarifies things; please let me know if there is anything else I can help with on this point.
Now, coming to your question on how to call CosmoPower from Cobaya: as Davide mentioned above, this has been automated in 2405.07903 and will soon be pushed here. In the meantime, as Davide said, you need to access the likelihoods you use in Cobaya and change the part where they source the power spectra from Boltzmann codes, replacing it with CosmoPower. So, for the Planck 2018 TT example, this line needs to be changed so that the C_ells come from CosmoPower and not CAMB or CLASS. A neat way of doing this is indeed as done here -- the idea is that you create a CosmoPower instance of the Theory module sourced by Cobaya.
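To make that last point a bit more concrete, here is a rough, untested sketch of what such a Theory class could look like, following the pattern in Cobaya's documentation for custom theory classes; the class name, network path, parameter mapping, multipole range and unit conventions are all placeholders that you would need to adapt to your trained networks and to what the Planck likelihoods actually request:

```python
import numpy as np
from cobaya.theory import Theory
from cosmopower import cosmopower_NN


class CosmoPowerCMB(Theory):
    """Toy Theory class returning CMB TT spectra from a CosmoPower network."""

    # placeholder path to the trained network (without the .pkl extension)
    network_filename: str = "cmb_TT_NN"
    # placeholder mapping from Cobaya parameter names to the training names
    param_map: dict = {"omega_b": "omega_b", "omega_cdm": "omega_cdm", "h": "h",
                       "tau_reio": "tau_reio", "n_s": "n_s",
                       "logA": "ln10^{10}A_s"}

    def initialize(self):
        # restore the pre-trained emulator and store its multipole range
        self.cp_tt = cosmopower_NN(restore=True,
                                   restore_filename=self.network_filename)
        self.ells = np.arange(2, 2509)  # placeholder: must match the training range

    def get_requirements(self):
        # the cosmological parameters become inputs of this theory
        return {p: None for p in self.param_map}

    def calculate(self, state, want_derived=True, **params_values_dict):
        # evaluate the emulator instead of calling CLASS/CAMB
        params = {self.param_map[k]: np.array([v])
                  for k, v in params_values_dict.items()}
        cl_tt = self.cp_tt.ten_to_predictions_np(params)[0]
        state["Cl"] = {"ell": self.ells, "tt": cl_tt}

    def get_Cl(self, ell_factor=False, units="FIRASmuK2"):
        # the Planck likelihoods retrieve the spectra through this method;
        # units and the ell*(ell+1)/2pi convention must match what they expect
        return self.current_state["Cl"]
```

You would then reference this class in the "theory" block of your input dictionary instead of classy.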
This should give you a good start for how to implement this in practice. Of course, we are both happy to help you out with the details if needed. I will close this issue for now, but please keep us updated on how things are going and we will be more than happy to support you further!
Thank you so much @alessiospuriomancini and @dpiras for the immense help! I will follow your suggestions and keep you posted with any developments.
Great, let us know if you need further help @ark93-cosmo!
Is there a publicly available example yaml file or python script for running an MCMC in Cobaya using CosmoPower for LCDM and the Planck likelihood? Also, if there are any other scripts needed to make this run work, sharing them would be really appreciated.