brinckmann / montepython_public

Public repository for the Monte Python Code
MIT License
93 stars 81 forks source link

Priors for nested sampling #28

Closed ivandebono closed 3 years ago

ivandebono commented 5 years ago

I'm trying to run MontePython with nested samping for the Planck 2015 likelihood (base2015TTTEEE), using the parameter file provided. Since nested sampling requires hard priors, I used the values in the 2015 Planck analysis (from the .ranges file in the publicly released Planck chains).

I'm getting a series of errors which prevent the chain from being output correctly. They all seem to be related to the values of the cosmological parameters passed on to CLASS.

In the default parameter file, the prior ranges of the parameters in question (omega_b seems to be the cause of the problem) are set to None. Why should setting hard parameters affect the initial and subsequent guesses? It shouldn't take the values into unphysical regions. If anything, it should prevent it. Am I missing something?

And is there a solution?

Here is a sample of the errors:

Error in Class: background_init(L:634) :condition (pba->shooting_failed == TRUE) is true; Shooting failed, try optimising input_get_guess(). Error message:

input_init(L:330) :error in input_find_root(&xzero, &fevals, &fzw, errmsg); =>input_find_root(L:4006) :error in input_fzerofun_1d(x1, pfzw, &f1, errmsg); =>input_fzerofun_1d(L:3543) :error in input_try_unknown_parameters(&input, 1, pfzw, output, error_message); =>input_try_unknown_parameters(L:3728) :error in thermodynamics_init(&pr,&ba,&th); =>thermodynamics_init(L:390) :error in thermodynamics_recombination(ppr,pba,pth,preco,pvecback); =>thermodynamics_recombination(L:2611) :error in thermodynamics_recombination_with_recfast(ppr,pba,pth,preco,pvecback); =>thermodynamics_recombination_with_recfast(L:3253) :error in generic_integrator(thermodynamics_derivs_with_recfast, zstart, zend, y, &tpaw, ppr->tol_thermo_integration, ppr->smallest_allowed_variation, &gi); =>generic_integrator(L:112) :error in rkqs(&x, h, eps, &hdid, &hnext, derivs, parameters_and_workspace_for_derivs, pgi); =>rkqs(L:156) :condition (xnew == *x) is true; stepsize underflow at x=nan

Error in Class: background_init(L:634) :condition (pba->shooting_failed == TRUE) is true; Shooting failed, try optimising input_get_guess(). Error message:

input_init(L:330) :error in input_find_root(&xzero, &fevals, &fzw, errmsg); =>input_find_root(L:4006) :error in input_fzerofun_1d(x1, pfzw, &f1, errmsg); =>input_fzerofun_1d(L:3543) :error in input_try_unknown_parameters(&input, 1, pfzw, output, error_message); =>input_try_unknown_parameters(L:3728) :error in thermodynamics_init(&pr,&ba,&th); =>thermodynamics_init(L:304) :error in thermodynamics_helium_from_bbn(ppr,pba,pth); =>thermodynamics_helium_from_bbn(L:1218) :condition (omega_b > omegab[num_omegab-1]) is true; You have asked for an unrealistic high value omega_b = 5.304738e-02. The corresponding value of the primordial helium fraction cannot be found in the interpolation table. If you really want this value, you should fix YHe to a given value rather than to BBN

brinckmann commented 5 years ago

Hi Ivan,

MultiNest will explore all of the parameter space that is within the boundaries, so you need to make sure the entire range is reasonable and that the parameter scaling is correct.

Best, Thejs

ivandebono commented 5 years ago

For the base model, I'm using the values below. Should I use a narrower prior range? data.parameters['omega_b'] = [2.2253, 0.5, 10, 0.028, 0.01, 'cosmo'] data.parameters['omega_cdm'] = [0.11919, 0.001, 0.99, 0.0027, 1, 'cosmo'] data.parameters['100*theta_s'] = [1.0418, 0.5, 10, 3e-4, 1, 'cosmo'] data.parameters['ln10^{10}A_s'] = [3.0753, 2, 4, 0.0029, 1, 'cosmo'] data.parameters['n_s'] = [0.96229, 0.8, 1.2, 0.0074, 1, 'cosmo'] data.parameters['tau_reio'] = [0.09463, 0.01, 0.8, 0.013, 1, 'cosmo']s

brinckmann commented 5 years ago

I would use something like 3 or 5 times the Planck 1-sigma values, otherwise it will take a long time to converge. If you're considering some extended models or combinations of datasets you may hit the parameter boundaries, but then you can update them and re-run.

Best, Thejs

ivandebono commented 5 years ago

I did, and it hasn't helped. Could you post an example of a parameter file which ran succesfully?

There are two further issues:

The NS folder cannot be analysed after an unfinished run, so unlike a run with the default Metropolis-Hastings method, you can neither analyse in mid-run, nor interrupt the run and analyse.

Analyze Error: /|\ You asked to analyze a NS folder which seems to come from an unfinished /o\ run, or to be empty or corrupt. Please make sure the run went smoothly enough.

And the run cannot be restarted in the same folder if the previous run was unfinished.

Starting MultiNest ERROR: no. of points in ev.dat file is not equal to the no. specified in resume file. Aborting

Is there a workaround?

brinckmann commented 5 years ago

Have you tried the example_ns.param that can be found in the input directory?

Yes, MultiNest runs can only be restarted from resume points. I imagine you should be able to remove points added after the last resume point if this is not done automatically. Perhaps @DeannaCH can provide more information on what works best when using MultiNest within MontePython, but you should have a look at the official multinest and pymultinest documentations, which I think should have all the information you seek.

Best, Thejs

ivandebono commented 5 years ago

I'm running example_ns.param right now. I'll report later, if I manage to complete a successful run.

With nested sampling, you cannot analyse the folder in mid-run (Can you? If so, how?) , so it's a bit of a blind guess.

Also, users runnnig on a cluster should be aware that running with mpirun the first time round will give you an error (below). It did for me, on the CC.in2p3.fr cluster.

Configuration Error: /|\ You are running in a folder that was created following a non-successful /o\ initialisation (wrong parameter name, wrong likelihood, etc...). If you have solved the issue, you should remove completely the output folder, and try again. Alternatively, there could be a problem with cosmo

The solution was to do a first run using only $python montepython/etc. and a minimal number of points (I used --NS_max_iter 2). This correctly creates the output folder and the NS subfolder with all its files. Then you can run as usual with $mpirun.

I hope this helps any of the users having similar problems.

Update: I got the dreaded Multinest segmentation fault (I'm running on a cluster). I'll try to rerun with an increased stack size (the solution recommended here: https://bitbucket.org/joezuntz/cosmosis/pull-requests/5/postprocessing-output-and-multinest/diff)

mpirun noticed that process rank 0 with PID 0 on node ccwpge0101 exited on signal 11 (Segmentation fault).

ivandebono commented 5 years ago

Have you tried the example_ns.param that can be found in the input directory?

Best, Thejs

I did. The run terminated correctly, with no errors. But when I tried to analyse the folder I got this:

Analyze Error: /|\ You asked to analyze a NS folder which seems to come from an unfinished /o\ run, or to be empty or corrupt. Please make sure the run went smoothly enough.

dchooper commented 5 years ago

Hi Ivan, To analyse the folder of the example_ns.param run, please make sure you are analysing "chains/example/NS" first and then "chains/example". There is a two step process involved in analysing a nested sampling folder, as explained here: https://monte-python.readthedocs.io/en/latest/nested.html This can be done at any point during the run, you do not have to wait for the run to finish.

For the base model, I'm using the values below. Should I use a narrower prior range? data.parameters['omega_b'] = [2.2253, 0.5, 10, 0.028, 0.01, 'cosmo'] data.parameters['omega_cdm'] = [0.11919, 0.001, 0.99, 0.0027, 1, 'cosmo'] data.parameters['100*theta_s'] = [1.0418, 0.5, 10, 3e-4, 1, 'cosmo'] data.parameters['ln10^{10}A_s'] = [3.0753, 2, 4, 0.0029, 1, 'cosmo'] data.parameters['n_s'] = [0.96229, 0.8, 1.2, 0.0074, 1, 'cosmo'] data.parameters['tau_reio'] = [0.09463, 0.01, 0.8, 0.013, 1, 'cosmo']

These ranges are all extremely high. This will often cause the sampler to go into regimes that are unphysical, leading to the CLASS error you mentioned above. This then has the knock-on effect that MultiNest can't find enough live points to properly restart, which is why you get "ERROR: no. of points in ev.dat file is not equal to the no. specified in resume file.". I would really recommend using 3 or 5 times the Planck 1-sigma values as your upper and lower limit, as @brinckmann proposed, or the ranges proposed in the example_ns.param.

The solution was to do a first run using only $python montepython/etc. and a minimal number of points (I used --NS_max_iter 2). This correctly creates the output folder and the NS subfolder with all its files. Then you can run as usual with $mpirun.

This should always be done, not just with MultiNest. Creating the log.param first (and any additional subfolders) is a safety check you should do before launching a full run, especially when using mpi. This can be seen in an example here: https://monte-python.readthedocs.io/en/latest/example.html

I hope this and the documentation: https://monte-python.readthedocs.io/en/latest/nested.html help you solve your issues. Let us know if the problems persist. Regards, Deanna

ivandebono commented 5 years ago

Hi Ivan, To analyse the folder of the example_ns.param run, please make sure you are analysing "chains/example/NS" first and then "chains/example".

So I did, and I still got the error. The parameters used are example_ns.param, out of the box without any changes, with 30,000 points.

On the complexities of MPI and computing clusters, I tried mpiexec instead of mpirun, and it seems to work better, at least on the CC.in2p3 cluster. I'll share any useful results.

ivandebono commented 5 years ago

After a lot of trial and error, here's what worked for me. I'm running MontePython on the cc-in2p3.fr cluster.