baudren / montepython_public

Public repository for the Monte Python Code
MIT License
65 stars 115 forks source link

--update produces inconsistent covariance matrix #110

Closed noller closed 6 years ago

noller commented 6 years ago

When using the --update flag to automatise covmat updates, I've encountered a problem. It appears to be possible to "break" the run by updating after too small a number of steps. To have an explicit example, I am e.g. running the example.param that comes with MontePython as follows

mpirun -np 4 python montepython/MontePython.py run -N 1000000 -f 0.2 -p example.param -o chains/example --update 1000

where 1000 is an unreasonably small number. When I run this --update will actually (and very correctly!) refuse to use the newly computed covariance matrix the first few times it tries to run (I assume there are internal checks that make it realise the covmat is no good at this point), but eventually after around 10000 steps or so it does create a covmat, uses it and crashes. The error message gives a traceback ending in

File ".../montepython_public-2.2/montepython/likelihood_class.py", line 1422, in compute_lkl
    (det_mix/det_the + math.log(det_the/det_obs) - num_modes)
ValueError: math domain error

Any attempt to start a new analysis for any number of steps using the newly derived covmat now also fails (in my specific example, it does so saying that there is a negative inconsistent input for the primordial amplitude) and, from looking at the covmat generated, this appears to be because of negative eigenvalues in the computed covmat, so this seems to be related to issue #38.

Perhaps the idea is for each user to play with the update flag and choose a reasonably large value for the number of steps between updates (to ensure chains are already reasonably converged and the resulting covmat is consistent), but given that there already seem to be some checks in place to allow --update to decide itself whether to use the newly derived covariance matrix or not, perhaps this is something that is worth automatising too?

brinckmann commented 6 years ago

Change lines 364-368 of mcmc.py from

                        # Read the covmat
                        sigma_eig, U, C = sampler.get_covariance_matrix(
                            cosmo, data, command_line)
                        if command_line.jumping == 'fast':
                            Cholesky = la.cholesky(C).T

to

                        # Read the covmat
                        C_backup = C[:,:]
                        sigma_eig, U, C = sampler.get_covariance_matrix(
                            cosmo, data, command_line)
                        try:
                            Cholesky = la.cholesky(C).T
                        except:
                            C = C_backup

and lines 405-408 of mcmc.py from

                        sigma_eig, U, C = sampler.get_covariance_matrix(
                            cosmo, data, command_line)
                        if command_line.jumping == 'fast':
                            Cholesky = la.cholesky(C).T

to

                        C_backup = C[:,:]
                        sigma_eig, U, C = sampler.get_covariance_matrix(
                            cosmo, data, command_line)
                        try:
                            Cholesky = la.cholesky(C).T
                        except:
                            C = C_backup

This should restore the previous covmat in cases where the computed matrix is not positive definite. Let me know if it works. This won't fix starting from a covmat that is not positive definite, but it should work for --update. Best, Thejs

noller commented 6 years ago

Great, thanks a lot Thejs! I'll try this and will let you know how it went. Many thanks and best wishes, Johannes

noller commented 6 years ago

Hi Thejs,

so I modified mcmc.py in the way you suggested and it does indeed help! The covmat --update eventually produces in my trial run now only has positive eigenvalues (I just double-checked with Mathematica). However, unfortunately MontePython still crashes once using the updated covmat. The traceback error message still ends in

File ".../montepython_public-2.2/montepython/likelihood_class.py", line 1422, in compute_lkl
    (det_mix/det_the + math.log(det_the/det_obs) - num_modes)
ValueError: math domain error

and this comes accompanied with the following CLASS error

Error in Class: primordial_init(L:271) :error in primordial_analytic_spectrum_init(ppt, ppm);
=>primordial_analytic_spectrum_init(L:782) :condition (one_amplitude <= 0.) is true; inconsistent input for primordial amplitude: -0.00324215 for index_md=0, index_ic=0

So it looks like the new covariance matrix, despite appearing consistent, somehow leads to sampling over inconsistent input parameters for CLASS.

Two further things are strange: Firstly, the variance of the new covariance matrix for A_s is smaller than the one specified in example.param, so it doesn't look as though an artificially large variance accidentally allows flipping the sign of the amplitude (which was my initial guess). Running with this covariance matrix now always produces an analogous error (for different, but always negative values for the primordial amplitude), so the new covmat really seems to force the code to run with negative A_s.

Any idea what could be causing this?

Best wishes, Johannes

brinckmann commented 6 years ago

I suspect this is not related to the covmat at all. Did you try running without update (i.e. --update 0) and either without an input covmat, or with a base Planck covmat? To help with figuring out the proble, can you attach or email me your param file?

Best, Thejs

noller commented 6 years ago

Sure, will email you the param file in a second (it's the standard example.param that comes with MontePython, so there shouldn't be anything strange coming from there). When I run without update or without an input covmat everything works beautifully, so it's only when update is used that things break. Then, as soon as update is used and computes a covmat and bestfit (so the first time the convergence conditions for the chains in terms of R for the computation of the covmat are met), the immediate next call to CLASS breaks the run with the errors in the previous post.

brinckmann commented 6 years ago

Did you apply the fix to parameter scaling that I wrote for issue #106 ? I suspect it's related to this issue, as it also interferes with update.

I apologize for all of these bugfixes that have to be applied manually, they will soon be pushed to the public version.

noller commented 6 years ago

Great, that seems to have fixed it! I've applied your parameter scaling fix from #106 (I hadn't before) and just finished another test run. --update always worked fine in that run and there were no issues with inconsistent input parameters. I'll do a few more thorough tests next and will let you know should the problem re-appear, but it looks as though this was really the same issue as #106 in disguise and so the same fix sorts it all out.

Many thanks again and obviously no need to apologise at all - it's fantastic you guys have created and are maintaining the code for everyone else to use as well!

Best wishes, Johannes