MNGuenther / allesfitter

allesfitter is a convenient wrapper around the packages ellc (light curve and RV models), dynesty (static and dynamic nested sampling), emcee (Markov Chain Monte Carlo sampling) and celerite (Gaussian Process models).
MIT License

inf evidence does not reach prescribed ns_tol #18

Closed jpdeleon closed 3 years ago

jpdeleon commented 3 years ago

I re-ran my old params+settings file (without any changes) using the latest allesfitter version but this time, the NS sampling does not converge at all.

The initial plots that were produced look fine, but I notice that the logz value is always inf, which is probably why the sampling takes too long even if I use ns_tol=1 or so:

1249it [03:18,  3.17s/it, batch: 0 | bound: 1 | nc: 1355 | ncall: 125404 | eff(%):  0.992 | loglstar:   -inf <   -inf <    inf | logz:   -inf +/-    nan | dlogz:    inf >  1.000]

Can you prescribe a way to avoid inf evidence?

MNGuenther commented 3 years ago

Hi @jpdeleon, are you on allesfitter v1.1.2 and dynesty v1.0.1? If not, could you update to both and try again?

Could you email me a Dropbox (or the like) link to your folder? Then I can take a look at the params, settings and data files. maxgue at mit dot edu

The inf comes from initial live points that land in unphysical regions of the parameter space, where the model evaluates to NaN. Generally, tighter (physically plausible) bounds will solve this. Normally, the values drop to finite ones by themselves after a short while; strange that it gets stuck in your case.
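To illustrate the mechanism, here is a minimal sketch of how a NaN model value can turn into an infinite log-likelihood. It is not allesfitter's actual code: the function name `gaussian_loglike` and the simple Gaussian likelihood are assumptions made for this example only.

```python
import numpy as np

def gaussian_loglike(flux, flux_err, model_flux):
    """Gaussian log-likelihood that maps non-finite models to -inf.

    Hypothetical helper for illustration; not part of allesfitter.
    """
    model_flux = np.asarray(model_flux, dtype=float)
    # An unphysical parameter combination can make the model return NaN.
    # Returning -inf tells the sampler that this point is not allowed.
    if not np.all(np.isfinite(model_flux)):
        return -np.inf
    resid = (flux - model_flux) / flux_err
    return float(-0.5 * np.sum(resid**2 + np.log(2.0 * np.pi * flux_err**2)))

flux = np.array([1.0, 0.99, 1.0])
flux_err = np.full(3, 0.01)
print(gaussian_loglike(flux, flux_err, [1.0, 0.99, 1.0]))     # finite
print(gaussian_loglike(flux, flux_err, [np.nan, 0.99, 1.0]))  # -inf
```

As long as many live points sit in such regions, the lower likelihood bound shown in the progress line stays at -inf.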

PS: I would recommend not changing ns_tol. In any case, it only affects the stopping criterion for the run, and what you see here is the start of the run being inefficient (inf).
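As a rough sketch of why the tolerance only matters at the end of a run, assume the usual nested-sampling definition in which dlogz compares the evidence accumulated so far with an estimate of the evidence still remaining in the live points. The function names below are hypothetical and this is not dynesty's internal code.

```python
import numpy as np

def remaining_dlogz(logz, logz_remain):
    """Estimated log-evidence still to be gained: ln(Z + Z_remain) - ln(Z)."""
    return np.logaddexp(logz, logz_remain) - logz

def should_stop(logz, logz_remain, tol):
    """Stop once the remaining contribution drops below the tolerance."""
    return bool(remaining_dlogz(logz, logz_remain) < tol)

# Start of a run: no evidence accumulated yet (logz = -inf), so dlogz is inf
# and no finite tolerance can trigger the stop.
print(remaining_dlogz(-np.inf, 0.0), should_stop(-np.inf, 0.0, tol=1.0))

# Late in a run: the remaining part is tiny compared with the accumulated one.
print(remaining_dlogz(100.0, 90.0), should_stop(100.0, 90.0, tol=0.01))
```

Under this assumption, raising the tolerance cannot rescue a run whose logz is still -inf.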

jpdeleon commented 3 years ago

Oh, I checked with pip show allesfitter and got version 1.1.rc0 even though I had run git pull to get the most recent version (v.1.1.2) before running allesfitter. Previously, I installed allesfitter in development mode (pip install -e allesfitter), so I am not sure whether the results above reflect the most recent version. In any case, I re-installed allesfitter and verified that the pip show output is now v.1.1.2.
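For reference, the same check can be scripted at the top of a run. This is a small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for this example. It reads the installed package metadata, so with a development install it may lag behind the checked-out source until the package is re-installed.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Print the versions that the current environment actually resolves.
for name in ("allesfitter", "dynesty"):
    print(name, installed_version(name))
```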

My concern was that the new run took more than 2 days on 40 cores, whereas it previously finished within a day using exactly the same settings (also with dynesty v.1.0.1). Note that the model includes 9 photometry datasets with 56 parameters.

Anyway, I see some improvement: the evidence is now finite after a few more iterations:

1696it [18:58,  2.22s/it, batch: 0 | bound: 6 | nc: 1489 | ncall: 830280 | eff(%):  0.204 | loglstar:   -inf < -237008.274 <    inf | logz: -237018.570 +/-    nan | dlogz: 251341.361 > 0.100]

What could perhaps explain the non-convergence behavior in v.1.1.rc0?

MNGuenther commented 3 years ago

Hey @jpdeleon to be sure, I cannot think of anything. Nothing has changed in allesfitter's core since then, only a few cosmetic things. I would have guessed maybe it was a change in dynesty, but since you always had 1.0.1, that cannot be the case. So my best guess is: maybe the random setting of the live points at the start of your last run just happened to put a few of them into corners where they got stuck.

MNGuenther commented 3 years ago

Btw, if you want to (i) significantly decrease your runtime and (ii) avoid any inf values with Nested Sampling*, consider making your priors as tight and physically motivated as possible (yet without restricting the posterior). For example, if you know a planet has a rough period of 1.6 days and transits, there is no point in using Nested Sampling priors like a period uniform in [0,1000] and a cos(i) uniform in [0,1]. Instead, consider choosing something like period uniform in [1,2] and cos(i) uniform in [0,0.2].

*1-2 days on 40 cores sounds very long to me. Such a long runtime and the effect of seeing "inf" values at the start makes me suspect your prior volume is too large and includes "unphysical" areas.
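To put a number on the example above, here is a minimal sketch comparing the two prior volumes. It assumes independent uniform priors and the standard nested-sampling convention of mapping a unit cube onto the parameter ranges; the helper names are hypothetical and this is not how allesfitter builds its priors internally.

```python
import numpy as np

def make_prior_transform(bounds):
    """Build a function mapping unit-cube samples to uniform priors."""
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)

    def prior_transform(u):
        return lo + (hi - lo) * np.asarray(u, dtype=float)

    return prior_transform

def prior_volume(bounds):
    """Volume of the box spanned by the uniform priors."""
    return float(np.prod([hi - lo for lo, hi in bounds]))

# (period in days, cos(i)) for the wide and the tight choice discussed above.
wide = [(0.0, 1000.0), (0.0, 1.0)]
tight = [(1.0, 2.0), (0.0, 0.2)]

volume_ratio = prior_volume(wide) / prior_volume(tight)
print(f"wide prior volume is {volume_ratio:.0f}x larger")
print(make_prior_transform(tight)([0.5, 0.5]))  # centre of the tight box
```

With only these two parameters the wide box is already thousands of times larger, and the effect multiplies with every additional loosely bounded parameter.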

jpdeleon commented 3 years ago

Thanks for responding! I am already using tight priors as prescribed. I'm just wondering why my run does not converge this time. In fact, I get a new error as follows:

76114it [1:40:05, 12.67it/s, batch: 0 | bound: 1050 | nc: 25 | ncall: 1988983 | eff(%):  3.826 | loglstar:   -inf < 17050.222 <    inf | logz: 16901.654 +/-    nan | dlogz:  1.131 >  0.010]
Traceback (most recent call last):
  File "run.py", line 6, in <module>                                 
    allesfitter.ns_fit('.')       
  File "/mnt/data1/jerome/github/research/transit/allesfitter/allesfitter/nested_sampling.py", line 148, in ns_fit
    sampler.run_nested(nlive_init=nlive, dlogz_init=tol, print_progress=config.BASEMENT.settings['print_progress'])
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/dynamicsampler.py", line 1619, in run_nested
    for results in self.sample_initial(nlive=nlive_init,
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/dynamicsampler.py", line 838, in sample_initial
    for it, results in enumerate(self.sampler.sample(maxiter=maxiter,
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/sampler.py", line 758, in sample
    bound = self.update(pointvol)
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/nestedsamplers.py", line 358, in update
    self.ell.update(self.live_u, pointvol=pointvol, rstate=self.rstate,
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/bounding.py", line 302, in update
    ell = bounding_ellipsoid(points, pointvol=pointvol)
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/bounding.py", line 1363, in bounding_ellipsoid
    ell = Ellipsoid(ctr, covar)
  File "/ut3/jerome/miniconda3/envs/py3/lib/python3.8/site-packages/dynesty/bounding.py", line 167, in __init__
    raise ValueError("The input precision matrix defining the "
ValueError: The input precision matrix defining the ellipsoid [[ 4.80563460e-02 -5.68339612e-02  4.31396955e-02 ... -1.75587345e-03
  -2.85438611e-03 -1.69707597e-03]
 [-5.68339612e-02  1.02785552e-01 -3.89257495e-02 ...  1.79047843e-03
   6.33339687e-03  4.91121664e-03]
 [ 4.31396955e-02 -3.89257495e-02  6.52474755e+00 ...  1.53642572e-03
  -6.15717639e-02 -8.84509100e-02]
 ...
 [-1.75587345e-03  1.79047843e-03  1.53642572e-03 ...  6.71490183e-02
   7.49402017e-03  1.12415393e-02]
 [-2.85438611e-03  6.33339687e-03 -6.15717639e-02 ...  7.49402017e-03
   2.31797194e-02  2.60208027e-02]
 [-1.69707597e-03  4.91121664e-03 -8.84509100e-02 ...  1.12415393e-02
   2.60208027e-02  5.05455648e-02]] is apparently singular with l=[-2.46371249e-16  2.69716736e-14  5.06432618e-05  9.93243874e-05
  1.25936654e-04  1.68448214e-04  1.73988622e-04  2.04321326e-04
  2.57996108e-04  2.92414047e-04  6.72881253e-04  6.95723895e-04
  7.73363193e-04  8.67140295e-04  1.06473541e-03  1.31389471e-03
  2.17192181e-03  3.36002345e-03  3.88544317e-03  5.40573340e-03
  5.69423262e-03  7.72198828e-03  8.72878830e-03  9.02197644e-03
  1.03212170e-02  1.13231994e-02  1.59935638e-02  3.24332258e-02
  4.02302412e-02  5.16075529e-02  5.67025425e-02  8.13790140e-02
  1.13438216e-01  1.41132327e-01  2.81118614e-01  5.75227845e-01
  7.05361846e-01  7.32004451e-01  7.63498632e-01  9.45740058e-01
  1.29092006e+00  1.63411158e+00  2.26533301e+00  2.39626413e+00
  2.76762738e+00  4.03914656e+00  4.35171554e+00  4.82586293e+00
  5.23481744e+00  5.69170467e+00  6.02229796e+00  6.75615767e+00
  7.52949050e+00  7.77184876e+00  8.93712171e+00] and v=[[-2.27234613e-08 -1.50440586e-07  1.45899775e-02 ...  2.48121323e-03
  -8.76723285e-03  1.03930475e-02]

This is the first time I have encountered this error.
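The list after `l=` in the message appears to hold eigenvalues, and the first two are essentially zero (about -2.5e-16 and 2.7e-14). As a sketch of one way this can arise (an illustration only, not a diagnosis of this run), a set of points that is degenerate along some direction gives a covariance matrix with a near-zero eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
z = rng.normal(size=200)

# The second column is an exact multiple of the first, so the points lie on a
# plane and carry no spread along one direction of the 3-D space.
points = np.column_stack([x, 2.0 * x, z])

covar = np.cov(points, rowvar=False)
eigenvalues = np.linalg.eigvalsh(covar)  # returned in ascending order
print(eigenvalues)
```

A matrix like this cannot be inverted reliably, which is the kind of situation the "apparently singular" message points to.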

MNGuenther commented 3 years ago

Hi @jpdeleon, can you email me your allesfit folder with all files? Something might still be off in params.csv and settings.csv. If I find nothing, we'll have to pass this on to the dynesty team, since it is a dynesty error (https://github.com/joshspeagle/dynesty).

jpdeleon commented 3 years ago

Thanks for your continued support! I confirmed there was a problem with the bounds in the params.csv. I also simplified the models, starting with a few data sets, and allesfitter worked like a charm. There is no bug after all.
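For anyone hitting the same problem, a quick sanity check of the bounds before a long run can help. The sketch below assumes bounds written as strings of the form `uniform lo hi`; the function name and the example rows are hypothetical, and the real params.csv may contain other prior types and columns that this check ignores.

```python
def check_uniform_bounds(name, value, bounds):
    """Return a list of problems found for one parameter (empty if none)."""
    parts = bounds.split()
    if not parts or parts[0] != "uniform":
        return []  # other prior types are not checked in this sketch
    if len(parts) != 3:
        return [f"{name}: expected 'uniform lo hi', got {bounds!r}"]
    lo, hi = float(parts[1]), float(parts[2])
    problems = []
    if not lo < hi:
        problems.append(f"{name}: lower bound {lo} is not below upper bound {hi}")
    if not lo <= value <= hi:
        problems.append(f"{name}: initial value {value} lies outside [{lo}, {hi}]")
    return problems

# Illustrative rows only: (name, initial value, bounds string).
rows = [
    ("b_period", 1.6, "uniform 1 2"),
    ("b_cosi", 0.3, "uniform 0 0.2"),
    ("b_rr", 0.1, "uniform 0.5 0.0"),
]
for row in rows:
    for problem in check_uniform_bounds(*row):
        print(problem)
```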