EPRV3EvidenceChallenge / Inputs

Input Data & Model for the EPRV3 Evidence Challenge - Start Here
MIT License

Finding the true evidence for n_planets=0 #10

Open JohannesBuchner opened 6 years ago

JohannesBuchner commented 6 years ago

Very fine grids or similar methods should allow us to find the true value for n_planets=0, a merely 2d parameter space with a single mode.
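As a rough illustration of the fine-grid idea, here is a minimal sketch that integrates a 2D log-likelihood over the unit square with a midpoint grid, working in log space so that very small likelihood values do not underflow. The toy Gaussian `log_like`, its constants, and the function name `grid_log_evidence` are hypothetical stand-ins, not the challenge likelihood; a uniform prior on the unit cube is assumed.

```python
import math


def log_like(u, v):
    # Hypothetical single-mode toy log-likelihood on the unit square.
    return -400.0 - 0.5 * (((u - 0.3) / 0.02) ** 2 + ((v - 0.6) / 0.05) ** 2)


def grid_log_evidence(log_like_fn, n=400):
    """ln Z from an n x n midpoint grid over the unit square (uniform prior)."""
    h = 1.0 / n
    logs = [
        log_like_fn((i + 0.5) * h, (j + 0.5) * h)
        for i in range(n)
        for j in range(n)
    ]
    # Log-sum-exp: factor out the largest term before exponentiating.
    peak = max(logs)
    total = sum(math.exp(value - peak) for value in logs)
    # Each grid cell has area h * h.
    return peak + math.log(total) + 2.0 * math.log(h)


ln_z = grid_log_evidence(log_like)
log10_z = ln_z / math.log(10.0)

# Analytic value for this toy Gaussian (tails outside the square are negligible).
ln_z_true = -400.0 + math.log(2.0 * math.pi * 0.02 * 0.05)
```

For a smooth, single-mode integrand like this one, the grid result can be checked against the analytic value; for the real likelihood, convergence would have to be judged by refining `n`.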

To rank methods by their quality, it is important to know the true evidence. If we do not know the true value, we can only look at the scatter between methods; poor-quality methods can introduce scatter and make all methods look unreliable. If we have the true value we can then understand the biases and scatter of various methods.

I suggest we report approaches here?

JohannesBuchner commented 6 years ago

I applied my Variational Bayes with Importance Sampling, but increased the number of samples by a factor of 10, and increased the number of effective samples required for termination to 10000. I get a sampling efficiency (as measured in effective samples per samples drawn) of 99%, indicating that the posterior is approximated very well by the mixture. Only for data set 0004, I get an efficiency of 50% (still very high).
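To make the efficiency figure concrete, here is a minimal sketch of an importance-sampling evidence estimate together with its effective-sample fraction, assuming the common definition ESS = (sum of weights)^2 / (sum of squared weights). The toy 2D Gaussian likelihood, the single Gaussian proposal (standing in for a fitted mixture), and all names are hypothetical; this is not the code that produced the numbers in this thread.

```python
import math
import random

SIGMA = 0.05    # width of the toy likelihood (assumption)
SIGMA_Q = 0.06  # width of the proposal, slightly broader than the target


def log_like(x, y):
    # Hypothetical unnormalised 2D Gaussian likelihood, peak value 1.
    return -0.5 * ((x - 0.5) ** 2 + (y - 0.5) ** 2) / SIGMA ** 2


def log_proposal(x, y):
    # Normalised isotropic 2D Gaussian proposal density.
    r2 = (x - 0.5) ** 2 + (y - 0.5) ** 2
    return -0.5 * r2 / SIGMA_Q ** 2 - math.log(2.0 * math.pi * SIGMA_Q ** 2)


def importance_sample(n, seed=1):
    rng = random.Random(seed)
    weights = []
    for _ in range(n):
        x = rng.gauss(0.5, SIGMA_Q)
        y = rng.gauss(0.5, SIGMA_Q)
        # Uniform prior on the unit square: density 1 inside, 0 outside.
        inside = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
        w = math.exp(log_like(x, y) - log_proposal(x, y)) if inside else 0.0
        weights.append(w)
    total = sum(weights)
    z_hat = total / n                                # evidence estimate
    ess = total ** 2 / sum(w * w for w in weights)   # effective sample size
    return z_hat, ess / n


z_hat, efficiency = importance_sample(20_000)
log10_z = math.log10(z_hat)
log10_z_true = math.log10(2.0 * math.pi * SIGMA ** 2)
```

An efficiency close to 1 means the weights are nearly constant, i.e. the proposal closely matches the posterior shape; a proposal that is too narrow or misplaced would drive the efficiency down.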

I also ran the cubature algorithms of the cuba library, namely Cuhre, Divonne, Suave and Vegas (see http://www.feynarts.de/cuba/), to very small tolerances (epsrel=0.005). These are very reliable for low-dimensional problems but scale poorly to high-dimensional spaces. I can go into more detail on the parameters chosen if needed. In any case, the methods mostly agree, with very small uncertainties:

evidences_0001.txt

log10Z log10Zerr Method
-211.977357779 0.00351316773677 cuba-Cuhre
-211.977630699 0.00434258435705 cuba-Divonne
-212.028544419 0.00427695687894 cuba-Suave
-211.977302743 0.000210726497178 importance-sampling-full-long

Agreement within +-0.03. I would recommend the reference value -211.9775.

evidences_0002.txt

log10Z log10Zerr Method
-197.110369955 0.00395487213517 cuba-Cuhre
-197.109758588 0.000639996300822 cuba-Divonne(failed)
-197.110109744 0.00431386644555 cuba-Suave
-197.109772079 0.000140346950409 importance-sampling-full-long

Agreement within +-0.01. I would recommend the reference value -197.11.

evidences_0003.txt

log10Z log10Zerr Method
-169.65294809 0.0025970008633 cuba-Cuhre
-169.644537626 0.00138858028697 cuba-Divonne
-169.645347864 0.00432179836762 cuba-Suave
-169.644455886 0.000185403762282 importance-sampling-full-long

Agreement within +-0.01. I would recommend the reference value -169.6445.

evidences_0004.txt

log10Z log10Zerr Method
-161.622371745 0.00378246265907 cuba-Cuhre
-161.62241106 0.00130178504753 cuba-Divonne
-161.624798742 0.0042567955609 cuba-Suave
-161.621546044 0.00239764431723 importance-sampling-full-long

Agreement within +-0.002. I would recommend the reference value -161.622.

evidences_0005.txt

log10Z log10Zerr Method
-167.052449146 0.00408837606711 cuba-Cuhre
-167.026343027 0.00129086212816 cuba-Divonne
-167.026455572 0.00424903903465 cuba-Suave
-167.026239181 0.000155710100652 importance-sampling-full-long

Agreement within +-0.003. I would recommend the reference value -167.026.

evidences_0006.txt

log10Z log10Zerr Method
-256.251207841 nan cuba-Cuhre
-179.855313841 0.00169102861505 cuba-Divonne
-179.854873643 0.00383504863613 cuba-Suave
-179.855266453 0.000174718115612 importance-sampling-full-long

Cuhre seems to have failed here: it found a much smaller evidence, and its error estimate is larger than the Z estimate itself, so the computed log10 error is NaN. It probably did not find the peak.

Among the others, there is agreement within +-0.001. I would recommend the reference value -179.855.

Vegas did not terminate yet. I will update once I have its results.

This relies of course on the likelihood being correct (issue #9).

JohannesBuchner commented 6 years ago

I should also add that, to avoid floating-point underflow, which can occur when algorithms work with the likelihood instead of the log-likelihood, I added a constant to the log-likelihood function (500 in ln). At the end, I subtract that offset from the final log evidence.
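A minimal sketch of that offset trick, assuming a hypothetical 1D toy log-likelihood and a plain midpoint rule in place of the actual integrators: the integrand is exponentiated after adding the offset, and the offset is subtracted again from the log of the result before converting to log10.

```python
import math

LN_OFFSET = 500.0  # constant added to the ln-likelihood, as described above


def loglike(x):
    # Hypothetical toy: narrow Gaussian peak with a very low overall level.
    return -800.0 - 0.5 * ((x - 0.5) / 0.01) ** 2


def midpoint_integral(f, n=10_000):
    """Midpoint-rule integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


# Without the offset, exp(-800) underflows to 0.0 and the peak is lost.
z_naive = midpoint_integral(lambda x: math.exp(loglike(x)))

# With the offset, the integrand is representable in double precision.
z_shifted = midpoint_integral(lambda x: math.exp(loglike(x) + LN_OFFSET))

# Undo the offset in ln, then convert to log10.
log10_z = (math.log(z_shifted) - LN_OFFSET) / math.log(10.0)

# Analytic value for this toy Gaussian.
expected = (-800.0 + math.log(0.01 * math.sqrt(2.0 * math.pi))) / math.log(10.0)
```

The offset only needs to bring the peak of the log-likelihood into the representable range; choosing it near minus the maximum log-likelihood is the usual approach.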

eford commented 6 years ago

Thanks, Johannes.

FWIW, I applied my Laplace approximation for the integral over the RV offset, combined with either Gauss-Legendre integration (dialing up the number of samples over sigma_j to 2000) or an adaptive Gauss-Kronrod quadrature (with epstol=1e-12) for the integral over sigma_j. I get values of log_10(evidence for the zero planet model) that are similar to yours, but somewhat different.

dataset Gauss-Legendre Gauss-Kronrod
1: -211.5963934248423 -211.59639342484226
2: -196.73647042307297 -196.736470423073
3: -169.34748531702974 -169.34748531702974
4: -161.43469549088118 -161.43469549088118
5: -166.74276613904826 -166.74276613904826
6: -179.52801098509198 -179.52801098509198

Vinesh, Joao, Rodrigo, James, and any others who can compute these quickly, could you run your algorithms longer than you normally would to see if they converge to values near these for the zero planet models? If you can send your results before 8pm EDT Tuesday, then we'll try to compare them Tuesday night, so we can discuss during Wednesday's breakout session.

Thanks, Eric


JohannesBuchner commented 6 years ago

Hmm, strange. We should really double-check the likelihoods then. Your integrals include the prior, right? Mine are always integrals over the unit cube, with prior transforms, so they do.

JohannesBuchner commented 6 years ago

pinging @j-faria @vineshrajpaul @exord , please see @eford 's comment above.

vmaguirerajpaul commented 6 years ago

Thanks - shall try to produce new evidence estimates for 0-planet models before 8pm EDT tonight

JohannesBuchner commented 6 years ago

cuba-Vegas failed in the sense that it did not produce anything reasonable within 2e6 likelihood evaluations (the limit I set). The one time cuba-Cuhre failed, it ran against the same issue. I could run them longer, if needed, but at the moment I believe the values above.

vmaguirerajpaul commented 6 years ago

I've only had time to do a single run on my laptop for each of the 6 data sets - CPU time about 1 hour per data set, so on par with the runs I did for the original evidence estimates I submitted. However, I believe I have now fixed a bug I'd identified in my original code - hence my previous comment about something "going wrong with sampling from the priors" - so am inclined to trust these results a bit more. (The problem related to trying to perform a Cholesky decomposition on matrices that were not quite positive definite.)

Anyway, given more time I should be able to perform multiple, longer runs of the MCMC code, in order to produce more accurate and precise evidence and evidence error estimates. (I expect to see more sizeable changes in the estimates for the models with more parameters.) Certainly I can do this before the Sept 14 deadline. For now, though, here are the log10 evidence estimates for the 0-planet models from my bug-corrected code:

1: -211.97 ± 0.1
2: -197.05 ± 0.1
3: -169.42 ± 0.1
4: -161.25 ± 0.1
5: -166.74 ± 0.1
6: -179.71 ± 0.1

I guess this MCMC/nested sampling approach is not the most efficient way to compute a 2D integral, but at least the exercise has helped me to fix a bug in the same code I use for the higher-order models...

exord commented 6 years ago

Sorry to come in this late. I've just run my importance sampling integration with 10'000 samples; these are my results for dataset #1. Error bars are estimated by Monte Carlo repetitions:

-211.97774486,-211.977767,-211.97832506,-211.97804632,-211.97749236,-211.97723798

Reported values are mode, median, and 2nd-, 16th-, 84th-, and 98th-percentiles. So

log10z = -211.97774 +/- 5.5e-4

I was initially a bit worried about such a small errorbar, but comparing with @JohannesBuchner 's values above, I see we differ by only 3.8e-4. Hooray.

exord commented 6 years ago

By the way, the estimator using 5000 samples gives:

-211.978653525,-211.978669,-211.979744,-211.97916348,-211.978239,-211.97779196
log10z = -211.97867 +/-  0.00092

This differs from the values above by 1.3e-3. Not as good, but not bad either. Maybe some of the known bias in the Perrakis estimator remains with 5000 samples.

vmaguirerajpaul commented 6 years ago

I think I have finally fixed a long-standing bug in my code so will try to post new & more precise evidence estimates ASAP. Better late than never, hopefully!

exord commented 6 years ago

Quick update. I also find good agreement with dataset #2 using @JohannesBuchner 's reference value.

vmaguirerajpaul commented 6 years ago

Updated 0-planet log10 evidence estimates, now with bug-fixed code (hurrah!) and slightly longer MCMC runs:

  1. -211.98 ± 0.01
  2. -197.12 ± 0.02
  3. -169.638 ± 0.004
  4. -161.623 ± 0.009
  5. -167.00 ± 0.01
  6. -179.862 ± 0.008

I'll do the 1-planet tests next.

PS: I'm "in between institutions" at the moment, and am travelling/preparing for my PhD viva/etc., so for the time being have to run these tests on my personal laptop. Apologies, therefore, if I've been terribly slow producing results.

eford commented 6 years ago

Excellent news. No need to apologize. At this point, I'd suggest doing only enough to make sure you're happy with what your code is doing, rather than running all the datasets with all the models (particularly the 2 and 3 planet models). I'll try to construct an email/doodle poll to survey people about the various proposed changes to priors soon, so we can finalize all the numbers for the final analysis next week.

Thanks!
