Closed matteobachetti closed 1 year ago
Could you please attach the debug.log?
I think this could be caused by a likelihood plateau at the beginning of the run.
But I agree, this is not ideal behaviour. Probably I should add a maximum on the number of roots the run is expanded to.
I have not been able to reproduce your bug with the latest stingray repo and ultranest 3.5.7 from PyPI.
I also tried reverting the commit c140664223b1211c7730d0b375fc83f0757d41a3, and going back to this version: https://github.com/StingraySoftware/stingray/commit/8ad7272fcd2a155bd6d2eeabe68be58bdec92742
Please let me know how I can reproduce the runaway behaviour.
@JohannesBuchner I would also have gone to commit 8ad7272 of stingray. Up to that point, a simple pytest test_bexvar.py could reproduce the result. Have you made any new modifications to UltraNest that might have fixed the behaviour?
No
We are also experiencing this issue with EXOTIC, using UltraNest 3.5.7.
To wit:
[ultranest] Widening roots to 6537217 live points (have 3268609 already) ...
[ultranest] Sampling 3268608 live points from prior ...
It does an initial sampling of the parameter space to estimate where to do a constrained search, but the number of live points gets so large that it needs more initial samples than it usually takes to converge. ... I've been noticing some TESS light curves requiring over a million function calls when it only takes ~10,000 to converge. ...
This is perhaps okay for a single reduction but when running against massive data sets, it brings our applications to their knees. ⚔️
Version 3.5.6 seems to be working fine.
Discovered by @pearsonkyle in EXOTIC. Please contact him for more info.
Context
This happens when a large fraction of the prior parameter space has the same log-likelihood, i.e., a plateau. Such plateaus need to be handled in a special way in nested sampling, otherwise you get biases, as this paper https://arxiv.org/abs/2005.08602 pointed out. This paper https://arxiv.org/abs/2010.13884 discusses a strategy, which is implemented in ultranest: the live points on the plateau need to be discarded together, without replacement, until the plateau is crossed. But this reduces the number of live points, and the subsequent run would have few live points available, making it both inefficient and likely to return poor posteriors. Hence the widening of the initial live point population, so that hopefully a reasonable number remain.
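As a toy illustration (not ultranest code; the likelihood function and numbers below are made up), a log-likelihood that collapses an unphysical region to a single constant puts a large fraction of prior samples on exactly the same value, i.e., a plateau:

```python
import random

def loglike_flat(x, y):
    # Unphysical region collapses to one constant value -> plateau
    if x + y > 1:
        return -1e300
    return -((x - 0.3) ** 2 + (y - 0.3) ** 2)

random.seed(42)
samples = [(random.random(), random.random()) for _ in range(1000)]
values = [loglike_flat(x, y) for x, y in samples]

# About half the unit square has x + y > 1, so about half of all
# prior samples share the exact same log-likelihood value
n_plateau = sum(1 for v in values if v == -1e300)
```

Live points tied at the same value like this are exactly what forces the plateau handling (and the widening) described above.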
Solution
You can avoid this by defining a log-likelihood that does not have plateaus. Probably you are returning a constant low value when the parameters are problematic/unphysical. Instead, return something that increases towards the good region.
For example, say you have two parameters whose sum must be below 1. Replace this:
if params[0] + params[1] > 1:
    return -1e300
with:
if params[0] + params[1] > 1:
    return -1e300 * (params[0] + params[1])
Mitigation
There are probably likelihoods where all values are identical, for example in a no-data case. So we should probably put a limit on the widening as an additional parameter, with a clear warning and instructions on how to improve things (like the above). Maybe warn at 100,000 and stop trying at 500,000?
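A minimal sketch of what such a capped widening could look like (the function name, doubling schedule, and default limits are hypothetical, not ultranest's actual implementation):

```python
import warnings

def widen_roots(num_live, target, num_warn=100_000, num_max=500_000):
    """Sketch of a capped widening step: grow the live point population
    towards `target`, warning at `num_warn` and stopping at `num_max`."""
    while num_live < target:
        if num_live >= num_max:
            warnings.warn("giving up widening at %d live points; "
                          "consider removing likelihood plateaus" % num_live)
            break
        if num_live >= num_warn:
            warnings.warn("widening beyond %d live points; the likelihood "
                          "may have a large plateau" % num_live)
        num_live = min(2 * num_live, target, num_max)
    return num_live
```

A normal run (widen_roots(400, 10_000)) reaches its target silently, while a runaway request (widen_roots(400, 10**9)) warns and stops at the cap instead of looping forever.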
Hi all, could you please have a look and test this pull request: https://github.com/JohannesBuchner/UltraNest/pull/96
If it is suitable and works (it adds a useful print to the output and avoids infinite looping), I would merge it and make a new release.
This should work now. There are two unit tests, a warning when the run-away seems to occur, and here is how to configure it:
sampler.run(....,
    widen_before_initial_plateau_num_warn=10000,
    widen_before_initial_plateau_num_max=50000,
)
I created a release (3.6.0), please test it and let me know if it works for you.
Hi @JohannesBuchner, sorry for missing the previous message, and thanks for the change! I will update ultranest and let you know if the problem appears again.
Description
We implemented Bexvar in Stingray some months ago, as you might remember. When testing ultranest-dependent code, we have a case where we feed it bad data and expect a warning. It used to just exit after a few seconds with gibberish results, which was a pretty minor issue. Now, it starts an apparently infinite loop of widening the roots more and more, making the CI crash with a timeout:
What I Did
I think that the only thing I did was feeding non-integer counts to bexvar. Again, the purpose was to fail gracefully.