Closed: tilmantroester closed this issue 5 years ago.
@tilmantroester thank you for pointing this out!
This error is caused by trying to start the dynamic nested sampling run by resuming the initial exploratory run whenever the optimal start point for the dynamic run does not involve sampling from the whole prior. However, we do not save a resume file at every step, so an error is thrown if it tries to resume before the first resume file has been saved. I fixed this in 0dd8636c9865632f1d11527193eda3254685d5ee: the dynamic nested sampling now starts by sampling the whole prior when there is no available resume file close enough to the start.
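The rule described in that fix can be sketched roughly as follows. This is a hypothetical helper, not dyPolyChord's actual code: it assumes resume files are indexed by the number of dead points at which they were saved, and picks the latest one that does not come after the desired start point, falling back to sampling the whole prior when none exists yet.

```python
from typing import Optional, Sequence


def choose_resume_point(start_ndead: int, saved_ndeads: Sequence[int]) -> Optional[int]:
    """Pick where the dynamic run should start (hypothetical helper).

    Returns the latest saved resume point (counted in dead points) that does
    not come after the desired start, or None if no such resume file exists,
    meaning the dynamic run should start by sampling the whole prior.
    """
    candidates = [n for n in saved_ndeads if n <= start_ndead]
    return max(candidates) if candidates else None


# Illustrative numbers: resume files saved every 100 dead points.
print(choose_resume_point(250, [100, 200, 300]))  # 200
print(choose_resume_point(50, [100, 200, 300]))   # None: sample the whole prior
```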
Let me know if any problems remain after this fix, and if not I will close the issue.
Apologies for the delay in getting back to this. It was working on a simple toy model, but I've now run it at scale and got the same error message again:
```
Traceback (most recent call last):
  File "/home/ttroester/Codes/dyPolyChord/dyPolyChord/run_dynamic_ns.py", line 193, in run_dypolychord
    dynamic_goal=dynamic_goal)
  File "/home/ttroester/Codes/dyPolyChord/dyPolyChord/output_processing.py", line 114, in process_dypolychord_run
    run = combine_resumed_dyn_run(init, dyn, dyn_info['resume_ndead'])
  File "/home/ttroester/Codes/dyPolyChord/dyPolyChord/output_processing.py", line 203, in combine_resumed_dyn_run
    nestcheck.ns_run_utils.get_run_threads(init),
  File "/home/ttroester/Codes/nestcheck/nestcheck/ns_run_utils.py", line 152, in get_run_threads
    samples = array_given_run(ns_run)
  File "/home/ttroester/Codes/nestcheck/nestcheck/ns_run_utils.py", line 65, in array_given_run
    samples[-1, 2] = -1  # nlive drops to zero after final point
IndexError: index -1 is out of bounds for axis 0 with size 0
```
@tilmantroester this actually looks like a different problem to me. I think what is happening is that one of the initial ("init") or dynamic ("dyn") runs contains zero samples, which throws an error when the runs are split into threads here:
```
  File "/home/ttroester/Codes/dyPolyChord/dyPolyChord/output_processing.py", line 203, in combine_resumed_dyn_run
    nestcheck.ns_run_utils.get_run_threads(init),
```
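As a minimal illustration of why an empty run fails at that point, the assignment on the failing line of `array_given_run` raises exactly this `IndexError` when applied to a NumPy array with zero rows. The column count below is arbitrary and chosen only for the example.

```python
import numpy as np


def reproduce_empty_run_error(ncols=5):
    """Apply the failing assignment to a samples array with zero rows."""
    samples = np.zeros((0, ncols))  # a run containing no samples
    try:
        samples[-1, 2] = -1  # same indexing as the line in array_given_run
    except IndexError as err:
        return str(err)
    return None


print(reproduce_empty_run_error())
# index -1 is out of bounds for axis 0 with size 0
```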
Please can you provide an example I can use to replicate the error?
Otherwise, I suggest adding some print statements that output properties of "init" and "dyn" before line 200 of output_processing.py, so you can check why these runs cannot be split into threads.
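Something along these lines could serve as those print statements. It is a sketch that assumes `init` and `dyn` are nestcheck-style run dictionaries with a `'logl'` array (one entry per sample) and a `'thread_min_max'` array (one row per thread); `summarize_run` itself is a hypothetical helper.

```python
import numpy as np


def summarize_run(name, run):
    """Print and return the number of samples and threads in a run.

    Assumes a nestcheck-style run dict with 'logl' (one entry per sample)
    and 'thread_min_max' (one row per thread) arrays.
    """
    nsamples = len(run["logl"])
    nthreads = np.asarray(run["thread_min_max"]).shape[0]
    print(f"{name}: {nsamples} samples, {nthreads} threads")
    return nsamples, nthreads


# Toy example of an empty run, which is the suspected failure case.
empty = {"logl": np.array([]), "thread_min_max": np.zeros((0, 2))}
summarize_run("dyn", empty)  # prints "dyn: 0 samples, 0 threads"
```

Calling `summarize_run("init", init)` and `summarize_run("dyn", dyn)` just before the `combine_resumed_dyn_run` call would show whether either run is empty.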
This happens in a fairly complex pipeline, so getting a simple example that replicates it is going to be difficult. Since I run this on a large number of MPI workers, each of which only gets one CPU, splitting into threads probably doesn't do much at best and might upset the scheduler at worst. Is it possible to disable running on multiple threads?
Are there constraints on `n_init`, e.g., that it needs to be larger than the number of dimensions of the parameter space?
"Threads" in `get_run_threads` refers to splitting up the data into single-live-point runs ("threads") after PolyChord has finished sampling, not to multi-threading of the computer process.
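To make the distinction concrete, splitting into threads amounts to grouping the saved samples by thread label, which is pure post-processing on arrays. The function below is a hypothetical illustration, not nestcheck's `get_run_threads`, and the input values are made up.

```python
import numpy as np


def split_by_thread(logl, thread_labels):
    """Group sample log-likelihoods by thread label (illustrative sketch).

    Each thread is the chain of points generated by a single live point, so
    this is post-processing on saved samples; no OS threads are started.
    """
    logl = np.asarray(logl)
    thread_labels = np.asarray(thread_labels)
    return {int(label): logl[thread_labels == label]
            for label in np.unique(thread_labels)}


threads = split_by_thread([0.1, 0.4, 0.7, 0.9], [0, 1, 0, 1])
# thread 0 -> [0.1, 0.7], thread 1 -> [0.4, 0.9]
```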
The new error is occurring after both the initial and dynamic runs have finished, and it looks to me like either the initial or the dynamic run has no samples in it. I expect it is the dynamic run that is empty (if the initial one had no samples, I think it would have thrown an error earlier). If so, the problem is that all your allotted samples are being used on the initial run, so you need the total number of samples available to the dynamic run to be bigger, i.e. increase `nlive_const` or `max_ndead` (whichever you use) relative to `n_init`. What values are you using for these? There is not much more I can say without being able to replicate the issue.
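A back-of-the-envelope version of this argument, assuming that a nested sampling run with n live points uses roughly n samples per e-fold of prior-volume compression. Here `n_efolds` is a hypothetical problem-dependent constant, not a dyPolyChord setting, and the result is a rough estimate rather than dyPolyChord's actual budget allocation.

```python
def estimated_dynamic_budget(nlive_const, n_init, n_efolds):
    """Rough number of samples left for the dynamic run (hypothetical).

    A budget equivalent to a standard run with nlive_const live points is
    about nlive_const * n_efolds samples; the initial run with n_init live
    points uses about n_init * n_efolds of them.
    """
    total = nlive_const * n_efolds
    used_by_init = n_init * n_efolds
    return max(total - used_by_init, 0)


print(estimated_dynamic_budget(100, 100, 30))  # 0: nothing left for the dynamic run
print(estimated_dynamic_budget(500, 100, 30))  # 12000
```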
In general, having `n_init` greater than the number of dimensions is a good idea, although if my theory is correct you will also need to increase `nlive_const` (or `max_ndead`) by a larger fraction to avoid this error.
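The two rules of thumb above can be written as a quick pre-flight check. `check_settings` is hypothetical, and its conditions are only the ones stated in this thread, not limits enforced by dyPolyChord.

```python
def check_settings(ndim, n_init, nlive_const):
    """Return a list of warnings for the rules of thumb in this thread."""
    warnings = []
    if n_init <= ndim:
        warnings.append(
            f"n_init={n_init} is not greater than the number of dimensions ({ndim})."
        )
    if nlive_const <= n_init:
        warnings.append(
            f"nlive_const={nlive_const} leaves no budget beyond n_init={n_init}; "
            "the dynamic run may end up with no samples."
        )
    return warnings


for message in check_settings(ndim=10, n_init=5, nlive_const=5):
    print("Warning:", message)
```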
Increasing `n_init` and `nlive_const` does indeed seem to prevent the error from occurring.
Great! I will close this issue. If this occurs again, try printing out how many points there are in the initial and dynamic runs, then increase `n_init` and `nlive_const` to ensure both are more than zero.
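A guard like the one below, placed just before the runs are combined, would turn the opaque `IndexError` into a clearer message. It is a hypothetical sketch that assumes nestcheck-style run dictionaries with a `'logl'` array; the suggested remedies follow the advice given in this thread.

```python
def check_runs_nonempty(init, dyn):
    """Raise a descriptive ValueError if either run contains no samples.

    Assumes nestcheck-style run dictionaries with a 'logl' array.
    """
    if len(init["logl"]) == 0:
        raise ValueError("Initial run has no samples; try increasing n_init.")
    if len(dyn["logl"]) == 0:
        raise ValueError(
            "Dynamic run has no samples; try increasing nlive_const "
            "(or max_ndead) relative to n_init."
        )


try:
    check_runs_nonempty({"logl": [0.5]}, {"logl": []})
except ValueError as err:
    print(err)
```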
I've been working on writing a CosmoSIS interface for dyPolyChord, but I've been running into problems. I managed to distill it down to the toy problem of a 2D isotropic Gaussian, based on the example in the docs:
The error is as follows (it occurs most of the time, but somewhat at random):
Sometimes it finishes, but gives warnings like: