Do you use vectorized=True?
If so, can you double check that your transform function returns a 2d array when called with one live point, i.e.,
transform(np.random.uniform(size=(1, ndim))).shape == (1, nparams)
It seems recv_samplesv in this line https://github.com/JohannesBuchner/UltraNest/blob/master/ultranest/integrator.py#L1356 has inconsistent dimensionality (can you print it?).
vectorized=False, as in the default call of ReactiveNestedSampler.
class UltraNestFitter():
    ...
    def rescale(self, cube):
        params = np.empty_like(cube)
        for i, parameter in enumerate(self.parameters):
            params[i] = cube[i] * (parameter.max - parameter.min) + parameter.min
        return params
This method is passed as transform=self.rescale. It is adapted from
def my_prior_transform(cube):
    params = cube.copy()
    lo = 400
    hi = 800
    params[0] = cube[0] * (hi - lo) + lo
from your examples.
Here is the corresponding test. The transform works for both 1d and 2d arrays. Should it return a 2d array when it is called with one point? I missed that.
def test_transform_function():
    parameters = [
        Parameter("a", min=10, max=20),
        Parameter("b", min=-5, max=15)
    ]
    fitter = UltraNestFitter(
        lnprob=lambda params: -((params[0] - 15) ** 2 + params[1] ** 2),
        parameters=parameters,
        log_dir="transform_test"
    )
    assert pytest.approx(fitter.transform(np.array([0.5, 0.2]))) == [15, -1]
    assert pytest.approx(fitter.transform(np.array([0.4, 0.8]))) == [14, 11]
    assert pytest.approx(fitter.transform(np.array([[0.5, 0.4], [0.2, 0.8]]))) == np.array([[15, 14], [-1, 11]])
    assert pytest.approx(fitter.transform([0.5, 0.2])) == [15, -1]
    assert fitter.transform(np.random.uniform(size=(2, 1))).shape == (2, 1)
    assert fitter.transform(np.random.uniform(size=(1, 2))).shape == (1, 2)
In this test, the last line fails with:
    def rescale(self, cube):
        params = np.empty_like(cube)
        for i, parameter in enumerate(self.parameters):
>           params[i] = cube[i] * (parameter.max - parameter.min) + parameter.min
E           IndexError: index 1 is out of bounds for axis 0 with size 1
No, that's fine; the function is vectorized (with a loop) here: https://github.com/JohannesBuchner/UltraNest/blob/master/ultranest/integrator.py#L459
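For context, a loop-based vectorization wrapper of this kind boils down to roughly the following (a minimal sketch for illustration only; the actual implementation lives in ultranest/utils.py):

import numpy as np

def vectorize(function):
    # call the single-point function on each proposed point and stack the results
    def vectorized(points):
        return np.asarray([function(point) for point in points])
    return vectorized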
That's a really odd bug. Can you print recv_samplesv, num_live_points_todo, and other variables?
I tried to reproduce with:
PYTHONPATH=. mpiexec -np 12 python3 examples/testloggamma.py --reactive --x_dim 2 --num_live_points=81
which goes into this part of the code:
[ultranest] Widening roots to 160 live points (have 81 already) ...
[ultranest] Sampling 40 live points from prior ...
I was not able to reproduce the crash, however.
The new run crashed immediately.
I added the following to integrator.py around line 1350:
if self.use_mpi:
    recv_samples = self.comm.gather(active_u, root=0)
    recv_samplesv = self.comm.gather(active_v, root=0)
    recv_likes = self.comm.gather(active_logl, root=0)
    recv_samples = self.comm.bcast(recv_samples, root=0)
    recv_samplesv = self.comm.bcast(recv_samplesv, root=0)
    recv_likes = self.comm.bcast(recv_likes, root=0)
    with open(f"log_{self.mpi_rank}.txt", 'a') as fff:
        print(f"{self.mpi_rank}: TEST #44:\nrecv_samples=%s\nrecv_samplesv=%s\nu=%s\nv=%s\n\n" % (recv_samples, recv_samplesv, active_u, active_v), file=fff)
        print(num_live_points_todo, num_live_points_missing, "\n\n\n\n\n\n\n", file=fff)
    active_u = np.concatenate(recv_samples, axis=0)
    active_v = np.concatenate(recv_samplesv, axis=0)
    active_logl = np.concatenate(recv_likes, axis=0)
I attach logs from rank 0, rank 1, and rank 1000 (> min_num_living_points): log_0.txt log_1.txt log_1000.txt
When I had just 2 nodes and 80 cores, this worked fine. It seems the issue happens when a core has nothing to do and broadcasts an empty array..? I am new to MPI and still struggle reading the MPI code 😕 log_0.txt log_1.txt log_79.txt
OK, I see what the problem is.
Some nodes have one or more points and call transform() for them, producing 2d arrays for active_v, as expected.
Some nodes receive no work. They still call transform(), which is a vectorized wrapper around the user-defined transform function. It is defined here: https://github.com/JohannesBuchner/UltraNest/blob/6461f108e4527b36ad461c245b62d919da6429d5/ultranest/utils.py#L127 That function loops over the proposed points, calls the user transform on each, and returns the list of results converted into a numpy array.
The numpy array is 2d if there is one point or more, but there is a problem when this function is called with an empty list of points: np.asarray cannot figure out the dimensionality it is supposed to produce, so it makes np.array([], dtype=float).
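For illustration, the np.asarray behaviour in question (plain numpy, nothing UltraNest-specific):

import numpy as np

print(np.asarray([[15.0, -1.0], [14.0, 11.0]]).shape)  # (2, 2): 2d, as expected
print(np.asarray([]).shape)                            # (0,): 1d, the parameter dimension is lost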
Here are two solutions I can think of:
a) we make vectorize() aware of what array shape should be returned, i.e. reshape the vectorize() output to (-1, nparams) where self.transform is assigned (sketched below).
b) we change the code here, before the concatenate, to strip away empty arrays. That is, active_v = np.concatenate([v for v in recv_samplesv if v.size > 0], axis=0)
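A rough sketch of option a) (shape_aware_vectorize and the hard-coded nparams are placeholder names for illustration, not the actual UltraNest code):

import numpy as np

nparams = 2  # placeholder dimensionality

def shape_aware_vectorize(function):
    # always return a (npoints, nparams) array, even when called with zero points
    def vectorized(points):
        return np.asarray([function(p) for p in points]).reshape((-1, nparams))
    return vectorized

print(shape_aware_vectorize(lambda p: p * 10 + 10)(np.empty((0, 2))).shape)  # (0, 2)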
This scenario of being short of points does not occur in other places of the code base.
I'm slightly leaning towards b).
active_v = np.concatenate([v for v in recv_samplesv if v], axis=0)
is more Pythonic, as empty collections are falsy and non-empty ones are truthy. I will check it with this patch now.
PS: I forgot numpy is not Pythonic =/
In [1]: import numpy as np
In [2]: a = np.array([1,2,3])
In [3]: bool(a)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-39666bfb5ad8> in <module>
----> 1 bool(a)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
IIRC numpy will complain that the truth of an array with multiple values is undefined.
Actually, I think a better patch would be:
num_live_points_todo = ...
if num_live_points_todo > 0:
    active_u = np.random.uniform(size=(num_live_points_todo, self.x_dim))
    active_v = self.transform(active_u)
    active_logl = self.loglike(active_v)
else:
    active_u = np.empty((0, self.x_dim))
    active_v = np.empty((0, self.num_params))
    active_logl = np.empty((0,))
I imagine there could be transforms and log-likelihoods that involve some setup cost, and the above avoids calling them at all. While the list comprehension would be Pythonic and nice, doing such loops in Python can be slow if there are thousands of arrays involved. The above also makes my hack on the concatenate input obsolete.
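For example (plain numpy), empty arrays with an explicit (0, ndim) shape concatenate cleanly with non-empty ones, which is why no filtering of the concatenate input is needed:

import numpy as np

full = np.random.uniform(size=(3, 2))   # a rank that sampled 3 points
empty = np.empty((0, 2))                # a rank that received no work
print(np.concatenate([full, empty], axis=0).shape)  # (3, 2): no dimension mismatch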
Will you commit this yourself? I agree this is better.
I still cannot reproduce the bug, so if you can make a (new) PR and test that it solves the problem, that would be better.
closed via #45
Description
With the recent version 3.3.2, I am getting an error when the number of live points changes.
What I Did
(I am not sure I recreated them in the correct order)
Here is the log: I ran it first for half an hour, then the process was killed by slurm, and then I resumed. It started fine, but then crashed when the number of live points changed, and crashed again the same way when I restarted.
The same happened with the 0190fcce version (after #43 was merged).
I will try 3.3.0 as well, but I will have to limit the number of nodes to avoid the #42 bug, so it will take longer...