Closed by ikhebgeenaccount 1 year ago
What ultranest does to store points to HDF5 is essentially equivalent to something like:
import h5py
import numpy as np
import time

ncols = 10
filepath = 'ultranest_output/K_-33.5/run3/results/points.hdf5'
fileobj = h5py.File(filepath, mode='w')
# Start with an empty dataset that can grow along the row axis.
fileobj.create_dataset(
    'points', dtype=float,
    shape=(0, ncols), maxshape=(None, ncols))
nrows = 0
while True:
    # Grow the dataset by one row, then fill that new row.
    fileobj['points'].resize(nrows + 1, axis=0)
    fileobj['points'][nrows, :] = np.random.uniform(size=ncols)
    nrows += 1
    # Record the number of rows written so far.
    fileobj.attrs['ncalls'] = nrows
    time.sleep(1)
It looks like the (virtual) file system is unstable, or there is some issue with HDF5 reads. Maybe you can play a bit with the script above, vary the intensity of the writes and the row size (ncols), and see if you can reproduce the bug outside ultranest?
You get a segfault in addition to the OSError; do you know which one comes first?
When you run ultranest, do you run just one process? I want to make sure that multiple processes are not working on the same file. If you use MPI, you need to install mpi4py.
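The suggestion to vary the write intensity and the row size could be sketched as a small parameter sweep. This is only a rough sketch, not what ultranest itself does: the function name `stress_append`, the parameter grid, the use of a temporary directory, and the explicit `flush()` after each row are all assumptions made for illustration. To test the suspect file system, point `workdir` at a directory on it.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def stress_append(filepath, ncols, nrows_total, sleep_s):
    """Append nrows_total random rows to a resizable dataset; return rows stored."""
    with h5py.File(filepath, mode="w") as fileobj:
        points = fileobj.create_dataset(
            "points", dtype=float,
            shape=(0, ncols), maxshape=(None, ncols))
        for nrows in range(nrows_total):
            # Grow by one row, fill it, and update the counter attribute.
            points.resize(nrows + 1, axis=0)
            points[nrows, :] = np.random.uniform(size=ncols)
            fileobj.attrs["ncalls"] = nrows + 1
            # Assumption: flush after every row to stress the file system more.
            fileobj.flush()
            time.sleep(sleep_s)
        return points.shape[0]


if __name__ == "__main__":
    # Hypothetical grid; replace workdir with a path on the suspect file system.
    workdir = tempfile.mkdtemp()
    for ncols in (10, 1000, 100000):
        for sleep_s in (0.01, 0.0001):
            path = os.path.join(workdir, f"points_{ncols}_{sleep_s}.hdf5")
            written = stress_append(path, ncols, nrows_total=20, sleep_s=sleep_s)
            print(f"ncols={ncols} sleep={sleep_s}: wrote {written} rows")
```

If one combination fails repeatedly while the others pass, that combination is a good starting point for a report to the server administrators.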
Thanks for your quick response.
I played around with the script you provided and went up to ncols=100000 and time.sleep(.0001), and I was unable to reproduce the bug. My initial guess was the same as yours: that it is probably an issue with the server. I went to the IT staff, who could not find an issue with the server (unfortunately for me).
When I run ultranest, I do run just one process. No other processes are using the points.hdf5 file.
Regarding the segfault: it seems that the segfault happens first and causes the OSError to be thrown. So perhaps it's an h5py issue rather than an ultranest one.
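One way to get more evidence on where the segfault occurs is Python's standard `faulthandler` module, which dumps the Python traceback of every thread when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is a generic helper, not something ultranest or h5py provides; the function name `enable_crash_tracebacks` and the log path are hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump Python tracebacks of all threads on a fatal signal (e.g. SIGSEGV).

    Writes to log_path if given, otherwise to stderr. Returns the stream,
    which must stay open for as long as faulthandler is enabled.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream


if __name__ == "__main__":
    # Call this once at the top of the script, before the sampler starts.
    enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without code changes by running the script with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable.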
I have tried running it on a different machine, and it works there. So I suspect that it is an issue with the server.
Description
I am fitting a model using ultranest, using a ReactiveNestedSampler.
What I Did
This is the code I run:
I have verified that the likelihood and prior give the right values. The max_ncalls is set because I run this for several different measurements, some of which I know will not converge (or at least not in any reasonable time).
ultranest does run for a while, showing the standard output (these are the last two log lines before the crash):
But at some point I get the following error:
The time it takes to crash varies between 1 and 4 minutes.
I have updated h5py to 3.9.0 (its latest version), but the problem persisted. I run the Python script on a virtual desktop server within a virtual environment. Anything that might help me fix this is much appreciated, thanks!
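For a crash like this, it can help to report the HDF5 library version alongside the h5py version, since h5py is a wrapper around the HDF5 C library. A minimal sketch for collecting that information follows; the helper name `environment_report` is hypothetical, and the attributes it reads come from `h5py.version`.

```python
import platform

import h5py
import numpy as np


def environment_report():
    """Collect version strings that are useful in an h5py-related bug report."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "h5py": h5py.version.version,
        "hdf5": h5py.version.hdf5_version,
        "numpy": np.__version__,
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```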