JohannesBuchner / BXA

Bayesian X-ray analysis (nested sampling for Xspec and Sherpa)
https://johannesbuchner.github.io/BXA/
GNU General Public License v3.0

mpi4py returns a BlockingIOError/OSError: Unable to create file #36

Closed. jpbreuer closed this issue 6 months ago

jpbreuer commented 2 years ago

Description

While attempting to parallelize BXA with MPI, the h5py file is created but locked. After following the recommendation in a previous (closed) BXA issue thread here and reinstalling all dependencies, the problem persists, but with a new error.

I read many forum posts about these errors, and they recommend reinstalling dependencies. It seems as though the h5py file is corrupted while it is being created.

What I Did

Old error:

Traceback (most recent call last):
  File "/home/jpbreuer/Scripts/bxa_test.py", line 373, in <module>
    results = solver.run(resume=True)
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/bxa/xspec/solver.py", line 188, in run
    self.results = solve(
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/ultranest/solvecompat.py", line 55, in pymultinest_solve_compat
    sampler = ReactiveNestedSampler(
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/ultranest/integrator.py", line 1077, in __init__
    self.pointstore = HDF5PointStore(storage_filename, storage_num_cols, mode='a' if resume else 'w')
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/ultranest/store.py", line 187, in __init__
    self.fileobj = h5py.File(filepath, **h5_file_args)
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/h5py/_hl/files.py", line 507, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/jpbreuer/.local/lib/python3.9/site-packages/h5py/_hl/files.py", line 232, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
BlockingIOError: [Errno 11] Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

Updated error:

Traceback (most recent call last):
  File "/home/jpbreuer/Scripts/bxa_test.py", line 128, in <module>
    results = solver.run(resume=True)
  File "/usr/local/lib/python3.9/dist-packages/bxa/xspec/solver.py", line 188, in run
    self.results = solve(
  File "/usr/local/lib/python3.9/dist-packages/ultranest/solvecompat.py", line 55, in pymultinest_solve_compat
    sampler = ReactiveNestedSampler(
  File "/usr/local/lib/python3.9/dist-packages/ultranest/integrator.py", line 1077, in __init__
    self.pointstore = HDF5PointStore(storage_filename, storage_num_cols, mode='a' if resume else 'w')
  File "/usr/local/lib/python3.9/dist-packages/ultranest/store.py", line 187, in __init__
    self.fileobj = h5py.File(filepath, **h5_file_args)
  File "/usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/files.py", line 387, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/files.py", line 187, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_debian_h5py_serial/_objects.pyx", line 54, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/_objects.pyx", line 55, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/h5f.pyx", line 108, in h5py._debian_h5py_serial.h5f.create
OSError: Unable to create file (unable to open file: name = 'bxatest/results/points.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/files.py", line 185, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_debian_h5py_serial/_objects.pyx", line 54, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/_objects.pyx", line 55, in h5py._debian_h5py_serial._objects.with_phil.wrapper
  File "h5py/_debian_h5py_serial/h5f.pyx", line 88, in h5py._debian_h5py_serial.h5f.open
OSError: Unable to open file (truncated file: eof = 96, sblock->base_addr = 0, stored_eof = 2048)

JohannesBuchner commented 2 years ago

Double-check that you can import mpi4py in your python/sherpa script.

https://johannesbuchner.github.io/UltraNest/debugging.html#Parallelisation-issues
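
One way to run this check is a short script started with the same MPI launcher as the analysis (for example `mpiexec -np 4 python3 check_mpi.py`). The sketch below is an illustration, not part of BXA or UltraNest; `mpi_status` and `check_mpi.py` are hypothetical names. If every process reports a size of 1, the processes are probably not communicating, which would be consistent with several of them trying to open the same points.hdf5 at once.

```python
def mpi_status():
    """Return (rank, size) from mpi4py, or None if mpi4py cannot be imported."""
    try:
        # Importing MPI initialises the MPI runtime as a side effect.
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    status = mpi_status()
    if status is None:
        print("mpi4py is not importable from this Python interpreter")
    else:
        rank, size = status
        print(f"MPI rank {rank} of {size}")
```

With four working processes, each rank from 0 to 3 should appear once, each reporting a size of 4.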

JohannesBuchner commented 2 years ago

Also delete bxatest/results/points.hdf5.
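
The deletion can also be scripted, as in the minimal sketch below. `remove_stale_pointstore` is a hypothetical helper, and the path is the one from the error message above. Removing the file discards any stored points, so a later run with resume=True is assumed to start from scratch. Run it once before launching the MPI job, not from inside the parallel script.

```python
from pathlib import Path


def remove_stale_pointstore(path):
    """Delete the file at `path` if it exists; return True if something was removed."""
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    return False


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    removed = remove_stale_pointstore("bxatest/results/points.hdf5")
    print("removed stale point store" if removed else "nothing to remove")
```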

JohannesBuchner commented 6 months ago

I came across this error today after installing mpi4py from PyPI with pip, and it went away after installing it from apt instead (python3-mpi4py). Please reopen if you still have the issue.
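
To see which installation Python actually picks up, the import location can be printed without importing the package. This is a sketch using only the standard library; `locate_package` is a hypothetical helper name. In the tracebacks above, a path under ~/.local/lib/python3.9/site-packages points to a pip user install, while /usr/lib/python3/dist-packages points to the Debian package.

```python
import importlib.util


def locate_package(name):
    """Return the file path a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Report where each package would be loaded from in this interpreter.
    for pkg in ("mpi4py", "h5py"):
        print(pkg, "->", locate_package(pkg))
```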