dbbs-lab / bsb-core

The Brain Scaffold Builder
https://bsb.readthedocs.io
GNU General Public License v3.0

H5py issue during simulation #836

Open drodarie opened 4 months ago

drodarie commented 4 months ago

When running a NEST simulation under MPI, one of the ranks occasionally fails to open the HDF5 file. Command used:

mpirun -n 6 bsb -v=4 simulate cerebellum.hdf5 basal_activity

Stack trace:

Traceback (most recent call last):
  File "/home/toromis/Workspace/venv/bin/bsb", line 8, in <module>
    sys.exit(handle_cli())
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/__init__.py", line 11, in handle_cli
    handle_command(sys.argv[1:], exit=True)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/__init__.py", line 31, in handle_command
    namespace.handler(namespace, dryrun=dryrun)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/commands/__init__.py", line 99, in execute_handler
    self.handler(context)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cli/commands/_commands.py", line 208, in handler
    result = network.run_simulation(sim_name)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/profiling.py", line 159, in decorated
    return f(*args, **kwargs)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/core.py", line 443, in run_simulation
    return adapter.simulate(simulation)[0]
  File "/home/toromis/Workspace/dbbs/bsb/bsb-nest/bsb_nest/adapter.py", line 53, in simulate
    return super().simulate(simulation)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/adapter.py", line 76, in simulate
    data = self.prepare(simulation)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-nest/bsb_nest/adapter.py", line 58, in prepare
    self.simdata[simulation] = SimulationData(
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/adapter.py", line 52, in __init__
    self.placement: dict["CellModel", "PlacementSet"] = {
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/adapter.py", line 53, in <dictcomp>
    model: model.get_placement_set() for model in simulation.cell_models.values()
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/simulation/cell.py", line 35, in get_placement_set
    return self.cell_type.get_placement_set(chunks=chunks)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/cell_types.py", line 103, in get_placement_set
    return self.scaffold.get_placement_set(self, *args, **kwargs)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/core.py", line 553, in get_placement_set
    return self.storage.get_placement_set(
  File "/home/toromis/Workspace/dbbs/bsb/bsb-core/bsb/storage/__init__.py", line 286, in get_placement_set
    ps = self._PlacementSet(self._engine, type)
  File "/home/toromis/Workspace/dbbs/bsb/bsb-hdf5/bsb_hdf5/placement_set.py", line 87, in __init__
    if not self.exists(engine, cell_type):
  File "/home/toromis/Workspace/dbbs/bsb/bsb-hdf5/bsb_hdf5/placement_set.py", line 108, in exists
    with engine._handle("r") as h:
  File "/home/toromis/Workspace/dbbs/bsb/bsb-hdf5/bsb_hdf5/__init__.py", line 141, in _handle
    return h5py.File(self._root, mode)
  File "/home/toromis/Workspace/venv/lib/python3.10/site-packages/h5py/_hl/files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/toromis/Workspace/venv/lib/python3.10/site-packages/h5py/_hl/files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
BlockingIOError: [Errno 11] Unable to synchronously open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

At the moment I do not have a simple way to reproduce this issue, since it happens randomly.
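
That said, the failure mode in the trace can be provoked deliberately: with HDF5 file locking enabled, a read-only open fails while another process holds a write handle on the same file. Below is a hypothetical sketch along those lines (the hold_write_handle helper and the timings are made up for illustration), not a confirmed reproduction of this bug:

import multiprocessing as mp
import time

import h5py


def hold_write_handle(path, seconds):
    # Keeping a write handle open holds HDF5's exclusive file lock.
    with h5py.File(path, "r+"):
        time.sleep(seconds)


if __name__ == "__main__":
    path = "cerebellum.hdf5"
    writer = mp.Process(target=hold_write_handle, args=(path, 5))
    writer.start()
    time.sleep(1)  # let the writer acquire the lock first
    try:
        # With file locking enabled this raises the same BlockingIOError
        # as in the trace above: "unable to lock file, errno = 11".
        with h5py.File(path, "r"):
            pass
    finally:
        writer.join()

If any rank briefly held a write handle on the network during setup, the other ranks' read-only opens could fail exactly like this, but I have not confirmed that this is what happens here.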

Helveg commented 4 months ago

Please confirm, but during simulation the storage engine should be operating in read-only mode, so it should be safe to set HDF5_USE_FILE_LOCKING=FALSE as a workaround.
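
Concretely, with the command from your report, that would be:

HDF5_USE_FILE_LOCKING=FALSE mpirun -n 6 bsb -v=4 simulate cerebellum.hdf5 basal_activity

(Depending on your MPI implementation, the variable may need to be forwarded explicitly to all ranks, e.g. with Open MPI's -x HDF5_USE_FILE_LOCKING.)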

This may, however, indicate a problem with your MPI installation, since an MPI-window-based lock should be active to prevent exactly these issues.
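
For reference, this is roughly the kind of MPI-window-based lock I mean, as a minimal mpi4py sketch (illustrative only, not bsb-core's actual implementation): rank 0 exposes a one-byte RMA window and every rank takes an exclusive passive-target lock on it before touching the file.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
buf = np.zeros(1, dtype="b") if comm.rank == 0 else None
win = MPI.Win.Create(buf, comm=comm)
tmp = np.zeros(1, dtype="b")

win.Lock(0, MPI.LOCK_EXCLUSIVE)
win.Get(tmp, 0)  # a real RMA op, so lazy implementations acquire the lock now
win.Flush(0)
try:
    pass  # critical section: e.g. the h5py.File(..., "r") open goes here
finally:
    win.Unlock(0)
win.Free()

If a snippet like this fails to serialize the ranks on your machine, that would point at the MPI installation rather than at h5py.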