A PyTorch-based library for working with 3D and 2D convolutional neural networks, with a focus on semantic segmentation of volumetric biomedical image data.
[...]
self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 181, in h5py.h5d.DatasetID.read
File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw
File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
OSError: Can't read data (wrong B-tree signature)
Attempting to read from the same source coordinates again usually works, which is why it's wrapped in a retry-block and doesn't affect training. It's still very annoying to have this issue.
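For readers unfamiliar with the pattern, the retry-block mentioned above could look roughly like the following. This is a minimal sketch and not the actual elektronn3 code: the function name read_with_retry, its parameters, and the choice of RuntimeError on final failure are all hypothetical. The read itself is passed in as a callable, so the sketch does not depend on h5py.

```python
import time


def read_with_retry(read_fn, max_retries=5, delay=0.0, retry_on=(OSError,)):
    """Call read_fn() and retry if it raises one of the retry_on exceptions.

    read_fn is a zero-argument callable that performs the read, for example
    lambda: dataset[z0:z1, y0:y1, x0:x1] for an h5py dataset (hypothetical usage).
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return read_fn()
        except retry_on as exc:
            # Remember the error and try the same read again.
            last_error = exc
            print(f"Read failed (attempt {attempt}/{max_retries}): {exc}")
            if delay > 0:
                time.sleep(delay)
    # All attempts failed: surface the last underlying error as the cause.
    raise RuntimeError(f"Read failed after {max_retries} attempts") from last_error
```

Logging every failed attempt, as done here with print, keeps the problem visible even though training continues.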
Since the errors are not deterministic, I guess they are either caused
by a concurrency issue in PyTorch's DataLoader, in HDF5/h5py or maybe it's
even a filesystem issue.
(One of the error messages can be found in the commit message of e1a55ed.)
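One way to probe the concurrency hypothesis would be to make sure each DataLoader worker process opens its own file handle instead of inheriting one from the parent process. The sketch below shows that idea in a generic form. It is an assumption for illustration only: the class name PerProcessHandle and the opener parameter are hypothetical, and nothing in this issue confirms that shared handles are the actual cause.

```python
import os


class PerProcessHandle:
    """Lazily open a resource once per process instead of sharing it across forks."""

    def __init__(self, opener):
        # opener: zero-argument callable that opens the resource,
        # e.g. lambda: h5py.File(path, "r") (hypothetical usage).
        self._opener = opener
        self._handle = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        if self._handle is None or self._pid != pid:
            # First access in this process, or first access after a fork:
            # open a fresh handle rather than reusing the parent's.
            self._handle = self._opener()
            self._pid = pid
        return self._handle
```

A dataset class would hold a PerProcessHandle and call get() inside its item access, so that nothing is opened before the worker processes are started. If the errors persist with this in place, that would point toward HDF5/h5py itself or the filesystem.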
Once in a while, data loaders (especially the validation data loader) encounter a random read error when slicing from HDF5 files at https://github.com/ELEKTRONN/elektronn3/blob/e4dff1b9b9c44794a6ecc2c3fcf440f047451367/elektronn3/data/utils.py#L44. The end of the traceback is shown above.
Quoting from https://github.com/ELEKTRONN/elektronn3/commit/0ed440886e426774def66f0eacf6a7f6225ca883: