braceal / molecules

Machine learning for molecular dynamics.
MIT License

HDF5 intermittent failure #81

Open braceal opened 3 years ago

braceal commented 3 years ago

In the dataset class, add a try/except block that retries opening the h5 file. It should retry a parameterized number of times and wait 10 seconds between attempts.

Traceback (most recent call last):
  File "/p/gpfs1/brace3/src/DeepDriveMD-pipeline/deepdrivemd/models/aae/train.py", line 390, in <module>
    main(cfg, args.encoder_gpu, args.generator_gpu, args.decoder_gpu, args.distributed)
  File "/p/gpfs1/brace3/src/DeepDriveMD-pipeline/deepdrivemd/models/aae/train.py", line 259, in main
    cms_transform=False,
  File "/p/gpfs1/brace3/src/DeepDriveMD-pipeline/deepdrivemd/models/aae/train.py", line 118, in get_dataset
    cms_transform=cms_transform,
  File "/p/gpfs1/brace3/src/molecules/molecules/ml/datasets/point_cloud.py", line 62, in __init__
    with open_h5(self.file_path, 'r', libver = 'latest', swmr = False) as f:
  File "/p/gpfs1/brace3/src/molecules/molecules/utils/read_file.py", line 20, in open_h5
    return h5py.File(h5_file, mode, **kwargs)
  File "/g/g15/brace3/.conda/envs/conda-pytorch/lib/python3.7/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/g/g15/brace3/.conda/envs/conda-pytorch/lib/python3.7/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 48, sblock->base_addr = 0, stored_eof = 2048)
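
One way the retry described above could look is sketched below. This is only a sketch, not the project's implementation: `open_with_retry` and its parameters (`retries`, `wait_seconds`, `sleep`) are hypothetical names, and it assumes the intermittent failure always surfaces as an `OSError`, as in the traceback. The opener is passed in as a callable so the helper can wrap `h5py.File` or the existing `open_h5` without changing either.

```python
import time
from typing import Any, Callable


def open_with_retry(
    opener: Callable[..., Any],
    path: str,
    mode: str = "r",
    retries: int = 5,            # hypothetical parameter: total number of attempts
    wait_seconds: float = 10.0,  # wait between attempts, per the issue text
    sleep: Callable[[float], Any] = time.sleep,  # injectable so tests need not wait
    **kwargs: Any,
) -> Any:
    """Call opener(path, mode, **kwargs), retrying when it raises OSError."""
    if retries < 1:
        raise ValueError("retries must be at least 1")

    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return opener(path, mode, **kwargs)
        except OSError as err:
            # Assumption: the file may still be mid-write, so a later
            # attempt can succeed. Other exception types are not caught.
            last_error = err
            if attempt < retries:
                sleep(wait_seconds)

    # Every attempt failed: raise with the last underlying error chained.
    raise OSError(
        f"Could not open {path!r} after {retries} attempts"
    ) from last_error
```

In the dataset's `__init__`, the call might then read `open_with_retry(h5py.File, self.file_path, "r", libver="latest", swmr=False)`. Chaining the last `OSError` keeps the original h5py message visible if the file never becomes readable.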