ITA-Solar / rh

RH 1.5D

HDF5 error on vilje #21

Closed tiagopereira closed 5 years ago

tiagopereira commented 5 years ago

Crashes on vilje with an HDF5 error in init_hdf5_indata_new(). It does not appear to be related to a lack of disk space. Running rh15d_lteray works (it only produces output_ray.hdf5). Error message:

HDF5-DIAG: Error detected in HDF5 (1.8.19) MPI-process 1:
  #000: H5F.c line 730 in H5Fflush(): unable to flush file's cached information
    major: File accessibilty
    minor: Unable to flush data from cache
  #001: H5Fint.c line 1183 in H5F_flush(): low level truncate failed
    major: File accessibilty
    minor: Write failed
Process    1: (EEE) init_hdf5_indata_new: HDF5 error.
  #002: H5FD.c line 1907 in H5FD_truncate(): driver truncate request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDmpio.c line 1983 in H5FD_mpio_truncate(): MPI_File_set_size failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #004: H5FDmpio.c line 1983 in H5FD_mpio_truncate(): Invalid argument
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
HDF5-DIAG: Error detected in HDF5 (1.8.19) MPI-process 0:
  #000: H5F.c line 730 in H5Fflush(): unable to flush file's cached information
    major: File accessibilty
    minor: Unable to flush data from cache
  #001: H5Fint.c line 1183 in H5F_flush(): low level truncate failed
    major: File accessibilty
    minor: Write failed
  #002: H5FD.c line 1907 in H5FD_truncate(): driver truncate request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDmpio.c line 1983 in H5FD_mpio_truncate(): MPI_File_set_size failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
Process    0: (EEE) init_hdf5_indata_new: HDF5 error.
  #004: H5FDmpio.c line 1983 in H5FD_mpio_truncate(): Invalid argument
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
MPT ERROR: MPI_COMM_WORLD rank 1 has terminated without calling MPI_Finalize()
    aborting job

HDF5 version is 1.8.19 (Intel compilers). Running the same version with the same input files on the ITA Linux machines gives no problems.

tiagopereira commented 5 years ago

Commenting out the H5Fflush call in writeindata.c makes the issue go away temporarily (but it often comes back at the file close stage, with no major issue).

Seems to be related to a buggy HDF5 installation on vilje. Compiling HDF5 1.10.5 from scratch solves the problem.