adaptive-cfd / WABBIT

Wavelet Adaptive Block-Based solver for Interactions with Turbulence
https://www.cfd.tu-berlin.de/
GNU General Public License v3.0

HDF5 crashes unexpectedly on cluster12 tnt.tu-berlin #17

Closed: Philipp137 closed this issue 6 years ago

Philipp137 commented 6 years ago

I have now had this issue several times. It always happens unexpectedly after the run has been going for a while, and it seems like HDF5 is crashing. The error message:


IO: Saving data triggered, time= 0.26300000E-02
IO: writing data for time =      0.00263000 file = rho_000000002630.h5 active blocks= 4096
IO: writing data for time =      0.00263000 file = Ux_000000002630.h5 active blocks= 4096
IO: writing data for time =      0.00263000 file = Uy_000000002630.h5 active blocks= 4096
IO: writing data for time =      0.00263000 file = p_000000002630.h5 active blocks= 4096
IO: writing data for time =      0.00263000 file = vort_000000002630.h5 active blocks= 4096
IO: writing data for time =      0.00263000 file = mask_000000002630.h5 active blocks= 4096
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 194 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 455 in H5D__create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1638 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
    major: Object header
    minor: Unable to initialize object
  #007: H5O.c line 3016 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: H5Doh.c line 293 in H5O__dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: H5Dint.c line 1140 in H5D__create(): can't update the metadata cache
    major: Dataset
    minor: Unable to initialize object
  #010: H5Dint.c line 854 in H5D__update_oh_info(): unable to update layout/pline/efl header message
    major: Dataset
    minor: Unable to initialize object
  #011: H5Dlayout.c line 238 in H5D__layout_oh_create(): unable to initialize storage
    major: Dataset
    minor: Unable to initialize object
  #012: H5Dint.c line 1822 in H5D__alloc_storage(): unable to initialize dataset with fill value
    major: Dataset
    minor: Unable to initialize object
  #013: H5Dint.c line 1914 in H5D__init_storage(): unable to allocate all chunks of dataset
    major: Dataset
    minor: Unable to initialize object
  #014: H5Dchunk.c line 3575 in H5D__chunk_allocate(): unable to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #015: H5Dchunk.c line 3745 in H5D__chunk_collective_fill(): unable to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #016: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #017: H5Faccum.c line 825 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #018: H5FDint.c line 260 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #019: H5FDmpio.c line 1846 in H5FD_mpio_write(): MPI_File_write_at_all failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #020: H5FDmpio.c line 1846 in H5FD_mpio_write(): Other I/O error , error stack:
(unknown)(): Other I/O error 
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 460 in H5Dget_space(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5S.c line 791 in H5Sget_simple_extent_ndims(): not a dataspace
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5Dio.c line 228 in H5Dwrite(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5S.c line 392 in H5Sclose(): not a dataspace
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 415 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 358 in H5Dopen2(): not found
    major: Dataset
    minor: Object not found
  #001: H5Gloc.c line 430 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #003: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #004: H5Gloc.c line 385 in H5G_loc_find_cb(): object 'blocks' doesn't exist
    major: Symbol table
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5A.c line 1640 in H5Aexists(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5A.c line 247 in H5Acreate2(): not a location
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5A.c line 591 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5A.c line 1602 in H5Aclose(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 415 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
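
A note for anyone triaging a similar log: HDF5 prints each error stack starting from the API call the application made (#000) and working inward to the frame where the failure was detected, so the last frame of the first report is usually the most informative. Here it points at the collective MPI-IO write (MPI_File_write_at_all inside H5FD_mpio_write) while filling the chunks of a new dataset. The later reports on the same rank ("not a dataset", "object 'blocks' doesn't exist", and so on) appear to be follow-on errors from using a dataset handle that was never created. The Python sketch below pulls out that deepest frame for each HDF5-DIAG report. It assumes the log format shown above, and root_causes is a hypothetical helper, not part of WABBIT or HDF5.

```python
import re

# Header line that opens each HDF5 error report, e.g.
# "HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:"
HEADER = re.compile(r"HDF5-DIAG: Error detected in HDF5 \(([\d.]+)\) MPI-process (\d+):")
# One stack frame, e.g.
# "  #019: H5FDmpio.c line 1846 in H5FD_mpio_write(): MPI_File_write_at_all failed"
FRAME = re.compile(r"\s*#(\d+): (\S+) line (\d+) in ([\w.]+)\(\): (.*)")


def root_causes(log_text):
    """Hypothetical helper: return (version, rank, deepest_frame) per HDF5-DIAG report.

    deepest_frame is (frame_number, function, message) for the last listed,
    i.e. highest-numbered, frame: the innermost call HDF5 reports.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"version": header.group(1), "rank": int(header.group(2)), "frames": []}
            reports.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current["frames"].append(
                (int(frame.group(1)), frame.group(4), frame.group(5).strip())
            )
        # "major:"/"minor:" detail lines and any other text are ignored
    return [(r["version"], r["rank"], r["frames"][-1]) for r in reports if r["frames"]]


# Shortened excerpt of the log above
sample = """HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 194 in H5Dcreate2(): unable to create dataset
    major: Dataset
  #019: H5FDmpio.c line 1846 in H5FD_mpio_write(): MPI_File_write_at_all failed
    major: Internal error (too specific to document in detail)
HDF5-DIAG: Error detected in HDF5 (1.8.17) MPI-process 37:
  #000: H5D.c line 460 in H5Dget_space(): not a dataset
"""

for version, rank, (depth, func, msg) in root_causes(sample):
    print(version, rank, depth, func, msg)
```

On the excerpt, the first report resolves to frame #019 in H5FD_mpio_write, and the second is the follow-on "not a dataset" error at #000.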
tommy-engels commented 6 years ago

Did you try using a newer version of HDF5 on this machine? The current release is 1.10.3. Judging from the number of active blocks, I am guessing that no MPI process ends up with zero blocks?

Philipp137 commented 6 years ago

I will try that one...

Philipp137 commented 6 years ago

I have updated to 1.10.3, and nothing bad has happened since... let us pray to the HDF5 god.