Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.

NETCDF4/HDF5 hanging issue #1831

Closed dqwu closed 2 years ago

dqwu commented 4 years ago

The following simple NETCDF4/HDF5 test program can reproduce a hanging issue on some supercomputers:

Cori@NERSC: /opt/cray/pe/netcdf-hdf5parallel/4.6.3.2/INTEL/19.0 and /opt/cray/pe/hdf5-parallel/1.10.5.2/INTEL/19.0

Theta@ALCF: /opt/cray/pe/netcdf-hdf5parallel/4.7.3.3/INTEL/19.1 and /opt/cray/pe/hdf5-parallel/1.10.6.1/INTEL/19.1

The test case is run with 16 MPI tasks; only ranks 0 to 14 have data to write (start/count are set to 0 for rank 15). It hangs forever when writing varid 76.

Write varid 0 start
Write varid 0 end
...
Write varid 75 start
Write varid 75 end
Write varid 76 start

It can also be reproduced on my laptop with NetCDF 4.7.1 built with HDF5 1.10.5.

This issue might be related to the HDF5 version, since it is not reproducible on ANL workstation compute001 with NetCDF 4.7.3 built with HDF5 1.8.16.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

#define NDIM 3
#define X_DIM_LEN 48000
#define Y_DIM_LEN 72
#define NVARS 80

int main(int argc, char* argv[])
{
  int my_rank;
  int ntasks;
  int dimid[NDIM];
  int elements_per_pe;
  int ncid;
  int varid;
  float *buffer;
  char varname[128];
  size_t start[NDIM];
  size_t count[NDIM];

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

  if (ntasks != 16) {
    if (my_rank == 0)
      printf("Error: this program is intended to run on 16 processes only\n");
    MPI_Finalize();
    return 1;
  }

  nc_create_par("test.nc", NC_CLOBBER | NC_MPIIO | NC_NETCDF4, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
  nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid[0]);
  nc_def_dim(ncid, "x", X_DIM_LEN, &dimid[1]);
  nc_def_dim(ncid, "y", Y_DIM_LEN, &dimid[2]);

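  /* Define NVARS record variables and request collective access for each. */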
  for (int i = 0; i < NVARS; i++) {
    sprintf(varname, "var_%d", i);
    nc_def_var(ncid, varname, NC_FLOAT, NDIM, dimid, &varid);
    nc_var_par_access(ncid, varid, NC_COLLECTIVE);
  }

  nc_enddef(ncid);

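  /* Decompose the x dimension across the first 15 ranks; rank 15 writes nothing. */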
  elements_per_pe = X_DIM_LEN * Y_DIM_LEN / 15;

  if (my_rank < 15) {
    start[0] = 0;
    count[0] = 1;

    start[1] = my_rank * (X_DIM_LEN / 15);
    count[1] = X_DIM_LEN / 15;

    start[2] = 0;
    count[2] = Y_DIM_LEN;

    buffer = malloc(elements_per_pe * sizeof(float));
    for (int i = 0; i < elements_per_pe; i++)
      buffer[i] = my_rank;
  }
  else {
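    /* Rank 15 has no data to write: zero-size start/count and a NULL buffer. */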
    start[0] = 0;
    count[0] = 0;

    start[1] = 0;
    count[1] = 0;

    start[2] = 0;
    count[2] = 0;

    buffer = NULL;
  }

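  /* All ranks, including rank 15 with its zero-size count, must take part in each collective write. */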
  for (varid = 0; varid < NVARS; varid++) {
    if (my_rank == 0) { 
      printf("Write varid %d start\n", varid);
      fflush(stdout);
    }

    nc_var_par_access(ncid, varid, NC_COLLECTIVE);
    nc_put_vara_float(ncid, varid, start, count, buffer);

    if (my_rank == 0) {
      printf("Write varid %d end\n", varid);
      fflush(stdout);
    }
  }

  nc_close(ncid);

  free(buffer);

  MPI_Finalize();

  return 0;
}
gsjaardema commented 4 years ago

We have had some issues with the Intel compilers and hdf5-1.10.x hanging.

One option that has resolved a couple of our hangs is to run with mpirun --mca io ompio (or the mpiexec equivalent). This fixed some issues we were seeing with openmpi-4.0.3 under both gcc and intel.
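With OpenMPI, for example, that would look something like mpirun --mca io ompio -n 16 ./test, which selects the ompio MPI-IO component instead of the ROMIO-based one (the exact component names depend on the OpenMPI build).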

edwardhartnett commented 4 years ago

OK, first we need a PR that makes this a test in netcdf-c.

Then we need an HDF5-only test that demonstrates the issue.

Then we can go to the HDF5 team with a bug report.
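As an illustration only, here is a minimal sketch of what such an HDF5-only test might look like, mirroring the netCDF repro above when run with 16 MPI tasks: a chunked dataset with an unlimited first dimension, a collective H5Dwrite, and one rank contributing an empty selection. The file name test_h5.h5, the dataset name var_0, and the omission of error checking are assumptions for illustration; this is not the tst_parallel7.c test mentioned later in this thread.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char* argv[])
{
  int my_rank;
  int ntasks;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

  if (ntasks < 2) {
    MPI_Finalize();
    return 1;
  }

  /* Open the file for MPI-IO parallel access, as netCDF-4 does under the hood. */
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
  hid_t file = H5Fcreate("test_h5.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

  /* One record of a chunked dataset with an unlimited first dimension. */
  hsize_t dims[3] = {1, 48000, 72};
  hsize_t maxdims[3] = {H5S_UNLIMITED, 48000, 72};
  hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_chunk(dcpl, 3, dims);
  hid_t fspace = H5Screate_simple(3, dims, maxdims);
  hid_t dset = H5Dcreate2(file, "var_0", H5T_NATIVE_FLOAT, fspace,
                          H5P_DEFAULT, dcpl, H5P_DEFAULT);

  /* Collective data transfer, the analogue of NC_COLLECTIVE. */
  hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

  hsize_t start[3] = {0, 0, 0};
  hsize_t count[3] = {1, 48000 / (hsize_t)(ntasks - 1), 72};
  hid_t mspace = H5Screate_simple(3, count, NULL);
  float *buffer = malloc(count[1] * count[2] * sizeof(float));

  if (my_rank < ntasks - 1) {
    /* Ranks 0..ntasks-2 each select one slab of the x dimension. */
    start[1] = my_rank * count[1];
    for (hsize_t i = 0; i < count[1] * count[2]; i++)
      buffer[i] = (float)my_rank;
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
  }
  else {
    /* The last rank has nothing to write but must still call H5Dwrite collectively. */
    H5Sselect_none(fspace);
    H5Sselect_none(mspace);
  }

  H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace, dxpl, buffer);

  H5Sclose(mspace);
  H5Pclose(dxpl);
  H5Dclose(dset);
  H5Sclose(fspace);
  H5Pclose(dcpl);
  H5Fclose(file);
  H5Pclose(fapl);
  free(buffer);

  MPI_Finalize();
  return 0;
}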

edwardhartnett commented 2 years ago

OK, I have made a test for this, tst_parallel7.c. It passes for me on my machine.

Let's put the test into the build to make sure it passes for everyone else. If it fails somewhere, then we have learned something. If it passes everywhere, then maybe this bug has been fixed...

(But let's wait until after the next netcdf-c release to merge this test.)

edwardhartnett commented 2 years ago

Something you are doing in the code, @dqwu, that I did not expect users to do...

How come you don't change the variable mode to NC_COLLECTIVE when defining the metadata? I would have expected to see the nc_var_par_access() call before the nc_enddef(). By doing it this way, the access mode is independent on most variables but collective on others.

dqwu commented 2 years ago

> Something you are doing in the code, @dqwu, that I did not expect users to do...
>
> How come you don't change the variable mode to NC_COLLECTIVE when defining the metadata? I would have expected to see the nc_var_par_access() call before the nc_enddef(). By doing it this way, the access mode is independent on most variables but collective on others.

You are right, I forgot to call nc_var_par_access() before the nc_enddef() in my example code. However, this hanging issue is still reproducible with that minor change.
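For reference, the ordering being discussed is the one shown in the definition loop of the repro above (a fragment reusing that program's declarations):

for (int i = 0; i < NVARS; i++) {
  sprintf(varname, "var_%d", i);
  nc_def_var(ncid, varname, NC_FLOAT, NDIM, dimid, &varid);
  nc_var_par_access(ncid, varid, NC_COLLECTIVE);  /* set while still in define mode */
}
nc_enddef(ncid);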

edwardhartnett commented 2 years ago

OK, does the issue occur with the current master and HDF5-1.12.2? Or just with historical versions of HDF5 and/or netcdf-c?

dqwu commented 2 years ago

> OK, does the issue occur with the current master and HDF5-1.12.2? Or just with historical versions of HDF5 and/or netcdf-c?

I can still reproduce this issue on my laptop with NetCDF 4.7.1 built with HDF5 1.10.5. I will try HDF5 1.12 later.

dqwu commented 2 years ago

@edwardhartnett This issue is reproducible with NetCDF 4.7.1 built with HDF5 1.10.6, but not reproducible with NetCDF 4.7.1 built with HDF5 1.10.7. So it is likely a bug in HDF5 1.10.6 or older that has been fixed in HDF5 1.10.7 or newer.

edwardhartnett commented 2 years ago

OK, then let's close this issue. There's not going to be anything we can do to make this work in old releases of HDF5. ;-)

dqwu commented 2 years ago

> OK, then let's close this issue. There's not going to be anything we can do to make this work in old releases of HDF5. ;-)

FYI, I am not sure whether it is related to this bug fix in the HDF5-1.10.7 release:

Bug Fixes since HDF5-1.10.6 release
===================================

    Library
    -------
    - Fix bug and simplify collective metadata write operation when some ranks
      have no entries to contribute. This fixes parallel regression test
      failures with IBM SpectrumScale MPI on the Summit system at ORNL.

      (QAK - 2020/09/02)
edwardhartnett commented 2 years ago

@dqwu could you please close this issue?

dqwu commented 2 years ago

Closing this issue, since it has been fixed by HDF5 1.10.7.