Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.
BSD 3-Clause "New" or "Revised" License

Very strange runtime behaviour #1374

Open xy124 opened 5 years ago

xy124 commented 5 years ago

Environment Information

Summary of Issue

When using parallel netCDF, I get good runtimes on 1, 3, 4, 5, 6, 7, and 8 nodes (16 cores each), but on 2 nodes the file writes take about 10 times longer. This is reproducible. Wall time for the nc_put_vara_double call at main.c:229:

One node: approx. 4.5 seconds
Two nodes: approx. 45 seconds
Three nodes: approx. 5 seconds

Steps to reproduce the behavior

Very short version:

nc_create_par(file_name, NC_NETCDF4 | NC_MPIIO, MPI_COMM_WORLD, MPI_INFO_NULL, &current_file_id);
nc_var_par_access(current_file_id, time_var_id, NC_COLLECTIVE);

nc_put_vara_double(current_file_id, time_var_id, start, count, &(m->time));

I also tried several data sizes (from about 4 MB up to 40 MB), with the same results.

The source code is here: https://github.com/xy124/parflow/blob/parFlowVR/flowvr/netcdf-writer/main.c . One instance of it is executed per compute node, and it receives the data to write from the other cores.

A very short explanation of the linked source code: we wait for messages (fca_wait()) that contain data to write into netCDF files. These messages are then read and their data is written (nc_put_vara_double(current_file_id, variable_var_id, start, count, data);). This works fine, except that on 2 nodes, as mentioned, it takes very long.

I'd be very grateful for any ideas on what I could try. Is there a good real-time profiler that works with netCDF? I tried Google perf tools without success.
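Short of a full profiler, one low-tech option is to wrap the write call in a wall-clock timer and log the elapsed time on each node. The following is only a minimal sketch under stated assumptions: write_fn, time_write, and fake_write are hypothetical names invented for illustration, and fake_write is a stand-in for the real nc_put_vara_double call. In an MPI program, MPI_Wtime() could be used in place of clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical callback type: performs one write (for example, a wrapper
 * around nc_put_vara_double) and returns 0 on success. */
typedef int (*write_fn)(void *ctx);

/* Runs fn(ctx) once, stores the elapsed wall time in *elapsed,
 * and returns the status code that fn returned. */
static int time_write(write_fn fn, void *ctx, double *elapsed)
{
    assert(fn != NULL && elapsed != NULL);
    double t0 = wall_seconds();
    int status = fn(ctx);
    *elapsed = wall_seconds() - t0;
    return status;
}

/* Stand-in for the real write (assumption): burns a little CPU time.
 * ctx points to a size_t holding the number of loop iterations. */
static int fake_write(void *ctx)
{
    volatile double sum = 0.0;
    size_t n = *(size_t *)ctx;
    for (size_t i = 0; i < n; i++)
        sum += (double)i;
    return 0;
}
```

Calling time_write(fake_write, &n, &elapsed) with a size_t n fills in elapsed. Logging that value per node and per call would show whether every write is slow on 2 nodes or only some of them.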

edhartnett commented 5 years ago

My first idea would be to replicate this case in pure HDF5 to see if the problem is in hdf5 or netcdf.

xy124 commented 5 years ago

@edhartnett Can you recommend a testcase for this?

xy124 commented 5 years ago

In case it helps: I'm using netcdf 4.6.3 compiled with hdf5

edhartnett commented 5 years ago

What I am recommending is that you write a test case.

Go to the h5_test directory and you will see a bunch of HDF5-only programs (i.e. no netCDF code). If you can write an HDF5 program that duplicates what netCDF-4 is doing, we can see if the problem is in HDF5 or netCDF-4.

If you write an HDF5 program to do your data write, time it, and run it with varying numbers of tasks, we can see if the HDF5-only code runs slower with 2 processors. Then we can take that to the HDF5 team.

Alternatively, if the HDF5-only code does not run slower on 2 processors, then we know that something is happening in the netCDF-4 code.
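Whichever library the test exercises, the comparison across task counts comes down to spotting a run that is far slower than the rest. The following is a rough sketch of that check, not part of netCDF or HDF5: median_seconds and flag_slow_runs are hypothetical helper names, and the threshold factor is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timings (n >= 1). Sorts a copy, so the caller's order
 * is preserved. Returns -1.0 if the copy cannot be allocated. */
static double median_seconds(const double *t, size_t n)
{
    assert(t != NULL && n > 0);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1.0;
    memcpy(tmp, t, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return m;
}

/* Sets flagged[i] to 1 when t[i] exceeds factor * median, else 0.
 * Returns the number of flagged runs, or -1 on allocation failure. */
static int flag_slow_runs(const double *t, size_t n, double factor, int *flagged)
{
    assert(flagged != NULL);
    double m = median_seconds(t, n);
    if (m < 0.0)
        return -1;
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        flagged[i] = (t[i] > factor * m);
        count += flagged[i];
    }
    return count;
}
```

With the wall times reported at the top of this issue (4.5, 45, and 5 seconds on 1, 2, and 3 nodes) and a factor of 3.0, only the 2-node run is flagged. Running the same check on timings from an HDF5-only program would show whether the anomaly appears there as well.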