Open xy124 opened 5 years ago
My first idea would be to replicate this case in pure HDF5 to see if the problem is in hdf5 or netcdf.
@edhartnett Can you recommend a testcase for this?
In case it helps: I'm using netcdf 4.6.3 compiled with hdf5
What I am recommending is that you write a test case.
Go to the h5_test directory and you will see a bunch of HDF5-only programs (i.e. no netCDF code). If you can write an HDF5 program that duplicates what netCDF-4 is doing, we can see whether the problem is in HDF5 or in netCDF-4.
If you write an HDF5 program to do your data write, time it, and run it with varying numbers of tasks, we can see whether the HDF5-only code also runs slower on 2 nodes. Then we can take that to the HDF5 team.
Alternatively, if the HDF5-only code does not run slower on 2 nodes, then we know that something is happening in the netCDF-4 code.
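To make the timings comparable between the HDF5-only test and the netCDF-4 run, it may help to wrap the write call in the same small timing helper in both programs. The following is only a sketch under assumptions: `write_fn`, `time_write`, and `fake_write` are hypothetical names, not part of netCDF or HDF5, and `fake_write` is a stand-in that copies memory so the sketch compiles without HDF5, netCDF, or MPI. In a real test the callback would call H5Dwrite() or nc_put_vara_double(), and in an MPI program MPI_Wtime() would be the more usual clock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical hook for the operation being timed. In a real test this
   would wrap H5Dwrite() or nc_put_vara_double(). */
typedef int (*write_fn)(const double *data, size_t n, void *ctx);

/* Wall-clock time in seconds (C11 timespec_get; not guaranteed monotonic). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn once, stores the elapsed wall time in *elapsed,
   and returns whatever status fn returned. */
int time_write(write_fn fn, const double *data, size_t n,
               void *ctx, double *elapsed)
{
    assert(fn != NULL && elapsed != NULL);
    double t0 = wall_seconds();
    int status = fn(data, n, ctx);
    *elapsed = wall_seconds() - t0;
    return status;
}

/* Stand-in for the real write (assumption): copies the buffer to ctx. */
int fake_write(const double *data, size_t n, void *ctx)
{
    memcpy(ctx, data, n * sizeof *data);
    return 0;
}
```

Each task would call `time_write` around its own write and print the elapsed time together with its rank, so that the 1-node, 2-node, and 3-node runs can be compared line by line.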
Summary of Issue
When using parallel netCDF I get good runtimes on 1, 3, 4, 5, 6, 7, and 8 nodes (16 cores each), but on 2 nodes file writes take about 10 times longer. This is reproducible.
Wall time for the nc_put_vara_double call at main.c:229:
one node: about 4.5 seconds
two nodes: about 45 seconds
three nodes: about 5 seconds
Steps to reproduce the behavior
very short version:
I even tried multiple sizes (data of about 4 MB up to 40 MB) with the same results.
The source code is found here: https://github.com/xy124/parflow/blob/parFlowVR/flowvr/netcdf-writer/main.c . One of those is executed per compute node (and it gets data to write from other cores).
A very short explanation of the linked source code: we wait for messages (fca_wait()) that contain data to write into netCDF files. These messages are then read and written with nc_put_vara_double(current_file_id, variable_var_id, start, count, data);
That works fine, except that on 2 nodes, as mentioned, it takes very long.
I'm thankful for any ideas on what I could try. Is there a good real-time profiler that works with netCDF? I tried Google perf tools with no success.