Summary
NetCDF4 performs parallel I/O through parallel HDF5. When Darshan is used to capture a NetCDF4 application's I/O behavior, I observe that the actual file close is delayed from the nc_close call until MPI_Finalize. This incorrect timing breaks the Log VOL, which needs to use and release HDF5 resources at file close time; some of those resources (e.g. H5T_STD_B8LE) are no longer available at MPI_Finalize.
Reproduce
Test program
test.c is a simple NetCDF4 program that creates a NetCDF4 file and closes it immediately. It also prints the strings "application: nc_close start" and "application: nc_close end" before and after nc_close.
Click here to see test.c
```c
#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h> /* declares nc_create_par */
#define FATAL_ERR {if(err!=NC_NOERR) {printf("Error at line=%d: %s Aborting ...\n", __LINE__, nc_strerror(err)); goto fn_exit;}}
#define ERR {if(err!=NC_NOERR)printf("Error at line=%d: %s\n", __LINE__, nc_strerror(err));}
int main(int argc, char** argv)
{
    const char* filename="testfile";
    int err;
    int ncid, cmode;

    MPI_Init(&argc, &argv);

    /* create a new file for writing ----------------------------------------*/
    cmode = NC_NETCDF4 | NC_CLOBBER | NC_MPIIO;
    err = nc_create_par(filename, cmode, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid); FATAL_ERR

    /* exit define mode */
    err = nc_enddef(ncid); ERR

    /* close the file */
    printf("========= application: nc_close start\n");
    err = nc_close(ncid); ERR
    printf("========= application: nc_close end\n");

fn_exit:
    MPI_Finalize();
    return 0;
}
```
Compile and Run
A Makefile is provided below. Run make to compile the program; make withdarshan and make nodarshan run it with and without Darshan, respectively. Note that the Passthrough VOL is enabled so that a message is printed when the actual file close happens. The Passthrough VOL ships with the HDF5 installation, but HDF5 must be built with CFLAGS="-DENABLE_PASSTHRU_LOGGING" to enable the printing. The program runs with 1 MPI process.
Outputs
The outputs with and without Darshan are shown below. They are expected to be the same, but they differ. Without Darshan, PASS THROUGH VOL FILE Close occurs between application: nc_close start and application: nc_close end. With Darshan, PASS THROUGH VOL FILE Close occurs after application: nc_close end.
Click here to see the no-darshan (expected) output
```txt
HDF5_PLUGIN_PATH=/lib \
LD_LIBRARY_PATH=/files2/scratch/zhd1108/NetCDF/install/lib:/files2/scratch/zhd1108/HDF5/1.14.0/lib \
HDF5_VOL_CONNECTOR="pass_through under_vol=0;under_info={}" \
mpirun -n 1 ./test
------- PASS THROUGH VOL INIT
------- PASS THROUGH VOL INFO String To Info
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL FILE Create
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL INTROSPECT OptQuery
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL File Optional
------- PASS THROUGH VOL WRAP Object
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL GROUP Open
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Get
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Create
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Write
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL ATTRIBUTE Close
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Specific
------- PASS THROUGH VOL WRAP CTX Free
========= application: nc_close start
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL H5Gclose
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Close
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL UNWRAP Object
------- PASS THROUGH VOL WRAP CTX Free
========= application: nc_close end
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL TERM
```
Click here to see output if darshan is enabled
```txt
HDF5_PLUGIN_PATH=/lib \
LD_LIBRARY_PATH=/files2/scratch/zhd1108/NetCDF/install/lib:/files2/scratch/zhd1108/HDF5/1.14.0/lib \
HDF5_VOL_CONNECTOR="pass_through under_vol=0;under_info={}" \
mpirun -n 1 -env LD_PRELOAD="/files2/scratch/zhd1108/Darshan/3.4.2/lib/libdarshan.so" ./test
------- PASS THROUGH VOL INIT
------- PASS THROUGH VOL INFO String To Info
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL FILE Create
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL INFO Copy
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL INTROSPECT OptQuery
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL File Optional
------- PASS THROUGH VOL WRAP Object
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL GROUP Open
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Get
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Create
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Write
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL ATTRIBUTE Close
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL OBJECT Get
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL Get object
========= application: nc_close start
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL ATTRIBUTE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Specific
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL OBJECT Get
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL Get object
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL H5Gclose
------- PASS THROUGH VOL WRAP CTX Free
========= application: nc_close end
------- PASS THROUGH VOL WRAP CTX Get
------- PASS THROUGH VOL FILE Close
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL UNWRAP Object
------- PASS THROUGH VOL WRAP CTX Free
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL INFO Free
------- PASS THROUGH VOL TERM
```
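The comparison between the two logs above can be automated. The following is a minimal sketch, assuming the log has already been captured as an array of lines; the function name file_close_inside_nc_close is hypothetical and is not part of NetCDF, HDF5, or Darshan. It only looks for the three message strings that appear in the outputs above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Classify a captured log (one string per line).
 * Returns  1 if the file-close message appears between the two markers,
 *          0 if it appears outside them (e.g. after "nc_close end"),
 *         -1 if a marker or the file-close message is missing. */
int file_close_inside_nc_close(const char *const lines[], size_t n)
{
    const char *start_mark = "application: nc_close start";
    const char *end_mark   = "application: nc_close end";
    const char *close_msg  = "PASS THROUGH VOL FILE Close";
    long start = -1, end = -1, closed = -1;

    assert(lines != NULL);

    /* Record the index of the first occurrence of each message. */
    for (size_t i = 0; i < n; i++) {
        if (start < 0 && strstr(lines[i], start_mark))
            start = (long)i;
        else if (end < 0 && strstr(lines[i], end_mark))
            end = (long)i;
        else if (closed < 0 && strstr(lines[i], close_msg))
            closed = (long)i;
    }

    if (start < 0 || end < 0 || closed < 0)
        return -1;
    return (closed > start && closed < end) ? 1 : 0;
}
```

Applied to the two logs above, the no-darshan output would be classified as 1 (close inside nc_close) and the darshan output as 0 (close deferred past nc_close end).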
Library Version
HDF5 1.14.0, configured with --enable-parallel, --enable-build-mode=debug, and CFLAGS="-DENABLE_PASSTHRU_LOGGING".
NetCDF 4.9.1, configured with --disable-dap --disable-mmap --disable-nczarr --disable-byterange. (Some of these configure options are necessary to avoid known compilation issues with HDF5 1.14.0.)
Darshan 3.4.2
(The problem can also be reproduced with HDF5 1.13.2 and NetCDF 4.9.0.)
Other findings
The problem can be reproduced without the Passthrough VOL. We can add a print statement for info->count in the HDF5 source code here; it shows the reference count of an object. Without Darshan, the reference count is 1 at the time nc_close calls H5Fclose. With Darshan, the reference count is 3, so HDF5 assumes something else is still accessing the file and delays the actual close to the very end. I am not sure whether Darshan holds an extra reference to the file or whether this is an issue more related to NetCDF4. Using HDF5 directly (no NetCDF4 involved) does not trigger this issue.
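The deferred close described above can be illustrated with a toy reference-counted handle. This is only a sketch of the general mechanism, assuming a close that decrements a counter and releases the file once the counter reaches zero. The names toy_file_t, toy_incref, and toy_close are hypothetical and are not HDF5's internal types or API.

```c
#include <assert.h>

/* Toy stand-in for a reference-counted file object (NOT HDF5's real struct). */
typedef struct {
    int count;   /* number of outstanding references */
    int is_open; /* 1 while the underlying file is still open */
} toy_file_t;

/* Take an additional reference, as a second holder of the file would. */
void toy_incref(toy_file_t *f)
{
    assert(f->is_open);
    f->count++;
}

/* Drop one reference; the real close only happens when none remain.
 * Returns 1 if this call actually closed the file, 0 if it was deferred. */
int toy_close(toy_file_t *f)
{
    assert(f->count > 0);
    if (--f->count > 0)
        return 0;   /* another holder still has a reference: defer */
    f->is_open = 0; /* last reference released: the real close happens now */
    return 1;
}
```

In this model, a handle with count 1 is closed by the first toy_close call, which corresponds to the no-darshan case. A handle with count 3 stays open after the application's close and is only released once the two remaining references are dropped, which corresponds to the behavior observed with Darshan.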
Makefile
```makefile
DARSHAN_DIR=${LOCAL_HOME}/Darshan/3.4.2/lib/libdarshan.so
HDF5_DIR=${LOCAL_HOME}/HDF5/1.14.0
NETCDF_DIR=${LOCAL_HOME}/NetCDF/install

all:
	mpicc test.c -g -o test \
		-I${NETCDF_DIR}/include \
		-L${NETCDF_DIR}/lib -lnetcdf

# Note: HDF5 is not set in this Makefile; in the logged runs
# HDF5_PLUGIN_PATH expanded to /lib.
withdarshan:
	HDF5_PLUGIN_PATH=${HDF5}/lib \
	LD_LIBRARY_PATH=${NETCDF_DIR}/lib:${HDF5_DIR}/lib \
	HDF5_VOL_CONNECTOR="pass_through under_vol=0;under_info={}" \
	mpirun -n 1 -env LD_PRELOAD="${DARSHAN_DIR}" ./test

nodarshan:
	HDF5_PLUGIN_PATH=${HDF5}/lib \
	LD_LIBRARY_PATH=${NETCDF_DIR}/lib:${HDF5_DIR}/lib \
	HDF5_VOL_CONNECTOR="pass_through under_vol=0;under_info={}" \
	mpirun -n 1 ./test

clean:
	rm -rf testfile core.* test
```