Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.
BSD 3-Clause "New" or "Revised" License

Undefined reference to `H5Literate' with IBM XL compiler #2714

Closed WillTrojak closed 1 year ago

WillTrojak commented 1 year ago

I'm trying to compile netcdf-c for WRF and I get the following error during compilation:

...
/bin/sh ../libtool  --tag=CC   --mode=link /opt/ibm/spectrum_mpi/bin/mpicc -std=gnu11  -I/home/wtrojak/.local/pkg/hdf5/1.14.0/include -fno-strict-aliasing   -L/home/wtrojak/.local/pkg/hdf5/1.14.0/lib -L/home/wtrojak/.local/pkg/zlib/1.2.13/lib -o ncgen3 main.o load.o escapes.o getfill.o init.o genlib.o ncgeny.o ../liblib/libnetcdf.la -lhdf5_hl -lhdf5 -lm -lz -lsz -lbz2 -lzstd -lxml2 -lcurl 
libtool: link: /opt/ibm/spectrum_mpi/bin/mpicc -std=gnu11 -I/home/wtrojak/.local/pkg/hdf5/1.14.0/include -fno-strict-aliasing -o .libs/ncgen3 main.o load.o escapes.o getfill.o init.o genlib.o ncgeny.o  -L/home/wtrojak/.local/pkg/hdf5/1.14.0/lib -L/home/wtrojak/.local/pkg/zlib/1.2.13/lib ../liblib/.libs/libnetcdf.so -lhdf5_hl -lhdf5 -lm -lz -lsz -lbz2 -lzstd -lxml2 -lcurl -Wl,-rpath -Wl,/home/wtrojak/.local/pkg/netcdf-c/4.9.2/lib
../liblib/.libs/libnetcdf.so: undefined reference to `H5Literate'
make[2]: *** [ncgen3] Error 1
make[2]: Leaving directory `/autofs/home/wtrojak/install/netcdf-c-4.9.2/ncgen3'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/autofs/home/wtrojak/install/netcdf-c-4.9.2'
make: *** [all] Error 2

I was wondering if anyone could provide help getting to a fix for this?

Here is the configuration command I used for netcdf-c:

export H5DIR=$HOME/.local/pkg/hdf5/1.14.0
export ZDIR=$HOME/.local/pkg/zlib/1.2.13
FC=$( which mpif90 ) \
CC=$( which mpicc ) \
CXX=$( which mpic++ ) \
./configure \
  --prefix=$HOME/.local/pkg/netcdf-c/4.9.2 \
  --disable-byterange \
  --disable-dap \
  --enable-parallel-tests \
  --enable-shared \
  --enable-hdf5

When compiling HDF5 I tried a few different configure/cmake setups, and all gave the same error when compiling netcdf-c. For this build I used HDF5 1.14.0 with the following cmake command:

CC=$( which mpicc ) \
FC=$( which mpif90 ) \
CXX=$( which mpic++ ) \
cmake ../. \
  -DCMAKE_INSTALL_PREFIX=$HOME/.local/pkg/hdf5/1.14.0 \
  -DCMAKE_BUILD_TYPE=Release \
  -DHDF5_BUILD_FORTRAN=ON \
  -DHDF5_BUILD_HL_LIB=ON \
  -DHDF5_ENABLE_PARALLEL=ON \
  -DHDF5_ENABLE_Z_LIB_SUPPORT=ON \
  -DZLIB_LIBRARY=$HOME/.local/pkg/zlib/1.2.13/lib/libz.so \
  -DZLIB_INCLUDE_PATH=$HOME/.local/pkg/zlib/1.2.13/include/ \
  -DDEFAULT_API_VERSION:STRING=v110

The compiler I'm using is XL V16.1.1.3 with Spectrum MPI on RHEL 7.6.

Dave-Allured commented 1 year ago

Try this: remove -DDEFAULT_API_VERSION:STRING=v110 and rebuild HDF5, then Netcdf. After v1.10, H5Literate changed from a function to a macro that maps to version-numbered symbols. I believe that Netcdf 4.9.2 is set up to build against the default HDF5 v1.14 API, not the v1.10 API.

https://portal.hdfgroup.org/display/HDF5/Migrating+from+HDF5+1.10+to+HDF5+1.12

WillTrojak commented 1 year ago

Hi @Dave-Allured, thanks for getting back to me. I initially tried without that flag and got the same error. I have just deleted HDF5 and recompiled it without the flag, and I still get the same error. I have also tried building netcdf-c-4.9.2 using cmake and get the same issue.

Adding the verbose make file command to cmake, this is the command that is failing:

/opt/ibm/spectrum_mpi/bin/mpicc -I/home/wtrojak/.local/pkg/hdf5/1.14.0/include -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O -L/home/wtrojak/.local/pkg/hdf5/1.14.0/lib -L/home/wtrojak/.local/pkg/zlib/1.2.13/lib CMakeFiles/nc_test4_tst_camrun.dir/tst_camrun.c.o -o nc_test4_tst_camrun  -Wl,-rpath,/home/wtrojak/.local/pkg/zstd/1.5.5/lib64:/home/wtrojak/install/netcdf-c-4.9.2/build/liblib /usr/lib64/libm.so /usr/lib64/libz.so /home/wtrojak/.local/pkg/zstd/1.5.5/lib64/libzstd.so /usr/lib64/libbz2.so /usr/lib64/libcurl.so /usr/lib64/libxml2.so ../liblib/libnetcdf.so.19 /home/wtrojak/.local/pkg/hdf5/1.14.0/lib/libhdf5_hl.so.310.0.0 /home/wtrojak/.local/pkg/hdf5/1.14.0/lib/libhdf5.so.310.0.0 /usr/lib64/libm.so /usr/lib64/libz.so /home/wtrojak/.local/pkg/zstd/1.5.5/lib64/libzstd.so /usr/lib64/libbz2.so /usr/lib64/libcurl.so /usr/lib64/libxml2.so -ldl 
../liblib/libnetcdf.so.19: undefined reference to `H5Literate'

Running ldd on libnetcdf.so.19 shows that all the linked libraries are found, and running grep on libhdf5_hl.so suggests that "H5Literate" is in there.

Dave-Allured commented 1 year ago

All right. Please show the full output from three commands. Add the correct path prefixes, or change directory, as needed. Each should only be a few lines:

nm libhdf5.so    | grep H5Literate
nm libhdf5_hl.so | grep H5Literate
nm libnetcdf.so  | grep H5Literate

WillTrojak commented 1 year ago

Below are the outputs. For some reason there is no H5Literate in hdf5, but there are H5Literate1 and H5Literate2.

$ nm /path/to/hdf5/1.14.0/lib/libhdf5.so | grep H5Literate
00000000003b3d00 T H5Literate1
00000000003b1960 T H5Literate2
00000000003b2060 T H5Literate_async
00000000003b4560 T H5Literate_by_name1
00000000003b2520 T H5Literate_by_name2

H5Literate2, on the other hand, is undefined in hdf5_hl:

$ nm /path/to/hdf5/1.14.0/lib/libhdf5_hl.so | grep H5Literate
0000000000006440 t 00000017.plt_call.H5Literate2
                 U H5Literate2

I realise this means it's not a netcdf problem, but can you offer any suggestions on how this might be fixed?

Dave-Allured commented 1 year ago

Those symbols in libhdf5 and libhdf5_hl are what I expected for HDF5 1.14, and as described in the above migration guide.

I think I was mistaken above, suggesting that -DDEFAULT_API_VERSION:STRING=v110 might be the cause of your problem. Regardless, it is best practice in my opinion to never specify -DDEFAULT_API_VERSION for any HDF5 build for general deployment. This allows a single HDF5 build to remain compatible with multiple user programs, provided that older programs, made with previous HDF5 versions, use the correct API compatibility controls.

I think that your H5Literate issue stems from a problem with HDF5 header files used in the Netcdf build. They might be misconfigured, or your build might be picking up the wrong header file from some other location. However, I do not see the exact problem yet.
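
To confirm which API default an existing HDF5 install was built with, one option is to read its build summary. The sketch below assumes the install ships a lib/libhdf5.settings file containing a "Default API mapping:" line; that file location and the function name default_api_mapping are assumptions for illustration, not something shown in this thread.

```shell
# Read an HDF5 build summary on stdin and print the value of its
# "Default API mapping:" line (for example v110 or v114), if present.
# The function name is hypothetical.
default_api_mapping() {
    sed -n 's/^[[:space:]]*Default API mapping:[[:space:]]*//p'
}

# Possible usage, assuming the settings file exists at this path:
#   default_api_mapping < "$H5DIR/lib/libhdf5.settings"
```

If nothing is printed, the summary file probably uses a different layout and should be inspected by hand.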

Dave-Allured commented 1 year ago

To clarify, any Netcdf library version built correctly against HDF5 version 1.14 should be calling either H5Literate1 or H5Literate2 under the hood, never just plain H5Literate. This is controlled by the API compatibility macros in HDF5 header files.

https://portal.hdfgroup.org/display/HDF5/API+Compatibility+Macros
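
The check described above can be automated roughly as follows. This is a minimal sketch that filters nm output for undefined H5Literate references; the two function names are hypothetical, and it assumes the usual nm layout in which an undefined symbol appears as "U name".

```shell
# Read `nm` output on stdin and print every undefined (type "U")
# symbol whose name starts with H5Literate.
undefined_h5literate() {
    awk '$1 == "U" && $2 ~ /^H5Literate/ { print $2 }'
}

# Succeed only if there is no undefined reference to the plain,
# unversioned H5Literate. Such a reference suggests that the library
# was compiled against headers older than the HDF5 it is linked with.
has_versioned_h5literate_only() {
    ! undefined_h5literate | grep -qx 'H5Literate'
}

# Possible usage:
#   nm libnetcdf.so | undefined_h5literate
#   nm libnetcdf.so | has_versioned_h5literate_only || echo "header mismatch?"
```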

gsjaardema commented 1 year ago

When I have seen this in the past, the build has somehow picked up an incompatible mixture of includes and libraries due to a different version of HDF5 being installed system-wide and locally. For some combinations, it will pick up one set for include and another for linking.

Dave-Allured commented 1 year ago

Agreed. @WillTrojak, does your compiler have debug options to show either the expanded include file paths during compilation, or alternatively the full included text? I think you are looking specifically for the compile of netcdf-c-4.9.2/libhdf5/hdf5open.c, which contains the single invocation of H5Literate.
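
One compiler-independent way to approach this is to preprocess the file and look at which HDF5 headers appear in the output. The sketch below assumes that the compiler accepts -E and that its line markers quote the full header path, as GCC and Clang do; the marker format for XL has not been verified here, and the function name hdf5_headers_seen is hypothetical.

```shell
# Read preprocessor output on stdin and list the distinct paths of
# hdf5.h and H5public.h that appear in quoted line markers.
hdf5_headers_seen() {
    grep -Eo '"[^"]*/(hdf5|H5public)\.h"' | tr -d '"' | LC_ALL=C sort -u
}

# Possible usage: take the compile command that the build prints for
# libhdf5/hdf5open.c, replace -c (and any -o option) with -E, and pipe
# the result into the function:
#   <compile command with -E> | hdf5_headers_seen
```

If the paths printed are not under the intended HDF5 prefix, the build is picking up headers from somewhere else.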

WillTrojak commented 1 year ago

Hi, thanks for this, it really helped. I seem to have managed to get it to compile. For anyone who has this problem with IBM XL compilers in the future: the issue was that XL was including the system HDF5 headers and not the ones from the HDF5 I had compiled. I discovered this by adding the -qlist c flag.

The issue was caused by the environment variables I set in the HDF5 Lmod modulefiles relating to the include path. Specifically, the environment variables that caused the errors were CPATH and C_INCLUDE_PATH.
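
A quick way to spot this situation is to list every directory in those two variables that contains an hdf5.h. This is a minimal sketch; the function names are hypothetical, and it only inspects CPATH and C_INCLUDE_PATH, not the compiler's built-in search path.

```shell
# Print each directory from a colon-separated list ($1) that
# contains a file named hdf5.h.
dirs_with_hdf5_header() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -n "$dir" ] && [ -f "$dir/hdf5.h" ]; then
            printf '%s\n' "$dir"
        fi
    done
    IFS=$old_ifs
}

# Report matches for both variables, prefixed with the variable name.
report_include_env() {
    for var in CPATH C_INCLUDE_PATH; do
        eval "value=\${$var:-}"
        dirs_with_hdf5_header "$value" | sed "s/^/$var: /"
    done
}

# Possible usage, after loading the modules used for the build:
#   report_include_env
```

Any directory reported that is not the include directory of the intended HDF5 install is a candidate for the mismatch.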

Dave-Allured commented 1 year ago

Great! Thanks for posting the solution for the XL compiler. Please close this issue.