Closed: WalterKolczynski-NOAA closed this issue 6 months ago.
gfs_bufr now fails on WCOSS at execution time because the dynamic loader cannot find one of the NetCDF shared libraries:
+ gfs_bufr.sh[97]: mpiexec -l -n 40 --depth=8 --cpu-bind depth /lfs/h2/emc/global/save/walter.kolczynski/global-workflow/fix_gempak/exec/gfs_bufr.x
nid001058.cactus.wcoss2.ncep.noaa.gov 0: /lfs/h2/emc/global/save/walter.kolczynski/global-workflow/fix_gempak/exec/gfs_bufr.x: error while loading shared libraries: libnetcdf.so.7: cannot open shared object file: No such file or directory
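ldd shows the problem: the executable requests both libnetcdf.so.18 and libnetcdf.so.7, and only libnetcdf.so.18 resolves: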
+ gfs_bufr.sh[96]: ldd /lfs/h2/emc/global/save/walter.kolczynski/global-workflow/fix_gempak/exec/gfs_bufr.x
linux-vdso.so.1 (0x00007ffe957a5000)
libnetcdff.so.7 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdff.so.7 (0x0000154163833000)
libnetcdf.so.18 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdf.so.18 (0x00001541634e5000)
libnetcdf.so.7 => not found
libiomp5.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libiomp5.so (0x00001541630c3000)
libmpifort_intel.so.12 => /opt/cray/pe/lib64/libmpifort_intel.so.12 (0x0000154162e24000)
libimf.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x00001541627a1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000015416277d000)
libm.so.6 => /lib64/libm.so.6 (0x0000154162630000)
libdl.so.2 => /lib64/libdl.so.2 (0x000015416262b000)
libc.so.6 => /lib64/libc.so.6 (0x0000154162436000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000154162217000)
libhdf5_hl.so.100 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/hdf5/1.10.6/lib/libhdf5_hl.so.100 (0x0000154161fee000)
libhdf5.so.103 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/hdf5/1.10.6/lib/libhdf5.so.103 (0x0000154161902000)
libifport.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libifport.so.5 (0x00001541616d2000)
libifcoremt.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libifcoremt.so.5 (0x0000154161534000)
libsvml.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x000015415f9ea000)
libintlc.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x000015415f772000)
libmpi_intel.so.12 => /opt/cray/pe/lib64/libmpi_intel.so.12 (0x000015415cb54000)
libirng.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x000015415c7e9000)
/lib64/ld-linux-x86-64.so.2 (0x0000154163cc6000)
libifcore.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libifcore.so.5 (0x000015415c681000)
libfabric.so.1 => /opt/cray/libfabric/1.11.0.0./lib64/libfabric.so.1 (0x000015415c3d6000)
libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000015415c1cd000)
librt.so.1 => /lib64/librt.so.1 (0x000015415c1c3000)
libpmi.so.0 => /opt/cray/pe/lib64/libpmi.so.0 (0x000015415bfc1000)
libpmi2.so.0 => /opt/cray/pe/lib64/libpmi2.so.0 (0x000015415bd89000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x000015415bb69000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x000015415b949000)
libpals.so.0 => /opt/cray/pe/lib64/libpals.so.0 (0x000015415b744000)
libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x000015415b522000)
libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x000015415b2ac000)
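A minimal Python sketch for pulling the unresolved and conflicting entries out of ldd output like the above, assuming the usual glibc format of one "name => path (address)" or "name => not found" entry per line. The helper names (parse_ldd, unresolved, soname_conflicts, ldd_deps) are hypothetical and not part of global-workflow. Grouping sonames by library name makes the underlying symptom explicit: the same binary asks for two different major versions of libnetcdf.

```python
from __future__ import annotations

import re
import subprocess
from collections import defaultdict

# One ldd entry: "libfoo.so.N => /path/libfoo.so.N (0x...)" or "libfoo.so.N => not found".
# Entries without "=>" (linux-vdso, the dynamic loader itself) are ignored.
_ENTRY = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output: str) -> dict[str, str | None]:
    """Map each requested soname to its resolved path, or None if unresolved."""
    deps: dict[str, str | None] = {}
    for line in output.splitlines():
        m = _ENTRY.match(line)
        if m:
            name, target = m.groups()
            deps[name] = None if target == "not found" else target
    return deps


def unresolved(deps: dict[str, str | None]) -> list[str]:
    """Sonames the dynamic loader could not find."""
    return sorted(name for name, path in deps.items() if path is None)


def soname_conflicts(deps: dict[str, str | None]) -> dict[str, list[str]]:
    """Library names requested under more than one soname (e.g. libnetcdf.so.18 and .so.7)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for name in deps:
        groups[name.split(".so", 1)[0]].add(name)
    return {base: sorted(names) for base, names in groups.items() if len(names) > 1}


def ldd_deps(path: str) -> dict[str, str | None]:
    """Run ldd on a binary (assumes ldd is on PATH) and parse its output."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)


# Excerpt of the ldd output above.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffe957a5000)
    libnetcdff.so.7 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdff.so.7 (0x0000154163833000)
    libnetcdf.so.18 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdf.so.18 (0x00001541634e5000)
    libnetcdf.so.7 => not found
    libm.so.6 => /lib64/libm.so.6 (0x0000154162630000)
"""

if __name__ == "__main__":
    deps = parse_ldd(SAMPLE)
    print(unresolved(deps))        # ['libnetcdf.so.7']
    print(soname_conflicts(deps))  # {'libnetcdf': ['libnetcdf.so.18', 'libnetcdf.so.7']}
```

Because ldd reports transitive dependencies, the libnetcdf.so.7 request may come from a library linked into gfs_bufr.x rather than from the executable itself; readelf -d on each candidate library shows its own NEEDED entries.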
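The missing shared object is not present in the NetCDF 4.7.4 library directory, which provides only libnetcdf.so.18 for the C library (libnetcdff.so.7 is the Fortran library):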
WCOSS2 (BACKUPSYS) sorc> ls /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/ -l
total 6.4M
-rwxr-xr-x 1 hpc-adm hpc-adm 1.5K Oct 17 2021 libh5bzip2.la
-rwxr-xr-x 1 hpc-adm hpc-adm 96K Oct 17 2021 libh5bzip2.so
-rw-r--r-- 1 hpc-adm hpc-adm 1.8M Oct 17 2021 libnetcdf.a
-rw-r--r-- 1 hpc-adm hpc-adm 838K Oct 17 2021 libnetcdf_c++4.a
-rwxr-xr-x 1 hpc-adm hpc-adm 1.5K Oct 17 2021 libnetcdf_c++4.la
lrwxrwxrwx 1 hpc-adm hpc-adm 23 Oct 17 2021 libnetcdf_c++4.so -> libnetcdf_c++4.so.1.1.0
lrwxrwxrwx 1 hpc-adm hpc-adm 23 Oct 17 2021 libnetcdf_c++4.so.1 -> libnetcdf_c++4.so.1.1.0
-rwxr-xr-x 1 hpc-adm hpc-adm 466K Oct 17 2021 libnetcdf_c++4.so.1.1.0
-rw-r--r-- 1 hpc-adm hpc-adm 1019K Oct 17 2021 libnetcdff.a
-rwxr-xr-x 1 hpc-adm hpc-adm 1.5K Oct 17 2021 libnetcdff.la
-rw-r--r-- 1 hpc-adm hpc-adm 1.4K Oct 17 2021 libnetcdff.settings
lrwxrwxrwx 1 hpc-adm hpc-adm 19 Oct 17 2021 libnetcdff.so -> libnetcdff.so.7.0.0
lrwxrwxrwx 1 hpc-adm hpc-adm 19 Oct 17 2021 libnetcdff.so.7 -> libnetcdff.so.7.0.0
-rwxr-xr-x 1 hpc-adm hpc-adm 823K Oct 17 2021 libnetcdff.so.7.0.0
-rwxr-xr-x 1 hpc-adm hpc-adm 1.3K Oct 17 2021 libnetcdf.la
-rw-r--r-- 1 hpc-adm hpc-adm 1.4K Oct 17 2021 libnetcdf.settings
lrwxrwxrwx 1 hpc-adm hpc-adm 19 Oct 17 2021 libnetcdf.so -> libnetcdf.so.18.0.0
lrwxrwxrwx 1 hpc-adm hpc-adm 19 Oct 17 2021 libnetcdf.so.18 -> libnetcdf.so.18.0.0
-rwxr-xr-x 1 hpc-adm hpc-adm 1.4M Oct 17 2021 libnetcdf.so.18.0.0
drwxr-xr-x 2 hpc-adm hpc-adm 4.0K Oct 17 2021 pkgconfig
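The loader resolves the exact soname recorded in a binary's NEEDED entries, so libnetcdf.so.18 cannot stand in for a request for libnetcdf.so.7 even though both are "libnetcdf". A minimal Python sketch of that comparison, with hypothetical helper names (explain_missing, explain_missing_in_dir) and assuming the libNAME.so.MAJOR naming seen in the listing above: for each requested soname that is absent, it reports which major versions of the same library the directory does provide.

```python
from __future__ import annotations

import os
import re

# A loader-visible soname: libNAME.so.MAJOR with nothing after the major number.
# Fully versioned files (libnetcdf.so.18.0.0) and link-time names (libnetcdf.so) don't match.
_SONAME = re.compile(r"^lib.+\.so\.\d+$")


def base_name(soname: str) -> str:
    """Library name without the .so suffix, e.g. 'libnetcdf' for 'libnetcdf.so.7'."""
    return soname.split(".so", 1)[0]


def sonames_in(filenames: list[str]) -> set[str]:
    """Major-version sonames present among the given file names."""
    return {f for f in filenames if _SONAME.match(f)}


def explain_missing(requested: list[str], filenames: list[str]) -> dict[str, list[str]]:
    """For each requested soname absent from the listing, list same-library sonames present."""
    present = sonames_in(filenames)
    return {
        name: sorted(p for p in present if base_name(p) == base_name(name))
        for name in requested
        if name not in present
    }


def explain_missing_in_dir(requested: list[str], libdir: str) -> dict[str, list[str]]:
    """Same check against a real directory, e.g. the NetCDF lib/ directory listed above."""
    return explain_missing(requested, os.listdir(libdir))


# File names from the ls listing above, and the NetCDF sonames ldd reported.
LISTING = [
    "libh5bzip2.la", "libh5bzip2.so", "libnetcdf.a", "libnetcdf_c++4.a",
    "libnetcdf_c++4.la", "libnetcdf_c++4.so", "libnetcdf_c++4.so.1",
    "libnetcdf_c++4.so.1.1.0", "libnetcdff.a", "libnetcdff.la", "libnetcdff.settings",
    "libnetcdff.so", "libnetcdff.so.7", "libnetcdff.so.7.0.0", "libnetcdf.la",
    "libnetcdf.settings", "libnetcdf.so", "libnetcdf.so.18", "libnetcdf.so.18.0.0",
    "pkgconfig",
]
REQUESTED = ["libnetcdff.so.7", "libnetcdf.so.18", "libnetcdf.so.7"]

if __name__ == "__main__":
    print(explain_missing(REQUESTED, LISTING))  # {'libnetcdf.so.7': ['libnetcdf.so.18']}
```

Working on plain file names keeps the check testable away from WCOSS2; explain_missing_in_dir applies the same logic to a live directory.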
Not sure if this is related to the recent PRs (though it seems likely) or is just a coincidence.
Isolated to PR #50
Isolated to the gempak module