NOAA-EMC / gfs-utils

Utility programs for global-workflow

gfs_bufr fails to find a NetCDF library on WCOSS #53

Closed WalterKolczynski-NOAA closed 6 months ago

WalterKolczynski-NOAA commented 6 months ago

gfs_bufr now fails at execution time on WCOSS because the loader cannot find one of the NetCDF shared libraries:

+ gfs_bufr.sh[97]: mpiexec -l -n 40 --depth=8 --cpu-bind depth /lfs/h2/emc/global/save/walter.kolczynski/global-workflow/fix_gempak/exec/gfs_bufr.x
nid001058.cactus.wcoss2.ncep.noaa.gov 0: /lfs/h2/emc/global/save/walter.kolczynski/global-workflow/fix_gempak/exec/gfs_bufr.x: error while loading shared libraries: libnetcdf.so.7: cannot open shared object file: No such file or directory

ldd shows the problem (libnetcdf.so.18 resolves, but libnetcdf.so.7 is not found):

+ gfs_bufr.sh[96]: ldd /lfs/h2/emc/global/save/walter.kolczynski/global-workflow/fix_gempak/exec/gfs_bufr.x
    linux-vdso.so.1 (0x00007ffe957a5000)
    libnetcdff.so.7 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdff.so.7 (0x0000154163833000)
    libnetcdf.so.18 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdf.so.18 (0x00001541634e5000)
    libnetcdf.so.7 => not found
    libiomp5.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libiomp5.so (0x00001541630c3000)
    libmpifort_intel.so.12 => /opt/cray/pe/lib64/libmpifort_intel.so.12 (0x0000154162e24000)
    libimf.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x00001541627a1000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x000015416277d000)
    libm.so.6 => /lib64/libm.so.6 (0x0000154162630000)
    libdl.so.2 => /lib64/libdl.so.2 (0x000015416262b000)
    libc.so.6 => /lib64/libc.so.6 (0x0000154162436000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000154162217000)
    libhdf5_hl.so.100 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/hdf5/1.10.6/lib/libhdf5_hl.so.100 (0x0000154161fee000)
    libhdf5.so.103 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/hdf5/1.10.6/lib/libhdf5.so.103 (0x0000154161902000)
    libifport.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libifport.so.5 (0x00001541616d2000)
    libifcoremt.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libifcoremt.so.5 (0x0000154161534000)
    libsvml.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x000015415f9ea000)
    libintlc.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x000015415f772000)
    libmpi_intel.so.12 => /opt/cray/pe/lib64/libmpi_intel.so.12 (0x000015415cb54000)
    libirng.so => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x000015415c7e9000)
    /lib64/ld-linux-x86-64.so.2 (0x0000154163cc6000)
    libifcore.so.5 => /pe/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libifcore.so.5 (0x000015415c681000)
    libfabric.so.1 => /opt/cray/libfabric/1.11.0.0./lib64/libfabric.so.1 (0x000015415c3d6000)
    libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000015415c1cd000)
    librt.so.1 => /lib64/librt.so.1 (0x000015415c1c3000)
    libpmi.so.0 => /opt/cray/pe/lib64/libpmi.so.0 (0x000015415bfc1000)
    libpmi2.so.0 => /opt/cray/pe/lib64/libpmi2.so.0 (0x000015415bd89000)
    librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x000015415bb69000)  
    libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x000015415b949000)
    libpals.so.0 => /opt/cray/pe/lib64/libpals.so.0 (0x000015415b744000)
    libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x000015415b522000)
    libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x000015415b2ac000)
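
With a listing this long, the unresolved entry is easy to miss. A minimal sketch of a filter that prints only the libraries ldd could not resolve; the function name `missing_libs` is hypothetical, and it reads ldd-style text on stdin so it can be tried without the executable:

```shell
# Print the name of every library that ldd reports as "not found".
# Reads ldd output on stdin; missing_libs is an illustrative name.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Demo on two lines copied from the ldd output above.
printf '%s\n' \
    '    libnetcdf.so.18 => /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/libnetcdf.so.18 (0x00001541634e5000)' \
    '    libnetcdf.so.7 => not found' \
    | missing_libs
# prints: libnetcdf.so.7
```

Against the real executable this would be run as `ldd <path to gfs_bufr.x> | missing_libs`.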

The shared object libnetcdf.so.7 does not appear in the NetCDF library directory:

WCOSS2 (BACKUPSYS) sorc> ls /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4/lib/ -l
total 6.4M
-rwxr-xr-x 1 hpc-adm hpc-adm  1.5K Oct 17  2021 libh5bzip2.la
-rwxr-xr-x 1 hpc-adm hpc-adm   96K Oct 17  2021 libh5bzip2.so
-rw-r--r-- 1 hpc-adm hpc-adm  1.8M Oct 17  2021 libnetcdf.a
-rw-r--r-- 1 hpc-adm hpc-adm  838K Oct 17  2021 libnetcdf_c++4.a
-rwxr-xr-x 1 hpc-adm hpc-adm  1.5K Oct 17  2021 libnetcdf_c++4.la
lrwxrwxrwx 1 hpc-adm hpc-adm    23 Oct 17  2021 libnetcdf_c++4.so -> libnetcdf_c++4.so.1.1.0
lrwxrwxrwx 1 hpc-adm hpc-adm    23 Oct 17  2021 libnetcdf_c++4.so.1 -> libnetcdf_c++4.so.1.1.0
-rwxr-xr-x 1 hpc-adm hpc-adm  466K Oct 17  2021 libnetcdf_c++4.so.1.1.0
-rw-r--r-- 1 hpc-adm hpc-adm 1019K Oct 17  2021 libnetcdff.a
-rwxr-xr-x 1 hpc-adm hpc-adm  1.5K Oct 17  2021 libnetcdff.la
-rw-r--r-- 1 hpc-adm hpc-adm  1.4K Oct 17  2021 libnetcdff.settings
lrwxrwxrwx 1 hpc-adm hpc-adm    19 Oct 17  2021 libnetcdff.so -> libnetcdff.so.7.0.0
lrwxrwxrwx 1 hpc-adm hpc-adm    19 Oct 17  2021 libnetcdff.so.7 -> libnetcdff.so.7.0.0
-rwxr-xr-x 1 hpc-adm hpc-adm  823K Oct 17  2021 libnetcdff.so.7.0.0
-rwxr-xr-x 1 hpc-adm hpc-adm  1.3K Oct 17  2021 libnetcdf.la
-rw-r--r-- 1 hpc-adm hpc-adm  1.4K Oct 17  2021 libnetcdf.settings
lrwxrwxrwx 1 hpc-adm hpc-adm    19 Oct 17  2021 libnetcdf.so -> libnetcdf.so.18.0.0
lrwxrwxrwx 1 hpc-adm hpc-adm    19 Oct 17  2021 libnetcdf.so.18 -> libnetcdf.so.18.0.0
-rwxr-xr-x 1 hpc-adm hpc-adm  1.4M Oct 17  2021 libnetcdf.so.18.0.0
drwxr-xr-x 2 hpc-adm hpc-adm  4.0K Oct 17  2021 pkgconfig
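
The directory holds libnetcdf.so.18 but no libnetcdf.so.7, and ldd flattens the dependency tree, so it does not say which object asks for the older soname. A rough sketch of one way to check, assuming binutils `readelf` is available; the function names `needed_libs` and `who_needs` are hypothetical, and the sample input line in the demo is illustrative rather than captured from WCOSS:

```shell
# Extract DT_NEEDED library names from `readelf -d` output on stdin.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}

# Print each object (executable or shared library) among the arguments
# that directly requests the given soname.
# Usage: who_needs SONAME FILE...
who_needs() {
    soname=$1
    shift
    for obj in "$@"; do
        if readelf -d "$obj" 2>/dev/null | needed_libs | grep -qxF -- "$soname"; then
            printf '%s\n' "$obj"
        fi
    done
}

# Demo of the parser on an illustrative readelf line.
printf ' 0x0000000000000001 (NEEDED)             Shared library: [libnetcdf.so.7]\n' | needed_libs
# prints: libnetcdf.so.7
```

Running `who_needs libnetcdf.so.7` over the executable and the libraries it links against would narrow down which one was built against the older NetCDF.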

Not sure if this is related to the recent PRs (though it seems likely) or is just a coincidence.

WalterKolczynski-NOAA commented 6 months ago

Isolated to PR #50

WalterKolczynski-NOAA commented 6 months ago

Isolated to the gempak module