easybuilders / easybuild-easyconfigs

A collection of easyconfig files that describe which software to build using which build options with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

cannot build NVHPC on ubuntu (and workaround) #20375

Open pescobar opened 6 months ago

pescobar commented 6 months ago

Hi,

I am using EasyBuild 4.9.1 on Ubuntu Jammy (22.04), but I couldn't build NVHPC until I patched the easyblock.

With the default easyblock I get this error in the sanity_check_step:

== 2024-04-16 16:56:44,636 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/base/exceptions.py:126 in __init__): Sanity check failed: sanity check command cd /tmp/eb-bk_6m8dc/tmp7fpol1sc && nvc++ -std=c++20 minimal.cpp -o minimal exited with code 2 (output: /scicore/soft/easybuild/apps/binutils/2.40/bin/ld: cannot find -ldl: No such file or directory
/scicore/soft/easybuild/apps/binutils/2.40/bin/ld: cannot find -lpthread: No such file or directory
/scicore/soft/easybuild/apps/binutils/2.40/bin/ld: cannot find -lc: No such file or directory
) (at easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/framework/easyblock.py:3669 in _sanity_check_step)

After some debugging I managed to build it by patching the easyblock like this (I had to add -L/lib/x86_64-linux-gnu to the sanity check command):

diff -ru $EBROOTEASYBUILD/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py /tmp/nvhpc.py
--- /scicore/soft/easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py   2024-04-05 09:30:35.000000000 +0200
+++ /tmp/nvhpc.py       2024-04-16 16:47:59.268066132 +0200
@@ -221,7 +221,7 @@
             # see: https://github.com/easybuilders/easybuild-easyblocks/pull/3240
             tmpdir = tempfile.mkdtemp()
             write_file(os.path.join(tmpdir, 'minimal.cpp'), NVHPC_MINIMAL_EXAMPLE)
-            minimal_compiler_cmd = "cd %s && nvc++ -std=c++20 minimal.cpp -o minimal" % tmpdir
+            minimal_compiler_cmd = "cd %s && nvc++ -L/lib/x86_64-linux-gnu -std=c++20 minimal.cpp -o minimal" % tmpdir
             custom_commands.append(minimal_compiler_cmd)

         super(EB_NVHPC, self).sanity_check_step(custom_paths=custom_paths, custom_commands=custom_commands)

I am not sure how to fix it upstream. If you give me some advice I can send a proper PR
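
One direction an upstream fix could take, as a rough sketch only: add the -L flag to the sanity check command only when a Debian-style multiarch directory actually exists, so other distributions are unaffected. The helper name `build_minimal_compiler_cmd`, the `candidate_libdirs` list and the injectable `isdir` argument are assumptions made for illustration; none of them exist in the NVHPC easyblock.

```python
import os

# Hypothetical list of Debian/Ubuntu multiarch library directories (x86_64 only).
DEBIAN_MULTIARCH_LIBDIRS = ("/lib/x86_64-linux-gnu", "/usr/lib/x86_64-linux-gnu")


def build_minimal_compiler_cmd(tmpdir, candidate_libdirs=DEBIAN_MULTIARCH_LIBDIRS,
                               isdir=os.path.isdir):
    """Return the nvc++ sanity check command, with -L added for each existing candidate dir.

    Hypothetical helper, not part of EasyBuild. `isdir` is injectable so the
    logic can be exercised without touching the real filesystem.
    """
    lib_flags = " ".join("-L%s" % libdir for libdir in candidate_libdirs if isdir(libdir))
    parts = ["nvc++", lib_flags, "-std=c++20", "minimal.cpp", "-o", "minimal"]
    # Drop the empty flag string when no candidate directory exists.
    return "cd %s && %s" % (tmpdir, " ".join(part for part in parts if part))
```

On a system without those directories this produces the same command the easyblock uses today; on Ubuntu it would produce a command like the patched one above, with one -L per directory that exists.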

pescobar commented 6 months ago

CCing those who contributed to this easyblock in case they can give any feedback @AndiH @jfgrimm @appolloford @boegel

jfgrimm commented 6 months ago

hmm, don't think I've seen that issue before. I'll have a go at building on ubuntu 22.04

jfgrimm commented 6 months ago

I can't reproduce this on any of my Ubuntu 22.04 systems

cgross95 commented 6 months ago

We see this exact same issue on Ubuntu 22.04 building NVHPC-23.7-CUDA-12.1.1.eb. We haven't tried the workaround yet. We can also provide any other information from our configuration if it would be helpful for reproducing.

cgross95 commented 6 months ago

Here is a test report.

AndiH commented 6 months ago

I'm really no EB expert, just one of the authors of the EasyBlock.

To me, it sounds like the compiler doesn't look in the right directories for the library objects; i.e. the LD_LIBRARY_PATH may be incomplete. Not sure, though, if this falls into EasyBuild's responsibility…

pescobar commented 6 months ago

@jfgrimm can you try this to check what libraries you link?

$> module purge
$> module load NVHPC/23.7-CUDA-12.3.0
$> cat <<EOF > minimal.cpp
#include <ranges>
int main(){ return 0; }
EOF
$> nvc++ minimal.cpp -o minimal
$> ldd minimal

This is what I get on my system. The only difference is that I had to compile the binary using nvc++ -L/lib/x86_64-linux-gnu minimal.cpp -o minimal:

$> ldd minimal
        linux-vdso.so.1 (0x00007ffdf57f9000)
        libatomic.so.1 => /scicore/soft/easybuild/apps/GCCcore/12.3.0/lib64/libatomic.so.1 (0x00007f1ff6e24000)
        libnvhpcatm.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvhpcatm.so (0x00007f1ff6c00000)
        libstdc++.so.6 => /scicore/soft/easybuild/apps/GCCcore/12.3.0/lib64/libstdc++.so.6 (0x00007f1ff69d8000)
        libnvomp.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvomp.so (0x00007f1ff5800000)
        libnvcpumath.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvcpumath.so (0x00007f1ff5200000)
        libnvc.so => /scicore/soft/easybuild/apps/NVHPC/23.7-CUDA-12.3.0/Linux_x86_64/23.7/compilers/lib/libnvc.so (0x00007f1ff4e00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ff4bd8000)
        libgcc_s.so.1 => /scicore/soft/easybuild/apps/GCCcore/12.3.0/lib64/libgcc_s.so.1 (0x00007f1ff69b7000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1ff68d0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1ff6e30000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1ff68cb000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1ff68c6000)

jfgrimm commented 6 months ago

oh, I just realised that although I installed EB 4.9.1, I was still building with 4.9.0...

I now get the same error :facepalm:

jfgrimm commented 6 months ago

I'm guessing that the issue is with the generated localrc. I don't think EasyBuild is doing anything weird here: I get the same issue if I manually run makelocalrc -x -gcc $(which gcc) -gpp $(which g++) -g77 $(which gfortran)

Adding the following line to the generated localrc fixes the issue for me:

set DEFLIBDIR=/lib/x86_64-linux-gnu;
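
For anyone who wants to script that manual edit, here is a minimal sketch that adds the line to the text of a localrc only if no DEFLIBDIR setting is present yet. The function name `ensure_deflibdir` is hypothetical, and how and where the localrc file is read and written back is left out on purpose, since that depends on the installation.

```python
def ensure_deflibdir(localrc_text, libdir="/lib/x86_64-linux-gnu"):
    """Return localrc_text with a 'set DEFLIBDIR=...;' line appended if none exists.

    Hypothetical helper for illustration; it only manipulates text and does
    not know where the generated localrc lives.
    """
    already_set = any(line.strip().startswith("set DEFLIBDIR=")
                      for line in localrc_text.splitlines())
    if already_set:
        # Leave an existing setting alone rather than overriding it.
        return localrc_text
    if localrc_text and not localrc_text.endswith("\n"):
        localrc_text += "\n"
    return localrc_text + "set DEFLIBDIR=%s;\n" % libdir
```

The check makes the operation idempotent, so running it twice over the same file content does not duplicate the line.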

pescobar commented 6 months ago

I noticed this line which is specific to debian systems

but it's not applied for versions < 21 . See here

Patching the easyblock like this also works around the issue, but to be honest I am not sure what the right fix would be:

 diff -ru /scicore/soft/easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py ~/tmp/nvhpc.py
--- /scicore/soft/easybuild/apps/EasyBuild/4.9.1/lib/python3.10/site-packages/easybuild/easyblocks/n/nvhpc.py   2024-04-17 11:23:00.892718404 +0200
+++ /scicore/home/scicore/easybuild/tmp/nvhpc.py        2024-04-17 12:51:54.115386938 +0200
@@ -54,9 +54,6 @@
 # contents for siterc file to make PGI/NVHPC pick up $LIBRARY_PATH
 # cfr. https://www.pgroup.com/support/link.htm#lib_path_ldflags
 SITERC_LIBRARY_PATH = """
-# get the value of the environment variable LIBRARY_PATH
-variable LIBRARY_PATH is environment(LIBRARY_PATH);
-
 # split this value at colons, separate by -L, prepend 1st one by -L
 variable library_path is
 default($if($LIBRARY_PATH,-L$replace($LIBRARY_PATH,":", -L)));
@@ -188,12 +185,11 @@
                 if os.path.islink(path):
                     os.remove(path)

-        if LooseVersion(self.version) < LooseVersion('21.3'):
-            # install (or update) siterc file to make NVHPC consider $LIBRARY_PATH
-            siterc_path = os.path.join(compilers_subdir, 'bin', 'siterc')
-            write_file(siterc_path, SITERC_LIBRARY_PATH, append=True)
-            self.log.info("Appended instructions to pick up $LIBRARY_PATH to siterc file at %s: %s",
-                          siterc_path, SITERC_LIBRARY_PATH)
+        # install (or update) siterc file to make NVHPC consider $LIBRARY_PATH
+        siterc_path = os.path.join(compilers_subdir, 'bin', 'siterc')
+        write_file(siterc_path, SITERC_LIBRARY_PATH, append=True)
+        self.log.info("Appended instructions to pick up $LIBRARY_PATH to siterc file at %s: %s",
+                      siterc_path, SITERC_LIBRARY_PATH)

         # The cuda nvvp tar file has broken permissions
         adjust_permissions(self.installdir, stat.S_IWUSR, add=True, onlydirs=True)
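
To make the effect of the siterc snippet concrete: going by its own comment ("split this value at colons, separate by -L, prepend 1st one by -L"), it turns $LIBRARY_PATH into a list of -L flags. The Python below is only an illustration of that transformation as I read it, not code from the easyblock or from NVHPC, and the function name is made up.

```python
def library_path_to_flags(library_path):
    """Mimic the siterc expression: one -L flag per colon-separated entry.

    Illustrative only; the real work is done by the compiler driver when it
    reads siterc.
    """
    if not library_path:
        # Mirrors the $if(...) guard: no LIBRARY_PATH, no extra flags.
        return ""
    return "-L" + library_path.replace(":", " -L")
```

With /usr/lib/x86_64-linux-gnu present in LIBRARY_PATH, the linker would then see a -L flag for it, which is what the manual workaround adds by hand.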

pescobar commented 6 months ago

To summarize, the problem is that Debian-based systems need the folder /usr/lib/x86_64-linux-gnu/ in the LIBRARY_PATH environment variable. This is needed to compile the example in the sanity_check step and also to build any other easyconfig using the NVHPC toolchain.

possible solutions are:

allow_append_abs_path = True
modextrapaths = {
    'LD_LIBRARY_PATH':  'Linux_%(arch)s/%(version)s/compilers/extras/qd/lib/',
    'LIBRARY_PATH':  'Linux_%(arch)s/%(version)s/compilers/extras/qd/lib/'
    }

modextrapaths_append = {
    'LD_LIBRARY_PATH':  '/usr/lib/x86_64-linux-gnu',
    'LIBRARY_PATH':  '/usr/lib/x86_64-linux-gnu'
    }

akesandgren commented 2 weeks ago

The correct fix, in my opinion, is to make makelocalrc add

set DEFLIBDIR=/usr/lib/x86_64-linux-gnu;

using this patch in the easyconfig file.

--- nvhpc_2024_241_Linux_x86_64_cuda_multi.orig/install_components/Linux_x86_64/24.1/compilers/bin/makelocalrc  2024-01-26 20:59:25.000000000 +0100
+++ nvhpc_2024_241_Linux_x86_64_cuda_multi/install_components/Linux_x86_64/24.1/compilers/bin/makelocalrc       2024-10-14 12:50:21.607908725 +0200
@@ -394,8 +394,9 @@
     fi
   done

+  os_like=$(grep ID_LIKE /etc/os-release | cut -d= -f2 | sed -e 's/"//g' | cut -d\  -f1)
   # DEFLIBDIR
-  if [ "${arch}" != "Linux_x86_64" ]; then
+  if [ "${arch}" != "Linux_x86_64" -o "${os_like}" = "debian" ]; then
     print_line "set DEFLIBDIR=${DEFLIBDIR};"
   fi
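
The detection logic in that patch can be sketched in Python to make it easier to reason about which systems it covers. This is an approximation of the shell pipeline, not a replacement for it: the function names are hypothetical, and unlike `grep ID_LIKE` it only matches a line whose key is exactly ID_LIKE.

```python
def first_id_like(os_release_text):
    """Return the first word of ID_LIKE from os-release content, or '' if absent."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ID_LIKE":
            # Strip optional surrounding quotes, then take the first word.
            words = value.strip().strip("\"'").split()
            return words[0] if words else ""
    return ""


def needs_deflibdir(arch, os_like):
    """Same condition as the patched 'if' in makelocalrc."""
    return arch != "Linux_x86_64" or os_like == "debian"
```

One thing this makes visible: a system whose os-release has ID=debian but no ID_LIKE line yields an empty string here, so the condition would not trigger on x86_64. Whether plain Debian falls into that case is an assumption worth checking before relying on the patch there.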