UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms
MIT License

Bugfix: C++ compiler in the Nvidia HPC Toolkit 22.9 is misconfigured. #557

Status: Closed (closed by owainkenwayucl 11 months ago)

owainkenwayucl commented 11 months ago

The C++ compiler nvc++ is somehow misconfigured in 22.9 (it's fine in previous versions) and as a result it cannot create binaries.

The root cause seems to be that it sets:

set GPPDIR= ;

in /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/compilers/bin/localrc rather than a list of include directories, as earlier versions do:

set GPPDIR= /shared/ucl/apps/emacs/26.3/include /shared/ucl/apps/giflib/5.1.1/gnu-4.9.2/include /shared/ucl/apps/apr-util/1.6.1/include /shared/ucl/apps/apr/1.7.0/include /shared/ucl/apps/flex/2.5.39/gnu-4.9.2/include /lustre/shared/ucl/apps/gcc/4.9.2/include/c++/4.9.2 /lustre/shared/ucl/apps/gcc/4.9.2/include/c++/4.9.2/x86_64-unknown-linux-gnu /lustre/shared/ucl/apps/gcc/4.9.2/include/c++/4.9.2/backward /lustre/shared/ucl/apps/gcc/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/include /lustre/shared/ucl/apps/gcc/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/include-fixed /usr/local/include /lustre/shared/ucl/apps/gcc/4.9.2/include /usr/include;
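A minimal sketch of how an install could be checked for this symptom, assuming localrc consists of `set NAME=value;` lines as in the excerpts in this thread. The helper names `parse_localrc` and `gppdir_is_empty` are hypothetical and not part of the buildscripts, and the sample strings are illustrative rather than copied from a real file.

```python
import re

# Matches lines such as "set GPPDIR= /a/include /b/include;" in a localrc file.
SET_LINE = re.compile(r"^\s*set\s+(\w+)\s*=(.*?);\s*$")


def parse_localrc(text):
    """Return {variable: value} for every 'set NAME=value;' line."""
    settings = {}
    for line in text.splitlines():
        match = SET_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def gppdir_is_empty(text):
    """True when GPPDIR is missing or lists no directories."""
    return not parse_localrc(text).get("GPPDIR", "").split()


# Illustrative inputs (assumed, shortened): a broken and a working setting.
broken = "set GPPDIR= ;\nset LOCALRC=YES;\n"
working = "set GPPDIR= /usr/local/include /usr/include;\n"
print(gppdir_is_empty(broken))   # True
print(gppdir_is_empty(working))  # False
```

Running a check like this after each install would flag the broken case before users hit it, although it only detects the empty setting and says nothing about why the installer produced it.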

Steps

1: Work out whether this can be fixed at the point where we trigger the installer, and fix the scripts. If it can't, we have to patch localrc ourselves.
2: Replicate with the newest Nvidia HPC toolkit to see whether this bug has been fixed.
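For the fallback in step 1, patching localrc by hand could look roughly like the sketch below. This is an assumption about one possible approach, not something the thread ended up doing: `patch_gppdir` is a hypothetical helper, and the directory list passed to it is a placeholder that would have to come from a known-good install.

```python
import re

# Matches the whole "set GPPDIR=...;" line, capturing the "set GPPDIR=" prefix.
GPPDIR_LINE = re.compile(
    r"^([ \t]*set[ \t]+GPPDIR[ \t]*=).*?;[ \t]*$", re.MULTILINE
)


def patch_gppdir(text, include_dirs):
    """Return localrc text with the GPPDIR value replaced by include_dirs."""
    replacement = " " + " ".join(include_dirs) + ";"
    # A function is used as the replacement so that backslashes or group
    # references in paths are never interpreted by re.sub.
    return GPPDIR_LINE.sub(lambda m: m.group(1) + replacement, text, count=1)


# Placeholder directories (assumed); other settings are left untouched.
broken = "set GPPDIR= ;\nset LOCALRC=YES;\n"
print(patch_gppdir(broken, ["/usr/local/include", "/usr/include"]))
```

The function returns new text and does not touch the file, so the result can be inspected or diffed before anything is written back.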

owainkenwayucl commented 11 months ago

I can't even replicate the problem with the 22.9 compiler - reinstalling my local install worked:

Myriad [login13] nvidia-hpc-sdk :) > less /home/uccaoke/Applications/nvhpc/2022_229/nvidia-2022-22.9/Linux_x86_64/22.9/compilers/bin/localrc
set LFC=-lgfortran;
set LDSO=/lib64/ld-linux-x86-64.so.2;
set GCCDIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.5;
set G77DIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.5/;
set OEM_INFO=64-bit target on x86-64 Linux $INFOTPVAL;
set GNUATOMIC=-latomic;
set GCCINC= /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/compilers/extras/qd/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/cuda/11.7/extras/CUPTI/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/comm_libs/nvshmem/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/comm_libs/nccl/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/comm_libs/mpi/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/math_libs/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/compilers/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/cuda/include /shared/ucl/apps/emacs/28.1/include /shared/ucl/apps/giflib/5.1.1/gnu-4.9.2/include /shared/ucl/apps/apr-util/1.6.1/include /shared/ucl/apps/apr/1.7.0/include /shared/ucl/apps/flex/2.5.39/gnu-4.9.2/include /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include /usr/local/include /usr/include;
set GPPDIR= /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/compilers/extras/qd/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/cuda/11.7/extras/CUPTI/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/comm_libs/nvshmem/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/comm_libs/nccl/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/comm_libs/mpi/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/math_libs/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/compilers/include /shared/ucl/apps/nvhpc/2022_229/Linux_x86_64/22.9/cuda/include /shared/ucl/apps/emacs/28.1/include /shared/ucl/apps/giflib/5.1.1/gnu-4.9.2/include /shared/ucl/apps/apr-util/1.6.1/include /shared/ucl/apps/apr/1.7.0/include /shared/ucl/apps/flex/2.5.39/gnu-4.9.2/include /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5 /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/x86_64-redhat-linux /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/backward /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include /usr/local/include /usr/include;
set NUMALIBNAME=-lnuma ;
set LOCALRC=YES;
set EXTENSION=__extension__=;
set LC=-lgcc -lc $if(-Bstatic,-lgcc_eh, -lgcc_s);
set DEFCUDAVERSION=10.2;
set DEFSTDPARCOMPUTECAP=;
# GLIBC version 2.17
# GCC version 4.8.5
set GCCVERSION=40805;
set LIBNCURSES=YES;
export PGI=$COMPBASE;

owainkenwayucl commented 11 months ago

Maybe it's something that changed in ccspapp's environment?
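One way to test the environment theory is to compare the include search path that g++ reports under each account. The sketch below assumes the installer derives GPPDIR from something like this compiler output, which the thread does not confirm. It relies on the `#include <...> search starts here:` listing that GCC prints to stderr with `-v`; the function names and the sample paths are hypothetical.

```python
import subprocess

START = "#include <...> search starts here:"
END = "End of search list."


def parse_search_dirs(verbose_output):
    """Extract the include directories listed between START and END."""
    dirs, inside = [], False
    for line in verbose_output.splitlines():
        if line.strip() == START:
            inside = True
        elif line.strip() == END:
            break
        elif inside:
            dirs.append(line.strip())
    return dirs


def gxx_search_dirs(compiler="g++"):
    """Ask the compiler for its C++ include search path (written to stderr)."""
    result = subprocess.run(
        [compiler, "-E", "-x", "c++", "-v", "-"],
        input="", capture_output=True, text=True, check=False,
    )
    return parse_search_dirs(result.stderr)


# Illustrative output (assumed paths), so the parser can be tried without g++.
sample = """\
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.8.5
 /usr/include
End of search list.
"""
print(parse_search_dirs(sample))
```

Calling `gxx_search_dirs()` as each user and diffing the two lists would show whether loaded modules or variables such as CPATH change what the compiler reports. An empty list from one account would be consistent with the empty GPPDIR seen above.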

owainkenwayucl commented 11 months ago

Re-running the script as ccspapp fixes it.

Computers.

owainkenwayucl commented 11 months ago

I am re-running the install on all clusters (already fixed Myriad + Kathleen).

owainkenwayucl commented 11 months ago

Fixed on all clusters.