TinkerTools / tinker-hp

Tinker-HP: High-Performance Massively Parallel Evolution of Tinker on CPUs & GPUs
http://tinker-hp.org/

Compilation Errors with Standard Math Functions using CUDA 12.2 and NVHPC 23.7 #19

Open zimb3l-priv opened 10 months ago

zimb3l-priv commented 10 months ago

Since the only CUDA versions available on my system are 11.2, 12.1 and 12.2, I tried to install the GPU version with the newest available versions. If that is already a mistake, or if this is a problem with the NVIDIA HPC SDK rather than Tinker, ignore the rest and let me know. Otherwise:

Description: When attempting to compile Tinker-HP with CUDA version 12.2 and NVIDIA HPC SDK version 23.7, compilation errors are encountered related to standard C++ math functions not being recognized within the std namespace.

Environment:

Operating System: Rocky Linux 8.7
CUDA Version: 12.2
NVIDIA HPC SDK Version: 23.7
GCC Version: 9.2.1
Compiler Used: nvfortran from NVIDIA HPC SDK
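
The versions listed above can be collected in one pass. The following is only a sketch: `report_tool` is a hypothetical helper name, and it assumes each tool accepts `--version` (which `gcc`, `nvcc` and `nvfortran` do).

```shell
# Print the first non-empty version line of a tool, or note that it is not on PATH.
report_tool() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        # Some compilers print a blank line first, so skip to the first non-empty one
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | sed -n '/./{p;q;}')"
    else
        printf '%s: not found on PATH\n' "$tool"
    fi
}

for t in gcc nvcc nvfortran; do
    report_tool "$t"
done
```

Running it after loading the module shows whether the shell is picking up the intended compiler and CUDA toolkit.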

Steps to Reproduce:

Loaded the NVIDIA HPC SDK module for version 23.7.
Set the NVHPC_CUDA_HOME to point to /usr/local/cuda-12.2.
Updated the PATH and LD_LIBRARY_PATH accordingly to include paths for the HPC SDK and CUDA toolkit.
Edited install.sh to specify CUDA version 12.2.
Ran the Tinker-HP install.sh script to begin the compilation process.
Encountered errors regarding standard math functions not being found in the std namespace in the file /usr/local/cuda-12.2/include/crt/math_functions.h.

Expected Behavior: The compilation should recognize standard math functions from the C++ standard library and compile without errors.

Actual Behavior: The following errors are displayed during the compilation process:

/usr/local/cuda-12.2/include/crt/math_functions.h: error: namespace "std" has no member "cos"
using std::cos;

(Additional similar errors for other math functions like cosh, atan, atan2, tan, tanh, etc.)

Additional Information:

The localrc file within the HPC SDK was configured to set DEFCUDAVERSION=12.2.
Unsure about the compatibility of GCC 9.2.1 with CUDA 12.2.
The same math functions compile correctly in non-CUDA C++ projects.

I also created a file to source with a couple of paths to ensure Tinker-HP uses the right ones during compilations. Maybe someone sees an error here:

NVARCH=`uname -s`_`uname -m`; export NVARCH
NVCOMPILERS=/opt/nvidia/hpc_sdk; export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/23.7/compilers/man; export MANPATH
PATH=$NVCOMPILERS/$NVARCH/23.7/comm_libs/bin:$NVCOMPILERS/$NVARCH/23.7/compilers/bin:$NVCOMPILERS/$NVARCH/23.7/cuda/bin:$PATH; export PATH
LD_LIBRARY_PATH=$NVCOMPILERS/$NVARCH/23.7/compilers/lib:$NVCOMPILERS/$NVARCH/23.7/comm_libs/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
CPATH=$NVCOMPILERS/$NVARCH/23.7/compilers/include:$CPATH; export CPATH
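
One way to sanity-check a file like this is to confirm that every directory it adds actually exists, since a wrong version or architecture component in a path fails silently. A minimal sketch, assuming a POSIX shell; `check_dirs` is a hypothetical helper, not part of Tinker-HP or the HPC SDK.

```shell
# Print each entry of a colon-separated path list that is not an existing directory.
# $1: a list such as "$PATH" or "$LD_LIBRARY_PATH"
check_dirs() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        [ -n "$dir" ] || continue          # skip empty entries
        [ -d "$dir" ] || printf 'missing: %s\n' "$dir"
    done
    IFS=$old_ifs
}

check_dirs "${PATH:-}"
check_dirs "${LD_LIBRARY_PATH:-}"
check_dirs "${CPATH:-}"
```

Any line starting with `missing:` points at an entry that should be corrected before running the install script.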

Attempted Fixes:

Ensuring the environment variables are correctly set.
Checking for the correct paths and versions in the localrc file.
Looking for online solutions related to similar issues.

Request: Assistance is requested to resolve the compilation issues related to the standard C++ library functions in CUDA 12.2 headers when using the NVIDIA HPC SDK. Any known fixes, patches, or suggestions to bypass these errors would be greatly appreciated.

opadjoua commented 10 months ago

Hello. For the compilation error you mention above, you may try removing the NVHPC_CUDA_HOME variable and letting the compiler use its default value. Even if the standalone CUDA 12.2 and the NVHPC version match, we cannot be sure the file structure is identical. As for the broader problem, I suggest you use an NVHPC package under 22.7 to compile and run. A note listing the working configurations was left in the readme.md (Prerequisites). A runtime compiler bug prevents us from running with any version above that. Luckily, the CUDA driver is backward compatible. You can even switch the package to version 11.0 before compiling. This will not affect performance.
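
The first suggestion amounts to clearing the override in the shell that runs the install script. A minimal sketch, assuming the variable was exported in the current shell or by a sourced file:

```shell
# Drop the standalone-CUDA override so the compiler uses its default CUDA location.
unset NVHPC_CUDA_HOME

# Confirm that nothing is still exporting it
if env | grep -q '^NVHPC_CUDA_HOME='; then
    echo "NVHPC_CUDA_HOME is still set"
else
    echo "NVHPC_CUDA_HOME is not set"
fi
```

If the variable is set again by a module or a sourced file, it has to be removed there as well.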

zimb3l-priv commented 10 months ago

Ok, I switched to HPC 22.7 and set the cuda-version in the install script to 11.7, as that's also the one in my HPC folder.

I also installed GCC 11.2 and set GNUROOT=/opt/rh/gcc-toolset-11/root/usr/bin/ before executing the install script, so my prerequisites are now HPC-SDK 22.7 + cuda11.7 + GNU-11.2.1, which complies with what's written in the recommended section.

Unfortunately the compilation still crashes with the following output:

.
.
.
mpif90 -cpp  -traceback -g -fast -Mdalign -Minline=maxsize:340 -r8 -cuda -gpu=cc60,cc70,cc80,cc86,cuda11.7,unroll -c nblistcu.f
mpif90 -cpp  -traceback -g -fast -Mdalign -Minline=maxsize:340 -r8 -cuda -gpu=cc60,cc70,cc80,cc86,cuda11.7,unroll -c pmestuffcu.f
mpif90 -cpp  -traceback -g -fast -Mdalign -Minline=maxsize:340 -r8 -cuda -gpu=cc60,cc70,cc80,cc86,cuda11.7,unroll -c tmatxb_pmecu.f
mpif90 -cpp  -traceback -g -fast -Mdalign -Minline=maxsize:340 -r8 -cuda -gpu=cc60,cc70,cc80,cc86,cuda11.7,unroll -c tmatxb_pme_cpen.cu.f
NVFORTRAN-F-0000-Internal compiler error. readin_func: too many ilms    2413  (echgtrncu.f: 87)
NVFORTRAN/x86-64 Linux 22.7-0: compilation aborted
make[1]: *** [Makefile:570: echgtrncu.o] Error 2
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory '/software/Tinker-HP/tinker-hp/GPU/build0'
make: *** [Makefile:480: libtinker] Error 2

             ------ WARNING ------
   Something went wrong during compilation procedure "
   Please Fix the issue and run ci/install.sh again"
             ---------------------

The same happened when I previously tried using GCC 9.2.1 (so HPC-SDK 22.7 + cuda11.7 + GNU-9.2.1), which is why I installed 11.2.1 in the first place.

Not sure what else I could try here... Maybe getting HPC 22.2?
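
Because the build runs several jobs in parallel, the file that triggered the internal compiler error is easy to lose among the other compile lines. The sketch below pulls the file name and line number out of a saved log. `find_ice` and `build.log` are hypothetical names, and the pattern assumes the message format shown in the output above.

```shell
# Print "file:line" for every NVFORTRAN internal compiler error in a saved build log.
# $1: path to the log file
find_ice() {
    sed -n 's/.*Internal compiler error\..*(\(.*\): *\([0-9]*\)).*/\1:\2/p' "$1"
}

# Only scan the log if one has been saved in the current directory
if [ -f build.log ]; then
    find_ice build.log
fi
```

The log can be saved by piping the install script's output through `tee build.log`. For the output above, the function would report `echgtrncu.f:87`.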