NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
MIT License

Error when building tests: Undefined reference to symbol '__c_mset4' #303

Closed (aproeme closed this 3 months ago)

aproeme commented 3 months ago

Hello,

I'm trying to build gdrcopy v2.4.1 with v550.54.15 of the GPU drivers and with NVHPC v24.5.

The driver itself (gdrdrv) appears to build successfully, but once the build progresses to the tests I get the following error:

/work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/bin/nvcc 
-o gdrcopy_pplat pplat.o common.o 
-L /work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/lib64 
-L /work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/lib 
-L /usr/lib64/nvidia 
-L /usr/lib/nvidia 
-L /work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/lib64/stubs 
-L /work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/lib64 
-L ../src 
-lgdrapi -lcuda

/usr/bin/ld: common.o: undefined reference to symbol '__c_mset4'
/work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/compilers/lib/libnvc.so: error adding symbols: DSO missing from command line

common.o was built earlier as follows:

/work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/compilers/bin/nvc++ 
-O2 
-I /work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/include 
-I ../include 
-I ../src 
-I /work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/cuda/12.4/include  
-c -o common.o common.cpp

Confirmation that this symbol is undefined within common.o:

> nm -a common.o | grep c_mset4
                 U   __c_mset4
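
The `DSO missing from command line` message generally means the linker found the symbol in a shared library (here `libnvc.so`) that was never named on the link line. A minimal sketch of how one might confirm that the library really exports the symbol, assuming GNU `nm` and `awk` are available; the helper name `exports_symbol` is hypothetical and not part of gdrcopy or NVHPC:

```shell
# Hypothetical helper (not part of gdrcopy or NVHPC): reads `nm` output on
# stdin and succeeds if the named symbol appears in the last column.
exports_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
        name == sym { found = 1 }
        END { exit !found }
    '
}

# Path copied from the error message above; adjust for your installation.
lib=/work/y07/shared/cirrus-software/nvidia/hpcsdk-24.5/Linux_x86_64/24.5/compilers/lib/libnvc.so
if [ -r "$lib" ] && nm -D --defined-only "$lib" | exports_symbol __c_mset4; then
    echo "__c_mset4 is exported by $lib"
fi
```

The same helper works on the output of `nm -a common.o` if you only want to know whether an object mentions the symbol at all.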

Can you advise on the likely cause of this and how to resolve it?

Note that I have tried manually editing src/Makefile to include the fix from this commit: https://github.com/NVIDIA/gdrcopy/pull/284/commits/105b8afff220b7a3d39f0c0e2e37b4b284cfb0ac. However, this did not make a difference.

drossetti commented 3 months ago

@aproeme please note we have never validated gdrcopy with the HPC compiler.

In any case, you seem to be mixing compilers:

gcc is probably linking against libstdc++. nvc++ may be linking against something else; I did not check.

IMO you should not be mixing toolchains.

aproeme commented 3 months ago

Thank you for the quick response. You were right: simply using gcc for everything worked.
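
For anyone hitting the same error: the failing link step is driven by nvcc, which uses g++ as its default host compiler on Linux, while common.o came from nvc++ and so refers to the NVHPC runtime. A small pre-build check along these lines could catch the mismatch early. This is a sketch only: the helper name `is_nvhpc_compiler` is hypothetical, the list of compiler names is an assumption, and it presumes the build takes its C++ compiler from the standard `CXX` variable.

```shell
# Hypothetical helper: succeeds if the given compiler path looks like an
# NVHPC (or legacy PGI) host compiler. The name list is an assumption.
is_nvhpc_compiler() {
    case "$(basename "${1:-}")" in
        nvc|nvc++|nvfortran|pgcc|pgc++) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn before building if CXX would mix toolchains with nvcc's default g++.
if is_nvhpc_compiler "${CXX:-g++}"; then
    echo "CXX=${CXX} is an NVHPC compiler; consider CXX=g++ for this build" >&2
fi
```

If the check fires, overriding the variables for the build (for example `CC=gcc CXX=g++`) keeps the host objects and the nvcc link step on one toolchain, which matches the fix reported above.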