Quuxplusone / LLVMBugzillaTest


Target CUDA RTL --> CUDA error is: named symbol not found #44371

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR45401
Status REOPENED
Importance P normal
Reported by Ye Luo (xw111luoye@gmail.com)
Reported on 2020-04-02 06:54:26 -0700
Last modified on 2021-03-04 06:26:05 -0800
Version unspecified
Hardware PC Linux
CC bruno.turcksin@gmail.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
With LIBOMPTARGET_DEBUG=1, I got the following output:

Target CUDA RTL --> Load data from image 0x000000000096ab50
Target CUDA RTL --> CUDA module successfully loaded!
Target CUDA RTL --> Loading '__omp_offloading_29_37ac7ea__ZN11qmcplusplus12SplineC2ROMPIfE19evaluateVGLMultiPosERKNS_6VectorIfNS_12OMPallocatorIfNS_23CUDALockedPageAllocatorIfNS_10MallocatorIfLm64EEEEEEEEERKSt6vectorISt17reference_wrapperINS2_IfSaIfEEEESaISG_EERKSC_ISD_INS2_INS_10TinyVectorIfLj3EEESaISM_EEEESaISP_EESK__l103' (Failed)
Target CUDA RTL --> CUDA error is: named symbol not found
Libomptarget --> Unable to generate entries table for device id 0.

The failure is caused by the following member function of a class:
  void evaluateVGLMultiPos(const Vector<ST, OffloadPinnedAllocator<ST>>& multi_pos_copy,
                           const RefVector<ValueVector_t>& psi_v_list,
                           const RefVector<GradVector_t>& dpsi_v_list,
                           const RefVector<ValueVector_t>& d2psi_v_list)
  {
    #pragma omp target
    {    }
  }

The same class has another target region, which works fine if I comment out
the target region above:
  void finalizeConstruction() override
  {
    #pragma omp target
    {     }
  }

Here I was not using static linking; the object file was linked directly into
the final executable.

I also wondered whether the long mangled name was the issue, but adding many
characters to the name of finalizeConstruction did not cause any problem when
I tried.
Quuxplusone commented 4 years ago

When I built from master today, the problem remained with CUDA 10.1/10.2 but went away with CUDA 11.

Quuxplusone commented 4 years ago
After a bit more investigation, I found that the problem is related to the
-march flag, which is intended for the host compilation.
When I invoke the compiler with
-march=native -O3 -ffast-math -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
some functions fail to load.
With -march=skylake-avx512, it still fails.
Once I changed the flag to -march=skylake, my application ran.

So my hypothesis is that some AVX-512 code contaminated my device functions.
Quuxplusone commented 3 years ago
The issue was caused by a difference between the host and device passes
during compilation.
See "background" description in https://github.com/QMCPACK/qmcpack/pull/2981
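
For readers who do not follow the link, here is a minimal sketch of one way a host/device difference can lead to "named symbol not found". It is an illustration under assumptions, not the QMCPACK code and not necessarily the exact mechanism described in the PR. The assumption is that a template argument (here an alignment) is chosen from a host ISA macro such as __AVX512F__, which the host pass sees with -march=skylake-avx512 but the device pass may not. The failing kernel name above does contain Mallocator<float, 64> (MallocatorIfLm64E), and the kernel name embeds the mangled argument types, so a different alignment in the other pass would give a different symbol. All names below (simd_alignment, Mallocator, HostPassAllocator, DevicePassAllocator, kernel_symbols_match) are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <typeinfo>

// Hypothetical stand-in for an alignment picked from a host ISA macro,
// e.g. "#if defined(__AVX512F__) 64 #else 32". The bool template parameter
// simulates whether a given compilation pass sees that macro.
template <bool SeesAvx512>
constexpr std::size_t simd_alignment()
{
  return SeesAvx512 ? 64 : 32;
}

// Hypothetical allocator whose type (and hence mangled name) encodes the alignment.
template <typename T, std::size_t Align>
struct Mallocator
{};

// Assumption: the host pass sees the AVX-512 macro, the device pass does not.
using HostPassAllocator   = Mallocator<float, simd_alignment<true>()>;
using DevicePassAllocator = Mallocator<float, simd_alignment<false>()>;

// The two passes instantiate different types, so a function taking this
// allocator as a parameter gets a different mangled name in each pass.
static_assert(!std::is_same<HostPassAllocator, DevicePassAllocator>::value,
              "host and device passes disagree on the allocator type");

// Returns true only if both passes would agree on the type, which is what
// the runtime needs in order to find the kernel symbol in the device image.
inline bool kernel_symbols_match()
{
  return typeid(HostPassAllocator) == typeid(DevicePassAllocator);
}
```

In this sketch kernel_symbols_match() returns false, which mirrors the situation where the host asks the CUDA module for a kernel name that the device image never defined. Making the alignment independent of host-only macros, or using -march=skylake as reported above, would make both passes agree.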