QMCPACK / miniqmc

QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experiments

Known working combinations of gcc/cuda or llvm/cuda #265

Closed cabreraam closed 3 years ago

cabreraam commented 3 years ago

Hi all,

The config directory in the OMP_offload branch has different CMake incantations for different systems. In a similar vein, I'm trying to build versions of gcc/llvm/cuda using spack that would allow me to build the OMP_offload branch. Could anyone list combinations of gcc/llvm/cuda that work for them? I'm trying to target an Intel Skylake Xeon and a P100, with CUDA version 11.2 reported by nvidia-smi. Any help is appreciated! If this is a vague question, I'm happy to try to make it less vague.

Thanks, Anthony

ye-luo commented 3 years ago

The most up-to-date recipes for the OMP_offload branch are at https://github.com/QMCPACK/miniqmc/wiki/OpenMP-offload

GCC, even release 11, can only build some pieces of miniQMC with OpenMP offload turned on. LLVM 11 or 12 is recommended. In general, the build is not picky about CUDA versions, but you do need a host compiler that works with the provided CUDA. For example, LLVM 11 was released before CUDA 11.1 and doesn't work with CUDA >11.0.

cabreraam commented 3 years ago

This is very helpful, thank you! I had looked through the wiki documentation before, but my apologies for not noticing the whole page on the OpenMP offload branch.

I'm building gcc 9.2.0 right now. I had 9.3.0 installed, and everything built except for the test_omptarget_icpx_opencl_wrong_number target. I received the following error:

[ 39%] Building CXX object src/Platforms/tests/OMPTarget/CMakeFiles/test_omptarget_icpx_opencl_wrong_number.dir/test_omp_icpx_opencl_wrong_number.cpp.o
cd /home/anthony/Research/miniqmc/build_omp_gcc/src/Platforms/tests/OMPTarget && /home/anthony/spack/var/spack/environments/miniqmc/.spack-env/view/bin/g++ -DADD_ -DH5_USE_16_API -DHAVE_CONFIG_H -Drestrict=__restrict__ -I/home/anthony/Research/miniqmc/src -I/home/anthony/Research/miniqmc/build_omp_gcc/src -I/home/anthony/Research/miniqmc/src/Platforms -I/home/anthony/Research/miniqmc/external_codes/catch -g -fopenmp -foffload=nvptx-none -foffload=-lm -fno-lto -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -D__forceinline=inline -Wno-deprecated -march=native -O3 -DNDEBUG -ffast-math -std=c++11 -MD -MT src/Platforms/tests/OMPTarget/CMakeFiles/test_omptarget_icpx_opencl_wrong_number.dir/test_omp_icpx_opencl_wrong_number.cpp.o -MF CMakeFiles/test_omptarget_icpx_opencl_wrong_number.dir/test_omp_icpx_opencl_wrong_number.cpp.o.d -o CMakeFiles/test_omptarget_icpx_opencl_wrong_number.dir/test_omp_icpx_opencl_wrong_number.cpp.o -c /home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp
/home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp: In instantiation of 'void qmcplusplus::test_icpx_opencl_wrong_number() [with T = float]':
/home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp:59:40:   required from here
/home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp:32:29: error: 'y_ptr' appears more than once in data clauses
   32 | #pragma omp target data map(y_ptr[:1]) use_device_ptr(y_ptr)
      |                             ^~~~~
/home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp: In instantiation of 'void qmcplusplus::test_icpx_opencl_wrong_number() [with T = double]':
/home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp:61:41:   required from here
/home/anthony/Research/miniqmc/src/Platforms/tests/OMPTarget/test_omp_icpx_opencl_wrong_number.cpp:32:29: error: 'y_ptr' appears more than once in data clauses
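
For context, the directive GCC trips over pairs map and use_device_ptr on the same list item; newer compilers accept that combination, and the usual workaround for stricter ones is to nest the two directives. A minimal sketch of the pattern (my reduction of it, not the actual test source):

#include <cstdio>
#include <omp.h>

int main()
{
  double y      = 0.0;
  double* y_ptr = &y;

  // The shape GCC 9 rejects ("y_ptr appears more than once in data clauses"):
  // #pragma omp target data map(y_ptr[:1]) use_device_ptr(y_ptr)

  // Workaround: map first, then translate the pointer in a nested directive.
  #pragma omp target data map(y_ptr[:1])
  {
    #pragma omp target data use_device_ptr(y_ptr)
    {
      // Inside this region, y_ptr holds the device address of the mapping.
      printf("device pointer = %p\n", (void*)y_ptr);
    }
  }
  return 0;
}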

I'll report back with builds for LLVM 12 and maybe 11, and then close the issue.

Thank you again!

cabreraam commented 3 years ago

9.2.0 also fails with that test, unfortunately. I removed the test from the build so that I could run miniqmc. When I run miniqmc with nvprof, though, I don't see any evidence of a D2H (device-to-host) transfer. I set OMP_NUM_THREADS=1 and observe the following nvprof output:

==27533== Profiling application: ./miniqmc
==27533== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  20.454ms         2  10.227ms  2.1120us  20.452ms  [CUDA memcpy HtoD]
      API calls:   58.21%  189.15ms         1  189.15ms  189.15ms  189.15ms  cuCtxCreate
                   34.89%  113.39ms         1  113.39ms  113.39ms  113.39ms  cuCtxDestroy
                    6.35%  20.627ms         2  10.313ms  37.929us  20.589ms  cuMemcpyHtoD
                    0.30%  984.25us         3  328.08us  27.342us  732.25us  cuMemFree
                    0.17%  566.25us        13  43.557us  2.3420us  337.25us  cuMemAlloc
                    0.03%  98.747us         4  24.686us     414ns  95.808us  cuMemGetAddressRange
                    0.02%  69.589us        16  4.3490us     119ns  64.850us  cuDeviceGetAttribute
                    0.02%  57.611us        18  3.2000us     122ns  37.157us  cuCtxGetDevice
                    0.00%  9.7340us         1  9.7340us  9.7340us  9.7340us  cuDeviceGetPCIBusId
                    0.00%  5.0180us         3  1.6720us  1.0820us  2.4280us  cuCtxPushCurrent
                    0.00%  1.7080us         4     427ns     349ns     519ns  cuCtxGetCurrent
                    0.00%  1.2170us         4     304ns     118ns     751ns  cuDeviceGetCount
                    0.00%     801ns         1     801ns     801ns     801ns  cuInit

Should there be some kind of D2H transfer observed? It also doesn't look like any computation gets executed on the GPU. I don't have much experience with GPUs or these tools, but I expected to see some D2H transfers and some kernel execution stats; instead I see neither. Do you have any insight to share @ye-luo?

ye-luo commented 3 years ago

My previous comment about GCC was incorrect. As reflected in my tracking table, GCC 11.1.0 works fine.

OMP_NUM_THREADS=2 nsys nvprof ./bin/miniqmc

CUDA Kernel Statistics:

 Time(%)  Total Time (ns)  Instances   Average   Minimum  Maximum                                         Name                                        
 -------  ---------------  ---------  ---------  -------  -------  -----------------------------------------------------------------------------------
    63.1    3,734,826,585      4,608  810,509.2  806,135  925,653  _ZN11qmcplusplus17einspline_spo_ompIdE12evaluate_vghERKNS_11ParticleSetEi$_omp_fn$0
    36.9    2,182,753,423     11,532  189,278.0  181,310  293,821  _ZN11qmcplusplus17einspline_spo_ompIdE10evaluate_vERKNS_11ParticleSetEi$_omp_fn$0  
     0.0            9,760          2    4,880.0    4,576    5,184  _ZN11qmcplusplus17einspline_spo_ompIdEC2ERKS1_ii$_omp_fn$0                         
     0.0            8,320          1    8,320.0    8,320    8,320  _ZN11qmcplusplus17einspline_spo_ompIdE3setEiiiiib$_omp_fn$0  
ye-luo commented 3 years ago

@cabreraam did you follow the exact recipe? It seems -D ENABLE_OFFLOAD=1 is missing.

cabreraam commented 3 years ago

@ye-luo Thanks for the reply. I have set that flag. I'm thinking I wouldn't get any nvprof output if I hadn't set it.

ye-luo commented 3 years ago

> @ye-luo Thanks for the reply. I have set that flag. I'm thinking I wouldn't get any nvprof output if I hadn't set it.

Not 100% sure, but libgomp may still initialize the GPU as long as the offload feature is enabled.

cabreraam commented 3 years ago

This is the command I'm using:

cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DENABLE_OFFLOAD=1 \
    -DQMC_MPI=0 \
    -DCMAKE_CXX_FLAGS="-g" \
    -DCMAKE_C_COMPILER=${C_COMP} \
    -DCMAKE_CXX_COMPILER=${CXX_COMP}  ..
ye-luo commented 3 years ago

Check your GCC to see if it is built properly to compile any OpenMP offload program. Try https://github.com/ye-luo/openmp-target/blob/master/tests/private/target_teams_distribute_parallel_for_private.cpp

g++ -fopenmp target_teams_distribute_parallel_for_private.cpp
nsys nvprof ./a.out

and see if you get any kernel activity
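
For a quick check without fetching that file, a self-contained test in the same spirit (a minimal sketch, not the linked file's exact contents):

#include <cstdio>
#include <omp.h>

int main()
{
  int sum       = 0;
  int on_device = 0;
  // The reduction on a combined target construct implies map(tofrom: sum).
  #pragma omp target teams distribute parallel for reduction(+ : sum) map(tofrom : on_device)
  for (int i = 0; i < 1000; ++i)
  {
    sum += 1;
    if (i == 0)
      on_device = !omp_is_initial_device();
  }
  printf("sum = %d (expect 1000), ran on device = %d\n", sum, on_device);
  return 0;
}

A working offload toolchain should print ran on device = 1 and show one kernel in the profiler; a silent host fallback prints 0.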

cabreraam commented 3 years ago

Seems promising!

[miniqmc] 93u@pcie:~/Sandbox/ye-luo-omp-test $ nsys nvprof ./a.out
WARNING: a.out and any of its children processes will be profiled.

Collecting data...
host pointer = 0x7ffc282510cc
Processing events...
Saving temporary "/tmp/nsys-report-69ca-92b4-bba1-5f25.qdstrm" file to disk...
Creating final output files...

Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-69ca-92b4-bba1-5f25.qdrep"
Exporting 788 events: [===================================================100%]

Exported successfully to
/tmp/nsys-report-69ca-92b4-bba1-5f25.sqlite

Generating CUDA API Statistics...
CUDA API Statistics (nanoseconds)

Time(%)      Total Time       Calls         Average         Minimum         Maximum  Name                                                                            
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   69.4       180806031           1     180806031.0       180806031       180806031  cuCtxCreate_v2                                                                  
   24.7        64331526           1      64331526.0        64331526        64331526  cuCtxDestroy_v2                                                                 
    4.7        12337651           1      12337651.0        12337651        12337651  cuModuleLoadData                                                                
    0.6         1588315           1       1588315.0         1588315         1588315  cuLaunchKernel                                                                  
    0.3          877924           1        877924.0          877924          877924  cuLinkComplete                                                                  
    0.1          146431           1        146431.0          146431          146431  cuCtxSynchronize                                                                
    0.0          112469           3         37489.7            4306          103532  cuMemAlloc_v2                                                                   
    0.0           94018           3         31339.3            2500           85767  cuMemFree_v2                                                                    
    0.0           45783           3         15261.0           12084           20769  cuMemcpyDtoH_v2                                                                 
    0.0           32983           1         32983.0           32983           32983  cuLinkCreate_v2                                                                 
    0.0           10888           1         10888.0           10888           10888  cuMemcpyHtoD_v2                                                                 
    0.0            1655           1          1655.0            1655            1655  cuLinkDestroy                                                                   
    0.0             597           1           597.0             597             597  cuInit                                                                          

Generating CUDA Kernel Statistics...
CUDA Kernel Statistics (nanoseconds)

Time(%)      Total Time   Instances         Average         Minimum         Maximum  Name                                                                                                                                                                                                                                                                                                                                         
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------------------------------------------                                                                                                                                                                                                                         
  100.0          140447           1        140447.0          140447          140447  main$_omp_fn$0                                                                                                                                                                                                                                                                                                                               

Generating CUDA Memory Operation Statistics...
CUDA Memory Operation Statistics (nanoseconds)

Time(%)      Total Time  Operations         Average         Minimum         Maximum  Name                                                                            
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   71.4            5664           3          1888.0            1696            2144  [CUDA memcpy DtoH]                                                              
   28.6            2272           1          2272.0            2272            2272  [CUDA memcpy HtoD]                                                              

CUDA Memory Operation Statistics (KiB)

              Total      Operations              Average            Minimum              Maximum  Name                                                                            
-------------------  --------------  -------------------  -----------------  -------------------  --------------------------------------------------------------------------------
              0.141               3                0.047              0.035                0.070  [CUDA memcpy DtoH]                                                              
              0.023               1                0.023              0.023                0.023  [CUDA memcpy HtoD]                                                              

Generating NVTX Push-Pop Range Statistics...
NVTX Push-Pop Range Statistics (nanoseconds)

Report file moved to "/home/93u/Sandbox/ye-luo-omp-test/report1.qdrep"
Report file moved to "/home/93u/Sandbox/ye-luo-omp-test/report1.sqlite"
cabreraam commented 3 years ago

Here's the output with GCC 9.3.0:

OMP_NUM_THREADS=4 nsys nvprof ./bin/miniqmc
WARNING: miniqmc and any of its children processes will be profiled.

Collecting data...
miniqmc git branch: OMP_offload
miniqmc git commit: b28732943e8ea9c94f2026d39007c27cb78e5451-dirty

Number of orbitals/splines = 192
Tile size = 192
Number of tiles = 1
Number of electrons = 384
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
OpenMP threads = 4
Number of walkers per rank = 4

SPO coefficients size = 98304000 bytes (93.75 MB)
delayed update rank = 32
YYYY offload size = 12288000
YYYY end of spline offloading
Using SoA distance table, Jastrow + einspline, 
and determinant update.
================================== 
Stack timer profile in seconds
Timer                             Inclusive_time  Exclusive_time  Calls       Time_per_call
Setup                                0.2516     0.2516              1       0.251574685
Total                               14.7604     1.5026              1      14.760427507
  Diffusion                          3.1733     0.0034              5       0.634655442
    Accept move                      0.0010     0.0010            932       0.000001032
    Complete Updates                 0.0013     0.0000              5       0.000261684
      Determinant::update            0.0013     0.0013             10       0.000129892
    Current Gradient                 0.0330     0.0013           1920       0.000017166
      Determinant::ratio             0.0311     0.0311           1920       0.000016209
      OneBodyJastrow                 0.0003     0.0003           1920       0.000000168
      TwoBodyJastrow                 0.0002     0.0002           1920       0.000000126
    Kinetic Energy                   0.0026     0.0025              5       0.000516229
      OneBodyJastrow                 0.0000     0.0000              5       0.000002695
      TwoBodyJastrow                 0.0000     0.0000              5       0.000004066
    Make move                        0.0114     0.0114           1920       0.000005920
    New Gradient                     3.0490     0.0017           1920       0.001588011
      Determinant::ratio             0.0007     0.0007           1920       0.000000349
      Determinant::spovgl            3.0315     0.0049           1920       0.001578932
        Single-Particle Orbitals     3.0267     3.0267           1920       0.001576393
      OneBodyJastrow                 0.0019     0.0019           1920       0.000000972
      TwoBodyJastrow                 0.0132     0.0132           1920       0.000006866
    Set active                       0.0359     0.0359           1920       0.000018701
    Update                           0.0359     0.0010            932       0.000038470
      Determinant::update            0.0251     0.0251            932       0.000026957
      OneBodyJastrow                 0.0002     0.0002            932       0.000000167
      TwoBodyJastrow                 0.0096     0.0096            932       0.000010265
  Initialization                     1.0668     0.0904              1       1.066813463
    Determinant::inverse             0.0046     0.0046              2       0.002313270
    Determinant::spovgl              0.9693     0.0025              2       0.484668310
      Single-Particle Orbitals       0.9669     0.9669            384       0.002517860
    OneBodyJastrow                   0.0003     0.0003              1       0.000293234
    TwoBodyJastrow                   0.0021     0.0021              1       0.002140778
  Pseudopotential                    9.0177     0.0020              5       1.803537464
    Make move                        0.0431     0.0431           7968       0.000005411
    Value                            8.9725     0.0039           7968       0.001126072
      Determinant::ratio             0.0013     0.0013           7968       0.000000163
      Determinant::spoval            8.9421     0.0044           7968       0.001122256
        Single-Particle Orbitals     8.9378     8.9378           7968       0.001121706
      OneBodyJastrow                 0.0038     0.0038           7968       0.000000473
      TwoBodyJastrow                 0.0214     0.0214           7968       0.000002688

========== Throughput ============ 

Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.53446e+07
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 7.13749e+07
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 65407.5

Processing events...
Saving temporary "/tmp/nsys-report-47e9-b484-7946-91bc.qdstrm" file to disk...
Creating final output files...

Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-47e9-b484-7946-91bc.qdrep"
Exporting 730 events: [===================================================100%]

Exported successfully to
/tmp/nsys-report-47e9-b484-7946-91bc.sqlite

Generating CUDA API Statistics...
CUDA API Statistics (nanoseconds)

Time(%)      Total Time       Calls         Average         Minimum         Maximum  Name                                                                            
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   65.5       182434692           1     182434692.0       182434692       182434692  cuCtxCreate_v2                                                                  
   26.7        74324859           1      74324859.0        74324859        74324859  cuCtxDestroy_v2                                                                 
    7.6        21085909           2      10542954.5           34361        21051548  cuMemcpyHtoD_v2                                                                 
    0.2          553237          13         42556.7            2296          323325  cuMemAlloc_v2                                                                   
    0.0          109599           3         36533.0            5110           77875  cuMemFree_v2                                                                    
    0.0             592           1           592.0             592             592  cuInit                                                                          

Generating CUDA Memory Operation Statistics...
CUDA Memory Operation Statistics (nanoseconds)

Time(%)      Total Time  Operations         Average         Minimum         Maximum  Name                                                                            
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
  100.0        20904710           2      10452355.0            2273        20902437  [CUDA memcpy HtoD]                                                              

CUDA Memory Operation Statistics (KiB)

              Total      Operations              Average            Minimum              Maximum  Name                                                                            
-------------------  --------------  -------------------  -----------------  -------------------  --------------------------------------------------------------------------------
          96000.242               2            48000.121              0.242            96000.000  [CUDA memcpy HtoD]                                                              

Generating NVTX Push-Pop Range Statistics...
NVTX Push-Pop Range Statistics (nanoseconds)
ye-luo commented 3 years ago

Paste the output of ldd ./bin/miniqmc here and check whether libgomp comes from your GCC, the OS, or maybe LLVM.

cabreraam commented 3 years ago

[miniqmc] 93u@pcie:~/Research/miniqmc_aaron/build_omp_gcc (OMP_offload)$ ldd ./bin/miniqmc
    linux-vdso.so.1 =>  (0x00007ffd2196d000)
    libopenblas.so.0 => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libopenblas.so.0 (0x00007fccf408f000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fccf3e73000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fccf3c6f000)
    libstdc++.so.6 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-10.1.0/gcc-9.2.0-2kaygxbubgsysp4r3j4hul5shwhnlqct/lib64/libstdc++.so.6 (0x00007fccf3895000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fccf3593000)
    libgomp.so.1 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-10.1.0/gcc-9.2.0-2kaygxbubgsysp4r3j4hul5shwhnlqct/lib64/libgomp.so.1 (0x00007fccf335c000)
    libgcc_s.so.1 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-10.1.0/gcc-9.2.0-2kaygxbubgsysp4r3j4hul5shwhnlqct/lib64/libgcc_s.so.1 (0x00007fccf3144000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fccf2d76000)
    libgfortran.so.5 => /home/93u/spack/opt/spack/linux-centos7-haswell/gcc-4.8.5/gcc-10.1.0-qxwqoeodm6s6fro2qeatltppa6hd3of6/lib64/libgfortran.so.5 (0x00007fccf28c2000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fccf4f80000)
    libquadmath.so.0 => /home/93u/spack/opt/spack/linux-centos7-haswell/gcc-4.8.5/gcc-10.1.0-qxwqoeodm6s6fro2qeatltppa6hd3of6/lib64/libquadmath.so.0 (0x00007fccf267b000)
ye-luo commented 3 years ago

I ran out of ideas. You are probably more familiar with your settings than I am.

  1. If GCC has options to make the runtime print diagnostics (libgomp honors GOMP_DEBUG=1), there is a chance to find out why.
  2. Check whether CXX_COMP is pointing to the desired compiler.
  3. Does the spack-built gcc@9.2.0 actually have offload activated? See the sketch below.
  4. Try the latest OMP_offload branch.
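
On point 3, a quick runtime sanity check (a minimal sketch, not part of the recipe) is to ask the OpenMP runtime how many devices it can see; a GCC built without nvptx offload support typically reports zero:

// device_count.cpp -- compile with the spack-built compiler:
//   g++ -fopenmp device_count.cpp && ./a.out
#include <cstdio>
#include <omp.h>

int main()
{
  printf("omp_get_num_devices()    = %d\n", omp_get_num_devices());
  printf("omp_get_initial_device() = %d\n", omp_get_initial_device());
  printf("omp_get_default_device() = %d\n", omp_get_default_device());
  return 0;
}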
ye-luo commented 3 years ago

I spack-installed gcc@9.3.0 +cuda and it fails to build the miniqmc executable. In my recollection, gcc 9/10 cannot build the main executables (check_spo_batched/miniqmc) of miniQMC when offload is enabled; 11 is the first working version. So something is not right on your side.

prckent commented 3 years ago

I think for anything offload-related it doesn't make sense to use anything but the latest release or the development version of a particular compiler. There have been a huge number of fixes and improvements. More are still needed, so tracking new versions is an ongoing task.

cabreraam commented 3 years ago

> I spack-installed gcc@9.3.0 +cuda and it fails to build the miniqmc executable. In my recollection, gcc 9/10 cannot build the main executables (check_spo_batched/miniqmc) of miniQMC when offload is enabled; 11 is the first working version. So something is not right on your side.

Yes, you are right. I didn't have the most recent version of the OMP_offload branch, and GCC9 did not build on my end either, unfortunately.

@prckent This is helpful! Based on that, should I only be trying GCC 11 and LLVM 12?

All, I will be hammering away at this more today, and I'll be sure to update this thread.

cabreraam commented 3 years ago

Okay, I'm trying to build with LLVM 12 now with the most recent version of the OMP_offload branch. Here is the output of concretizing my current spack environment:

[miniqmc] 93u@pcie:~/Research/miniqmc (OMP_offload)$ spack concretize -f
==> Concretized ncurses abi=5
[+]  zegfljh  ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=5 arch=linux-centos7-skylake_avx512
[+]  awlhzrt      ^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-skylake_avx512

==> Concretized openblas threads=openmp ^ncurses abi=5
[+]  woeaw5j  openblas@0.3.15%gcc@9.3.0~bignuma~consistent_fpcsr~ilp64+locking+pic+shared threads=openmp arch=linux-centos7-skylake_avx512
[+]  lvbnoil      ^perl@5.32.1%gcc@9.3.0+cpanm+shared+threads arch=linux-centos7-skylake_avx512
[+]  ux333o2          ^berkeley-db@18.1.40%gcc@9.3.0+cxx~docs+stl patches=b231fcc4d5cff05e5c3a4814f6a5af0e9a966428dc2176540d2c05aff41de522 arch=linux-centos7-skylake_avx512
[+]  zegz5f2          ^gdbm@1.19%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  ljmlnik              ^readline@8.1%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  zegfljh                  ^ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=5 arch=linux-centos7-skylake_avx512
[+]  awlhzrt                      ^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-skylake_avx512

==> Concretized cmake ^ncurses abi=5
[+]  vht5qng  cmake@3.20.2%gcc@9.3.0~doc+ncurses+openssl+ownlibs~qt build_type=Release arch=linux-centos7-skylake_avx512
[+]  zegfljh      ^ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=5 arch=linux-centos7-skylake_avx512
[+]  awlhzrt          ^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  szbth3d      ^openssl@1.1.1k%gcc@9.3.0~docs+systemcerts arch=linux-centos7-skylake_avx512
[+]  lvbnoil          ^perl@5.32.1%gcc@9.3.0+cpanm+shared+threads arch=linux-centos7-skylake_avx512
[+]  ux333o2              ^berkeley-db@18.1.40%gcc@9.3.0+cxx~docs+stl patches=b231fcc4d5cff05e5c3a4814f6a5af0e9a966428dc2176540d2c05aff41de522 arch=linux-centos7-skylake_avx512
[+]  zegz5f2              ^gdbm@1.19%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  ljmlnik                  ^readline@8.1%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  tznwo4l          ^zlib@1.2.11%gcc@9.3.0+optimize+pic+shared arch=linux-centos7-skylake_avx512

==> Concretized llvm@12.0.0~all_targets+clang~code_signing+compiler-rt+cuda~flang+gold+internal_unwind~ipo+libcxx+lld+lldb~llvm_dylib+mlir+omp_debug~omp_tsan+polly~python~shared_libs~split_dwarf build_type=Release cuda_arch=60 ^cuda@11.0.2%gcc@9.3.0+dev ^ncurses abi=5
[+]  3m25tfb  llvm@12.0.0%gcc@9.3.0~all_targets+clang~code_signing+compiler-rt+cuda~flang+gold+internal_unwind~ipo+libcxx+lld+lldb~llvm_dylib+mlir+omp_debug~omp_tsan+polly~python~shared_libs~split_dwarf build_type=Release cuda_arch=60 arch=linux-centos7-skylake_avx512
[+]  rdseev6      ^binutils@2.36.1%gcc@9.3.0~gas+gold~headers~interwork+ld~libiberty~lto+nls+plugins libs=shared,static arch=linux-centos7-skylake_avx512
[+]  sfnz5ur          ^diffutils@3.7%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  uxrlfez              ^libiconv@1.16%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  whnh4xp          ^gettext@0.21%gcc@9.3.0+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-centos7-skylake_avx512
[+]  nfhxtx6              ^bzip2@1.0.8%gcc@9.3.0~debug~pic+shared arch=linux-centos7-skylake_avx512
[+]  2phhvqm              ^libxml2@2.9.10%gcc@9.3.0~python arch=linux-centos7-skylake_avx512
[+]  awlhzrt                  ^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  szm3fli                  ^xz@5.2.5%gcc@9.3.0~pic libs=shared,static arch=linux-centos7-skylake_avx512
[+]  tznwo4l                  ^zlib@1.2.11%gcc@9.3.0+optimize+pic+shared arch=linux-centos7-skylake_avx512
[+]  zegfljh              ^ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=5 arch=linux-centos7-skylake_avx512
[+]  hnysusn              ^tar@1.34%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  vht5qng      ^cmake@3.20.2%gcc@9.3.0~doc+ncurses+openssl+ownlibs~qt build_type=Release arch=linux-centos7-skylake_avx512
[+]  szbth3d          ^openssl@1.1.1k%gcc@9.3.0~docs+systemcerts arch=linux-centos7-skylake_avx512
[+]  lvbnoil              ^perl@5.32.1%gcc@9.3.0+cpanm+shared+threads arch=linux-centos7-skylake_avx512
[+]  ux333o2                  ^berkeley-db@18.1.40%gcc@9.3.0+cxx~docs+stl patches=b231fcc4d5cff05e5c3a4814f6a5af0e9a966428dc2176540d2c05aff41de522 arch=linux-centos7-skylake_avx512
[+]  zegz5f2                  ^gdbm@1.19%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  ljmlnik                      ^readline@8.1%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  wytow2q      ^cuda@11.0.2%gcc@9.3.0+dev arch=linux-centos7-skylake_avx512
[+]  3zhlqz3      ^hwloc@2.4.1%gcc@9.3.0~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared arch=linux-centos7-skylake_avx512
[+]  jvl2ry7          ^libpciaccess@0.16%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  6yo3gpz              ^libtool@2.4.6%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  xz2gqcm                  ^m4@1.4.18%gcc@9.3.0+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-centos7-skylake_avx512
[+]  jvcbyia                      ^libsigsegv@2.13%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  x5ehchy              ^util-macros@1.19.1%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  q5qkxrz      ^libedit@3.1-20210216%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  bxd2niq      ^libelf@0.8.13%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  svyy736      ^libffi@3.3%gcc@9.3.0 patches=26f26c6f29a7ce9bf370ad3ab2610f99365b4bdd7b82e7c31df41a3370d685c0 arch=linux-centos7-skylake_avx512
 -   cekazvo      ^perl-data-dumper@2.173%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  vnzwmlr      ^python@3.8.10%gcc@9.3.0+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4+uuid+zlib patches=0d98e93189bc278fbc37a50ed7f183bd8aaf249a8e1670a465f0db6bb4f8cf87 arch=linux-centos7-skylake_avx512
[+]  5trs62h          ^expat@2.3.0%gcc@9.3.0+libbsd arch=linux-centos7-skylake_avx512
[+]  6m7qx2l              ^libbsd@0.11.3%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  pbk75ks                  ^libmd@1.0.3%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  66q2med          ^sqlite@3.35.5%gcc@9.3.0+column_metadata+fts~functions~rtree arch=linux-centos7-skylake_avx512
[+]  b7o6grf          ^util-linux-uuid@2.36.2%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  ivwojuz      ^swig@4.0.2%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  tnyx3ob          ^pcre@8.44%gcc@9.3.0~jit+multibyte+utf arch=linux-centos7-skylake_avx512
[+]  nh3jh56      ^z3@4.8.9%gcc@9.3.0+python arch=linux-centos7-skylake_avx512
[+]  e5l3pqd          ^py-setuptools@50.3.2%gcc@9.3.0 arch=linux-centos7-skylake_avx512

==> Concretized gcc@11.1.0+nvptx~piclibs~strip ^cuda@11.0.2%gcc@9.3.0+dev ^ncurses abi=5
[+]  ss6qnio  gcc@11.1.0%gcc@9.3.0~binutils~bootstrap~graphite+nvptx~piclibs~strip languages=c,c++,fortran arch=linux-centos7-skylake_avx512
[+]  wytow2q      ^cuda@11.0.2%gcc@9.3.0+dev arch=linux-centos7-skylake_avx512
[+]  2phhvqm          ^libxml2@2.9.10%gcc@9.3.0~python arch=linux-centos7-skylake_avx512
[+]  uxrlfez              ^libiconv@1.16%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  awlhzrt              ^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  szm3fli              ^xz@5.2.5%gcc@9.3.0~pic libs=shared,static arch=linux-centos7-skylake_avx512
[+]  tznwo4l              ^zlib@1.2.11%gcc@9.3.0+optimize+pic+shared arch=linux-centos7-skylake_avx512
[+]  zegfljh          ^ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=5 arch=linux-centos7-skylake_avx512
[+]  sfnz5ur      ^diffutils@3.7%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  z2qvd6n      ^gmp@6.2.1%gcc@9.3.0 arch=linux-centos7-skylake_avx512
 -   t3qblvx          ^autoconf@2.69%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  xz2gqcm              ^m4@1.4.18%gcc@9.3.0+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-centos7-skylake_avx512
[+]  jvcbyia                  ^libsigsegv@2.13%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  lvbnoil              ^perl@5.32.1%gcc@9.3.0+cpanm+shared+threads arch=linux-centos7-skylake_avx512
[+]  ux333o2                  ^berkeley-db@18.1.40%gcc@9.3.0+cxx~docs+stl patches=b231fcc4d5cff05e5c3a4814f6a5af0e9a966428dc2176540d2c05aff41de522 arch=linux-centos7-skylake_avx512
[+]  zegz5f2                  ^gdbm@1.19%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  ljmlnik                      ^readline@8.1%gcc@9.3.0 arch=linux-centos7-skylake_avx512
 -   xkusrhr          ^automake@1.16.3%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  6yo3gpz          ^libtool@2.4.6%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  dme3oau      ^mpc@1.1.0%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  ydb4rlw          ^mpfr@4.1.0%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  awzl5tl              ^autoconf-archive@2019.01.06%gcc@9.3.0 arch=linux-centos7-skylake_avx512
 -   jyxjrtp              ^texinfo@6.5%gcc@9.3.0 patches=12f6edb0c6b270b8c8dba2ce17998c580db01182d871ee32b7b6e4129bd1d23a,1732115f651cff98989cb0215d8f64da5e0f7911ebf0c13b064920f088f2ffe1 arch=linux-centos7-skylake_avx512
[+]  5oukl56      ^zstd@1.5.0%gcc@9.3.0~ipo~legacy~lz4~lzma~multithread+programs+shared+static~zlib build_type=RelWithDebInfo arch=linux-centos7-skylake_avx512
[+]  vht5qng          ^cmake@3.20.2%gcc@9.3.0~doc+ncurses+openssl+ownlibs~qt build_type=Release arch=linux-centos7-skylake_avx512
[+]  szbth3d              ^openssl@1.1.1k%gcc@9.3.0~docs+systemcerts arch=linux-centos7-skylake_avx512

==> Concretized cuda@11.0.2+dev ^ncurses abi=5
[+]  wytow2q  cuda@11.0.2%gcc@9.3.0+dev arch=linux-centos7-skylake_avx512
[+]  2phhvqm      ^libxml2@2.9.10%gcc@9.3.0~python arch=linux-centos7-skylake_avx512
[+]  uxrlfez          ^libiconv@1.16%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  awlhzrt          ^pkgconf@1.7.4%gcc@9.3.0 arch=linux-centos7-skylake_avx512
[+]  szm3fli          ^xz@5.2.5%gcc@9.3.0~pic libs=shared,static arch=linux-centos7-skylake_avx512
[+]  tznwo4l          ^zlib@1.2.11%gcc@9.3.0+optimize+pic+shared arch=linux-centos7-skylake_avx512
[+]  zegfljh      ^ncurses@6.2%gcc@9.3.0~symlinks+termlib abi=5 arch=linux-centos7-skylake_avx512

I'm using this script to build the project:

C_COMP=clang
echo $C_COMP

CXX_COMP="clang++"
echo $CXX_COMP

mkdir -p build_omp_llvm && cd build_omp_llvm

cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DENABLE_OFFLOAD=1 \
    -DQMC_MPI=0 \
    -DCMAKE_CXX_FLAGS=-g \
    -DCMAKE_C_COMPILER=${C_COMP} \
    -DCMAKE_CXX_COMPILER=${CXX_COMP}  ..

if [[ $? -eq 0 ]]
then
    make -j16 VERBOSE=1
else
    echo "cmake failed so we won't build!"
fi

The project actually builds successfully, but I get a few warnings that look like this:

cd /home/93u/Research/miniqmc/build_omp_llvm/src && /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/bin/clang++ -DADD_ -DH5_USE_16_API -DHAVE_CONFIG_H -Drestrict=__restrict__ -I/home/93u/Research/miniqmc/src -I/home/93u/Research/miniqmc/build_omp_llvm/src -I/home/93u/Research/miniqmc/src/Platforms -g -fopenmp -Wall -Wno-unused-variable -Wno-overloaded-virtual -Wno-unused-private-field -Wno-unused-local-typedef -Wvla -Wno-unknown-pragmas -Wmisleading-indentation -fomit-frame-pointer -fstrict-aliasing -D__forceinline=inline -march=native -fopenmp-targets=nvptx64-nvidia-cuda -Wno-unknown-cuda-version -O3 -DNDEBUG -ffast-math -std=c++14 -MD -MT src/CMakeFiles/qmcbase.dir/Particle/ParticleSet_builder.cpp.o -MF CMakeFiles/qmcbase.dir/Particle/ParticleSet_builder.cpp.o.d -o CMakeFiles/qmcbase.dir/Particle/ParticleSet_builder.cpp.o -c /home/93u/Research/miniqmc/src/Particle/ParticleSet_builder.cpp
warning: src/Particle/Lattice/ParticleBConds.h:183:5: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]

However, this is the error I'm running into now:

[miniqmc] 93u@pcie:~/Research/miniqmc (OMP_offload)$ ./build_omp_llvm/bin/miniqmc
miniqmc git branch: OMP_offload
miniqmc git commit: 8c6cc1981d475e6e567585dc10ce8899797112ca

Number of orbitals/splines = 192
Tile size = 192
Number of tiles = 1
Number of electrons = 384
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
OpenMP threads = 32
Number of walkers per rank = 32

SPO coefficients size = 98304000 bytes (93.75 MB)
delayed update rank = 32
CUDA error: Loading '__omp_offloading_2b_242462b__ZN11qmcplusplus17einspline_spo_ompIdEC1ERKS1_ii_l51' Failed
CUDA error: named symbol not found 
Libomptarget error: Unable to generate entries table for device id 0.
Libomptarget error: Failed to init globals on device 0
Libomptarget error: Run with LIBOMPTARGET_DEBUG=4 to dump host-target pointer mappings.
OMPallocator.hpp:65:5: Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Aborted

At first glance, I'm thinking this might be a dynamic linking issue. Issuing the ldd command reveals:

[miniqmc] 93u@pcie:~/Research/miniqmc (OMP_offload)$ ldd -v ./build_omp_llvm/bin/miniqmc
    linux-vdso.so.1 =>  (0x00007fffb13da000)
    libopenblas.so.0 => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libopenblas.so.0 (0x00007f40eb79d000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f40eb49b000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f40eb297000)
    libstdc++.so.6 => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6 (0x00007f40eae90000)
    libomp.so => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libomp.so (0x00007f40ec7b8000)
    libomptarget.so => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libomptarget.so (0x00007f40ec791000)
    libgcc_s.so.1 => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1 (0x00007f40eac77000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f40eaa5b000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f40ea68d000)
    libgfortran.so.5 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgfortran.so.5 (0x00007f40ea1fd000)
    libgomp.so.1 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgomp.so.1 (0x00007f40e9fc6000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f40ec694000)
    libhwloc.so.15 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/hwloc-2.4.1-3zhlqz36x2k6l27bjxm4vphe7p3kctk7/lib/libhwloc.so.15 (0x00007f40e9d6c000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f40e9b64000)
    libquadmath.so.0 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libquadmath.so.0 (0x00007f40e991d000)
    libpciaccess.so.0 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libpciaccess-0.16-jvl2ry7g6ubjh3umqcji34i5cpdzppn4/lib/libpciaccess.so.0 (0x00007f40e9714000)
    libxml2.so.2 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libxml2-2.9.10-2phhvqmlyyvtxf2h5txfh4b27wqzcybf/lib/libxml2.so.2 (0x00007f40e93b0000)
    libz.so.1 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/zlib-1.2.11-tznwo4ldqzkfx5qg3tfjk6khnj2535fq/lib/libz.so.1 (0x00007f40e9199000)
    liblzma.so.5 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/xz-5.2.5-szm3fliykndtql4j7t56bn4wyfu2l272/lib/liblzma.so.5 (0x00007f40e8f73000)
    libiconv.so.2 => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libiconv-1.16-uxrlfez4c3llcfxq32dxpynrpgzjzqba/lib/libiconv.so.2 (0x00007f40e8c76000)

    Version information:
    ./build_omp_llvm/bin/miniqmc:
        libgcc_s.so.1 (GCC_3.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libomptarget.so (VERS1.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libomptarget.so
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
        libm.so.6 (GLIBC_2.15) => /lib64/libm.so.6
        libomp.so (VERSION) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libomp.so
        libstdc++.so.6 (GLIBCXX_3.4.20) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.11) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.26) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.9) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (CXXABI_1.3) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.21) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.16) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libopenblas.so.0:
        libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        libgfortran.so.5 (GFORTRAN_8) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgfortran.so.5
        libgomp.so.1 (GOMP_4.0) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgomp.so.1
        libgomp.so.1 (GOMP_2.0) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgomp.so.1
        libgomp.so.1 (OMP_1.0) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgomp.so.1
        libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
    /lib64/libm.so.6:
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
    /lib64/libdl.so.2:
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
        libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6:
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        libgcc_s.so.1 (GCC_4.2.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.4) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.3) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.16) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.17) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libomp.so:
        libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
        libpthread.so.0 (GLIBC_2.3.2) => /lib64/libpthread.so.0
        libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/var/spack/environments/miniqmc/.spack-env/view/lib/libomptarget.so:
        libgcc_s.so.1 (GCC_3.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
        libstdc++.so.6 (GLIBCXX_3.4.20) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.21) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.11) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (CXXABI_1.3) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libstdc++.so.6 (GLIBCXX_3.4.15) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libstdc++.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1:
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /lib64/libpthread.so.0:
        ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /lib64/libc.so.6:
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgfortran.so.5:
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
        libgcc_s.so.1 (GCC_4.2.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libgcc_s.so.1 (GCC_3.3) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libgcc_s.so.1 (GCC_4.3.0) => /home/93u/spack/var/spack/environments/miniqmc/.spack-env/._view/pfhb5mav3qarw35f2su54ms3cvu4m4sz/lib64/libgcc_s.so.1
        libquadmath.so.0 (QUADMATH_1.0) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libquadmath.so.0
        libc.so.6 (GLIBC_2.15) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.17) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libgomp.so.1:
        libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
        libpthread.so.0 (GLIBC_2.3.4) => /lib64/libpthread.so.0
        libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.16) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.17) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/hwloc-2.4.1-3zhlqz36x2k6l27bjxm4vphe7p3kctk7/lib/libhwloc.so.15:
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
        libxml2.so.2 (LIBXML2_2.6.0) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libxml2-2.9.10-2phhvqmlyyvtxf2h5txfh4b27wqzcybf/lib/libxml2.so.2
        libxml2.so.2 (LIBXML2_2.4.30) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libxml2-2.9.10-2phhvqmlyyvtxf2h5txfh4b27wqzcybf/lib/libxml2.so.2
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /lib64/librt.so.1:
        libpthread.so.0 (GLIBC_2.3.2) => /lib64/libpthread.so.0
        libpthread.so.0 (GLIBC_PRIVATE) => /lib64/libpthread.so.0
        libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-8.1.0/gcc-9.3.0-rd6ojgt5sp2q5yofly44m2b3oftdt5cl/lib64/libquadmath.so.0:
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.10) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libpciaccess-0.16-jvl2ry7g6ubjh3umqcji34i5cpdzppn4/lib/libpciaccess.so.0:
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libxml2-2.9.10-2phhvqmlyyvtxf2h5txfh4b27wqzcybf/lib/libxml2.so.2:
        libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
        libz.so.1 (ZLIB_1.2.2.3) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/zlib-1.2.11-tznwo4ldqzkfx5qg3tfjk6khnj2535fq/lib/libz.so.1
        libz.so.1 (ZLIB_1.2.3.3) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/zlib-1.2.11-tznwo4ldqzkfx5qg3tfjk6khnj2535fq/lib/libz.so.1
        libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
        liblzma.so.5 (XZ_5.0) => /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/xz-5.2.5-szm3fliykndtql4j7t56bn4wyfu2l272/lib/liblzma.so.5
        libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/zlib-1.2.11-tznwo4ldqzkfx5qg3tfjk6khnj2535fq/lib/libz.so.1:
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/xz-5.2.5-szm3fliykndtql4j7t56bn4wyfu2l272/lib/liblzma.so.5:
        libpthread.so.0 (GLIBC_2.3.3) => /lib64/libpthread.so.0
        libpthread.so.0 (GLIBC_2.3.2) => /lib64/libpthread.so.0
        libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
        libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.6) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.17) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
    /home/93u/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.0/libiconv-1.16-uxrlfez4c3llcfxq32dxpynrpgzjzqba/lib/libiconv.so.2:
        libc.so.6 (GLIBC_2.14) => /lib64/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6

A potential problem might be that libgomp.so.1 points to my GCC 9 installation instead of the GCC 11 one, so I'm going to figure out the elegant way to fix that in spack. That's my thinking for now, but I will report back if I make any additional progress.

ye-luo commented 3 years ago

@cabreraam your CMake recipe when using LLVM is incorrect; read my wiki page.

cabreraam commented 3 years ago

@ye-luo I just added the flag -D USE_OBJECT_TARGET=ON to my CMake recipe. I still got those same vectorization warnings from LLVM, BUT the project built and I was able to observe something happening!

[miniqmc] 93u@pcie:~/Research/miniqmc (OMP_offload)$ OMP_NUM_THREADS=2 nsys nvprof ./build_omp_llvm/bin/miniqmc
WARNING: miniqmc and any of its children processes will be profiled.

Collecting data...
miniqmc git branch: OMP_offload
miniqmc git commit: 8c6cc1981d475e6e567585dc10ce8899797112ca

Number of orbitals/splines = 192
Tile size = 192
Number of tiles = 1
Number of electrons = 384
Rmax = 1.7
AcceptanceRatio = 0.5
Iterations = 5
OpenMP threads = 2
Number of walkers per rank = 2

SPO coefficients size = 98304000 bytes (93.75 MB)
delayed update rank = 32
Using SoA distance table, Jastrow + einspline, 
and determinant update.
================================== 
Stack timer profile in seconds
Timer                             Inclusive_time  Exclusive_time  Calls       Time_per_call
Setup                                0.2405     0.2405              1       0.240482807
Total                                0.9404     0.0001              1       0.940395832
  Diffusion                          0.2596     0.0016              5       0.051926661
    Accept move                      0.0004     0.0004            932       0.000000476
    Complete Updates                 0.0007     0.0000              5       0.000136232
      Determinant::update            0.0007     0.0007             10       0.000067925
    Current Gradient                 0.0046     0.0005           1920       0.000002375
      Determinant::ratio             0.0037     0.0037           1920       0.000001926
      OneBodyJastrow                 0.0002     0.0002           1920       0.000000095
      TwoBodyJastrow                 0.0002     0.0002           1920       0.000000087
    Kinetic Energy                   0.0010     0.0010              5       0.000203419
      OneBodyJastrow                 0.0000     0.0000              5       0.000002146
      TwoBodyJastrow                 0.0000     0.0000              5       0.000001669
    Make move                        0.0195     0.0195           1920       0.000010174
    New Gradient                     0.2027     0.0007           1920       0.000105568
      Determinant::ratio             0.0004     0.0004           1920       0.000000226
      Determinant::spovgl            0.1944     0.0026           1920       0.000101268
        Single-Particle Orbitals     0.1918     0.1918           1920       0.000099918
      OneBodyJastrow                 0.0008     0.0008           1920       0.000000414
      TwoBodyJastrow                 0.0063     0.0063           1920       0.000003271
    Set active                       0.0197     0.0197           1920       0.000010276
    Update                           0.0093     0.0004            932       0.000010015
      Determinant::update            0.0048     0.0048            932       0.000005154
      OneBodyJastrow                 0.0001     0.0001            932       0.000000104
      TwoBodyJastrow                 0.0041     0.0041            932       0.000004349
  Initialization                     0.0486     0.0061              1       0.048598051
    Determinant::inverse             0.0027     0.0027              2       0.001345396
    Determinant::spovgl              0.0389     0.0008              2       0.019435525
      Single-Particle Orbitals       0.0380     0.0380            384       0.000099064
    OneBodyJastrow                   0.0001     0.0001              1       0.000097036
    TwoBodyJastrow                   0.0009     0.0009              1       0.000853062
  Pseudopotential                    0.6320     0.0013              5       0.126406860
    Make move                        0.0713     0.0713           7968       0.000008946
    Value                            0.5595     0.0026           7968       0.000070217
      Determinant::ratio             0.0007     0.0007           7968       0.000000090
      Determinant::spoval            0.5402     0.0028           7968       0.000067797
        Single-Particle Orbitals     0.5374     0.5374           7968       0.000067450
      OneBodyJastrow                 0.0020     0.0020           7968       0.000000253
      TwoBodyJastrow                 0.0139     0.0139           7968       0.000001750

========== Throughput ============ 

Total throughput ( N_walkers * N_elec^3 / Total time ) = 1.20424e+08
Diffusion throughput ( N_walkers * N_elec^3 / Diffusion time ) = 4.36178e+08
Pseudopotential throughput ( N_walkers * N_elec^2 / Pseudopotential time ) = 466608

Processing events...
Saving temporary "/tmp/nsys-report-959e-fb49-920e-f998.qdstrm" file to disk...
Creating final output files...

Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-959e-fb49-920e-f998.qdrep"
Exporting 97711 events: [=================================================100%]

Exported successfully to
/tmp/nsys-report-959e-fb49-920e-f998.sqlite

Generating CUDA API Statistics...
CUDA API Statistics (nanoseconds)

Time(%)      Total Time       Calls         Average         Minimum         Maximum  Name                                                                            
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   54.9       752954137       16140         46651.4           17806         1024520  cuMemcpyDtoHAsync_v2                                                            
   34.2       468552759       16143         29025.1            3616         1104918  cuLaunchKernel                                                                  
    6.4        88230624           1      88230624.0        88230624        88230624  cuDevicePrimaryCtxRelease_v2                                                    
    1.9        25986898       16145          1609.6            1321          353888  cuStreamSynchronize                                                             
    1.5        20427582           5       4085516.4            5597        20365701  cuMemcpyHtoDAsync_v2                                                            
    0.9        12969278           1      12969278.0        12969278        12969278  cuModuleLoadDataEx                                                              
    0.1         1558823           1       1558823.0         1558823         1558823  cuModuleUnload                                                                  
    0.0          625521          10         62552.1            2801          276792  cuMemAlloc_v2                                                                   
    0.0          234806          32          7337.7            1085           93189  cuStreamCreate                                                                  
    0.0          122888          32          3840.3            1491           24669  cuStreamDestroy_v2                                                              
    0.0           93962           6         15660.3           12629           21897  cuMemcpyDtoH_v2                                                                 
    0.0           49634           1         49634.0           49634           49634  cuMemFree_v2                                                                    
    0.0            7542           1          7542.0            7542            7542  cuDevicePrimaryCtxSetFlags_v2                                                   

Generating CUDA Kernel Statistics...
CUDA Kernel Statistics (nanoseconds)

Time(%)      Total Time   Instances         Average         Minimum         Maximum  Name                                                                                                                                                                                                                                                                                                                                         
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------------------------------------------                                                                                                                                                                                                                         
   64.2       283716558       11532         24602.5            6752           43999  __omp_offloading_2b_242462b__Z_4...                                                                                                                                                                                                                                                                                                          
   35.8       158234802        4608         34339.1            4831           56127  __omp_offloading_2b_242462b__Z_3...                                                                                                                                                                                                                                                                                                          
    0.0           18048           2          9024.0            7552           10496  __omp_offloading_2b_242462b__Z_2...                                                                                                                                                                                                                                                                                                          
    0.0           12096           1         12096.0           12096           12096  __omp_offloading_2b_242462b__Z_1...                                                                                                                                                                                                                                                                                                          

Generating CUDA Memory Operation Statistics...
CUDA Memory Operation Statistics (nanoseconds)

Time(%)      Total Time  Operations         Average         Minimum         Maximum  Name                                                                            
-------  --------------  ----------  --------------  --------------  --------------  --------------------------------------------------------------------------------
   59.7        30001941       16146          1858.2            1567            5344  [CUDA memcpy DtoH]                                                              
   40.3        20238658           5       4047731.6            1664        20231106  [CUDA memcpy HtoD]                                                              

CUDA Memory Operation Statistics (KiB)

              Total      Operations              Average            Minimum              Maximum  Name                                                                            
-------------------  --------------  -------------------  -----------------  -------------------  --------------------------------------------------------------------------------
          96000.254               5            19200.051              0.004            96000.000  [CUDA memcpy HtoD]                                                              
          86418.006           16146                5.352              0.001               15.000  [CUDA memcpy DtoH]                                                              

Generating NVTX Push-Pop Range Statistics...
NVTX Push-Pop Range Statistics (nanoseconds)

Report file moved to "/home/93u/Research/miniqmc/report2.qdrep"
Report file moved to "/home/93u/Research/miniqmc/report2.sqlite"

I think this means things are working?

ye-luo commented 3 years ago

Correct. It looks good now.
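If you want a quick way to double-check offload in the future, grep the profile for the __omp_offloading_ kernel name prefix; seeing those kernels, as in your CUDA Kernel Statistics above, means the OpenMP target regions really ran on the GPU. Something like this should work, depending on your nsys version:

nsys stats --report gpukernsum report2.qdrep | grep __omp_offloading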

cabreraam commented 3 years ago

@ye-luo Excellent! I will go ahead and close this issue then. Thanks for your help! You can at least claim that LLVM 12 has been verified by one more person now.