QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

(LLVM compiler bug) NV GPU Offload errors due to misaligned addresses #5138

Open prckent opened 3 weeks ago

prckent commented 3 weeks ago

Describe the bug

A whole variety of periodic Gaussian tests are failing with LLVM offload. The restart tests are also failing.

These are in the nightlies and offloading to V100.

See: https://cdash.qmcpack.org/viewTest.php?onlyfailed&buildid=7342

e.g. deterministic-diamondC_2x1x1_pp-vmcbatch_gaussian_sdj-1-1 https://cdash.qmcpack.org/tests/2182646

"PluginInterface" error: Failure to synchronize stream (nil): Error in cuStreamSynchronize: misaligned address
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
SoaAtomicBasisSet.h:875:7: omptarget fatal error 1: failure of target construct while offloading is mandatory
[sulfur:1226856] Process received signal

removed redundant ~~deterministic-restart-1-16 https://cdash.qmcpack.org/tests/2181794 Anonymous Buffer size per walker : 19280 Bytes. MEMORY increase 0 MB VMC::resetRun "PluginInterface" error: Faliure to copy data from device to host. Pointers: host = 0x00007f0df17bf3e4, device = 0x00007f0df20a9c00, size = 8: Error in cuMemcpyDtoHAsync: misaligned address omptarget error: Copying data from device failed. omptarget error: Call to targetDataEnd failed, abort target. omptarget error: Failed to process data after launching the kernel. omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options. "PluginInterface" error: ompBLAS.cpp:649:3: omptarget fatal error 1: failure of target construct while offloading is mandatory Failure to synchronize stream (nil): Error in cuStreamSynchronize: misaligned address~~
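For reference, the pointer values in the cuMemcpyDtoHAsync message can be checked for alignment by hand. The sketch below is only illustrative: the helper names `misalignment` and `is_aligned` are hypothetical and not part of QMCPACK or LLVM, and treating the 8-byte transfer size as the required alignment is an assumption. It shows that the reported host address is 4-byte aligned but not 8-byte aligned, while the device address is 8-byte aligned. It does not establish that this is what triggers the error.

```cpp
#include <cstdint>

// Hypothetical helper: number of bytes by which `address` exceeds the previous
// multiple of `alignment` (0 means aligned). `alignment` must be a power of two.
constexpr std::uint64_t misalignment(std::uint64_t address, std::uint64_t alignment)
{
  return address & (alignment - 1);
}

// Hypothetical helper: true when `address` is a multiple of `alignment`.
constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment)
{
  return misalignment(address, alignment) == 0;
}

// Addresses and transfer size copied from the cuMemcpyDtoHAsync message above.
constexpr std::uint64_t host_address   = 0x00007f0df17bf3e4ULL;
constexpr std::uint64_t device_address = 0x00007f0df20a9c00ULL;
constexpr std::uint64_t transfer_size  = 8; // assumed to be the relevant alignment

// Compile-time checks: the host pointer sits 4 bytes past an 8-byte boundary.
static_assert(misalignment(host_address, transfer_size) == 4, "host address is 4 bytes past an 8-byte boundary");
static_assert(is_aligned(host_address, 4), "host address is 4-byte aligned");
static_assert(is_aligned(device_address, transfer_size), "device address is 8-byte aligned");
```

The checks are `static_assert`s, so compiling the snippet is enough to confirm the arithmetic.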

To Reproduce

Ask for latest software versions if not clear on cdash

Expected behavior

Tests should pass

System: sulfur

prckent commented 3 weeks ago

Using LLVM 18.1.8

ye-luo commented 3 weeks ago

With clang, -DCMAKE_BUILD_TYPE=Debug does not add any -Ox optimization flag, so the build uses the default -O0. I can reproduce the issue with that configuration. After adding -O3 via -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-O3, the error disappears. So it is a compiler issue, not a QMCPACK source code issue.

prckent commented 3 weeks ago

Any chance for a small reproducer? Can you make an issue on the relevant repo and link it here?

ye-luo commented 3 weeks ago

> Any chance for a small reproducer? Can you make an issue on the relevant repo and link it here?

Unfortunately, it will be very very low priority for me.

prckent commented 3 weeks ago

No worries.