ECP-WarpX / impactx

high-performance modeling of beam dynamics in particle accelerators with collective effects
https://impactx.readthedocs.io

Illegal memory access with FODO example on GPU #154

Closed · n01r closed this 2 years ago

n01r commented 2 years ago

Hi, I tried to run the FODO example without changes on the Perlmutter GPU partition and encountered the following error:

```
amrex::Abort::0::CUDA error 700 in file /global/homes/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuDevice.cpp line 660: an illegal memory access was encountered !!!
SIGABRT
See Backtrace.0 file for details
MPICH ERROR [Rank 0] [job id 2519166.0] [Wed Jun 29 18:07:42 2022] [nid001512] - Abort(6) (rank 0 in comm 480): application called MPI_Abort(comm=0x84000001, 6) - process 0

srun: error: nid001512: task 0: Exited with exit code 6
srun: launch/slurm: _step_signal: Terminating StepId=2519166.
```

I first tried the example with the submit script provided in the docs (though I had to adjust the naming a bit, since it is still copied from WarpX). That configuration used 4 nodes, but I also tried a single node and then just a single GPU per node. All fail with the same error.
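For reference, the single-GPU run used a job script roughly like this sketch (adapted from the WarpX-style template in the docs; the account and walltime below are placeholders, not the actual values):

```Shell
#!/bin/bash
# Sketch of the Perlmutter GPU job script for the single-GPU test;
# account and walltime are placeholders.
#SBATCH -N 1              # one node
#SBATCH -C gpu            # Perlmutter GPU partition
#SBATCH -G 1              # one GPU in total
#SBATCH -q regular
#SBATCH -t 00:10:00
#SBATCH -A <account>

export SLURM_CPU_BIND="cores"
srun ./impactx input_fodo.in
```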

The backtrace reads as follows:

Backtrace.0

```
=== If no file names and line numbers are shown below, one can run
    addr2line -Cpfie my_exefile my_line_address
to convert `my_line_address` (e.g., 0x4a6b) into file name and line number.
Or one can use amrex/Tools/Backtrace/parse_bt.py.

=== Please note that the line number reported by addr2line may not be accurate.
One can use
    readelf -wl my_exefile | grep my_line_address'
to find out the offset for that line.

 0: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x5d63b6]
    amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at ??:?
 1: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x5d875c]
    amrex::BLBackTrace::handler(int) at ??:?
 2: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x5c13e9]
    amrex::Gpu::Device::streamSynchronizeAll() at ??:?
 3: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x5b6165]
    amrex::MFIter::~MFIter() at ??:?
 4: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x473e79]
    impactx::Push(impactx::ImpactXParticleContainer&, std::__cxx11::list, std::allocator > > const&) at ??:?
 5: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x432b06]
    impactx::ImpactX::evolve(int) at ??:?
 6: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x41ecfe]
    main at ??:?
 7: /lib64/libc.so.6(__libc_start_main+0xef) [0x7f2ab60502bd]
 8: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU/./impactx() [0x42da0a]
    _start ../sysdeps/x86_64/start.S:122

===== TinyProfilers ======
main()
ImpactX::evolve
ImpactX::evolve::step
```

I am adding the machine/system label since an earlier Slack message from @ax3l said that it runs on Summit V100s without fail.

ax3l commented 2 years ago

Memo from our discussion:

n01r commented 2 years ago

With `cudatoolkit/11.5`, running `cuda-gdb` gives an error:

```Shell
(impactx) mgarten@nid001512:/pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU_DEBUG> cuda-gdb
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/opt/cray/pe/python/3.9.7.1'
  program name = 'python3'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  sys._base_executable = '/global/homes/m/mgarten/sw/perlmutter/venvs/impactx/bin/python3'
  sys.base_prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.base_exec_prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.platlibdir = 'lib'
  sys.executable = '/global/homes/m/mgarten/sw/perlmutter/venvs/impactx/bin/python3'
  sys.prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.exec_prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.path = [
    '/opt/cray/pe/python/3.9.7.1',
    '/opt/cray/pe/python/3.9.7.1/lib/python39.zip',
    '/opt/cray/pe/python/3.9.7.1/lib/python3.9',
    '/opt/cray/pe/python/3.9.7.1/lib/python3.9/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
Traceback (most recent call last):
  File "", line 1007, in _find_and_load
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "", line 986, in _find_and_load_unlocked
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "", line 680, in _load_unlocked
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "", line 846, in exec_module
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "", line 951, in get_code
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
SystemError: returned NULL without setting an error
```

But swapping it out for `cudatoolkit/11.0` lets me run the debugger.
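For reference, the swap amounted to something like this (module versions as they were available on Perlmutter at the time):

```Shell
# swap the CUDA toolkit before rebuilding and launching cuda-gdb
module swap cudatoolkit/11.5 cudatoolkit/11.0
module list 2>&1 | grep cudatoolkit  # verify the active version
```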

cuda-gdb run

```Shell
(cuda-gdb) file impactx
Reading symbols from impactx...done.
(cuda-gdb) run input_fodo.in amrex.throw_exception=1 amrex.signal_handling=0
Starting program: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU_DEBUG/impactx input_fodo.in amrex.throw_exception=1 amrex.signal_handling=0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6.0.29-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6.0.29-gdb.py
line to your configuration file "/global/homes/m/mgarten/.cuda-gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/global/homes/m/mgarten/.cuda-gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
[New Thread 0x7fffe5ed7000 (LWP 67128)]
Initializing CUDA...
[Detaching after fork from child process 67129]
[New Thread 0x7fffdbbb0000 (LWP 67141)]
[New Thread 0x7fffdb3af000 (LWP 67142)]
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
CUDA initialized with 1 GPU per MPI rank; 1 GPU(s) used in total
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
AMReX (22.06-39-g2d931f63cb4d) initialized
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
boxArray(0) (BoxArray maxbox(1)
m_ref->m_hash_sig(0)
((0,0,0) (7,7,7) (0,0,0)) )
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
Beam kinetic energy (MeV): 2000
Bunch charge (C): 0
Particle type: electron
Number of particles: 10000
Beam distribution type: waterbag
Static units
Initialized beam distribution parameters
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
# of particles: 10000
Initialized element list
++++ Starting step=0
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x6f5c240 (Drift.H:69)

Thread 1 "impactx" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 61, block (0,0,0), thread (128,0,0), device 0, sm 0, warp 4, lane 0]
0x0000000006f5c250 in impactx::Drift::operator() (this=0x131f9ad0, p=..., px=, py=, pt=, refpart=...)
    at /global/homes/m/mgarten/src/impactx/src/particles/elements/Drift.H:69
69          p.pos(0) = x + m_ds * px;
```

Backtrace

```Shell
(cuda-gdb) backtrace
#0  0x0000000006f5c250 in impactx::Drift::operator() (this=0x131f9ad0, p=..., px=, py=, pt=, refpart=...)
    at /global/homes/m/mgarten/src/impactx/src/particles/elements/Drift.H:69
#1  impactx::detail::PushSingleParticle::operator() (this=0x7fffddfffbf8, i=)
    at /global/homes/m/mgarten/src/impactx/src/particles/Push.cpp:81
#2  amrex::detail::call_f, int> (f=..., i=)
    at /global/u1/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuLaunchFunctsG.H:752
#3  0x00000000070cd460 in _ZZN5amrex11ParallelForIiRKN7impactx6detail18PushSingleParticleIRKNS1_5DriftEEEvEENSt9enable_ifIXsr5amrex19MaybeDeviceRunnableIT0_vEE5valueEvE4typeERKNS_3Gpu10KernelInfoET_OSB_ENKUlvE_clEv (this=)
    at /global/u1/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuLaunchFunctsG.H:802
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
```
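For context, here is a simplified paraphrase of what the kernel does around Drift.H:69. The signature follows the backtrace above, but the body is trimmed to the faulting line, and the surrounding types (`PType`, `RefPart`) are only sketched:

```C++
// Simplified paraphrase of impactx::Drift::operator(), not the exact source.
// p is the AoS particle object, px/py/pt the momenta, and m_ds the drift
// length stored as a plain amrex::ParticleReal member of the element.
AMREX_GPU_HOST_DEVICE AMREX_FORCE_INLINE
void operator() (PType& p,
                 amrex::ParticleReal& px,
                 amrex::ParticleReal& py,
                 amrex::ParticleReal& pt,
                 RefPart const& refpart) const
{
    // read the current transverse position from the AoS object
    amrex::ParticleReal const x = p.pos(0);
    // Drift.H:69 -- the write that triggers the Warp Illegal Address
    p.pos(0) = x + m_ds * px;
}
```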

I could not see any values for the variables because the compiler optimizes them out in Drift.H:

```Shell
69              p.pos(0) = x + m_ds * px;
(cuda-gdb) print px
$1 = <optimized out>
(cuda-gdb) print x
$2 = <optimized out>
(cuda-gdb) print p
$3 = (@local _ZN7impactx5Drift5PTypeE & @local) <error reading variable>
(cuda-gdb) break Drift.H:67
```

So I built again with `-g -O0` and hopefully I will see more.

Edit: ... I actually tried to build it again without optimization but it still shows `<optimized out>`. Should I have deleted the build directory completely before?

cemitch99 commented 2 years ago

The object `p` is a complicated struct, so I think the final line makes sense. I'm not sure if gdb will allow a `print p.pos(0)`, etc.

ax3l commented 2 years ago

In the end, the current AMReX particle AoS object `p` is really just a

```C++
struct {
    amrex::ParticleReal r[n];  // real components; positions come first
    int i[m];                  // integer components (e.g., particle id/cpu)
};
```

You could check in cuda-gdb whether the object `p` itself is valid memory (on the device) by printing its address and checking its range, and then printing its first member (which we interpret as position x).
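A sketch of that check in cuda-gdb (commands only; the cast assumes the AoS layout above, with the real components at the start of the object):

```Shell
# does &p look like a sane device address, and is it readable?
(cuda-gdb) print &p
# reinterpret the start of the object as the first real component,
# i.e. what we read as position x
(cuda-gdb) print *(amrex::ParticleReal*)&p
```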

> ... I actually tried to build it again without optimization but it still shows `<optimized out>`. Should I have deleted the build directory completely before?

Yes, you need to redo the configure step with a fresh build dir. `CXXFLAGS` are only picked up at the first configure in a build directory (they change the defaults for the configure step).

n01r commented 2 years ago

> Yes, you need to redo the configure step with a fresh build dir. `CXXFLAGS` are only picked up at the first configure in a build directory (they change the defaults for the configure step).

But deleting `build`, running `cmake -S . -B build`, and then doing `ccmake build`, editing the options, and hitting `c` to configure and `g` to generate should work, no?

ax3l commented 2 years ago

That should work in general... doing it with a single configure is the safest bet if you are unsure, though. If you want to see what ends up on the compiler line, you can configure with `-DCMAKE_VERBOSE_MAKEFILE=ON`.
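Concretely, the single-configure route could look like this (the flags are the debug settings discussed above):

```Shell
# fresh build dir, so CXXFLAGS are picked up at the first configure
rm -rf build
CXXFLAGS="-g -O0" cmake -S . -B build -DCMAKE_VERBOSE_MAKEFILE=ON
cmake --build build -j 8
```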

ax3l commented 2 years ago

cc @WeiqunZhang @atmyers @kngott — it turns out this is in part a bug in AMReX init with GPU-aware MPI on Perlmutter.

If I set `export MPICH_GPU_SUPPORT_ENABLED=0`, the `Cuda API error detected: cuPointerGetAttribute returned (0x1)` issue vanishes. Backtrace:

The other issue above occurs when we try to access fundamental types (not even pointers) of lattice elements on device, e.g., the `amrex::ParticleReal m_ds` member: `CUDA Exception: Warp Illegal Address`. The problem is so weird that I am starting to think it's a compiler bug... and it probably is: #174
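For anyone hitting the first issue in the meantime, a minimal sketch of the launch-time workaround (job parameters illustrative):

```Shell
# disable GPU-aware MPI in Cray MPICH before launching;
# this makes the cuPointerGetAttribute errors vanish
export MPICH_GPU_SUPPORT_ENABLED=0
srun -n 1 ./impactx input_fodo.in
```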