ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

Intercept mode deadlocks with multiple threads driving separate GPUs #113

Open gcongiu opened 1 year ago

gcongiu commented 1 year ago

Running the intercept_multi_thread_monitoring test in PAPI (located in papi/src/components/rocm/tests/intercept_multi_thread_monitoring) deadlocks. The test runs a matrix-matrix multiplication kernel from multiple threads, each of which drives and monitors its own dedicated GPU. The backtrace from rocgdb follows:

```
#0  0x00007ffff599e54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff5999eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x000000010440c741 in ?? ()
#3  0x00007ffeeb9999d1 in rocprofiler::util::HsaRsrcFactory::GetKernelNameRef(unsigned long) () from /opt/rocm-5.5.0/lib/librocprofiler64.so
#4  0x00007ffeeb982f33 in rocprofiler::InterceptQueue::QueryKernelName(unsigned long, amd_kernel_code_s const*) ()
   from /opt/rocm-5.5.0/lib/librocprofiler64.so
#5  0x00007ffeeb9898e5 in rocprofiler::InterceptQueue::OnSubmitCB(void const*, unsigned long, unsigned long, void*, void (*)(void const*, unsigned long)) () from /opt/rocm-5.5.0/lib/librocprofiler64.so
#6  0x00007fffece1aafa in rocr::core::InterceptQueue::StoreRelaxed(long) () from /opt/rocm-5.5.0/lib/libhsa-runtime64.so.1
#7  0x00007fffece0d9a8 in rocr::HSA::hsa_signal_store_screlease(hsa_signal_s, long) () from /opt/rocm-5.5.0/lib/libhsa-runtime64.so.1
#8  0x00007ffff66850c4 in bool roc::VirtualGPU::dispatchGenericAqlPacket<hsa_kernel_dispatch_packet_s>(hsa_kernel_dispatch_packet_s*, unsigned short, unsigned short, bool, unsigned long) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#9  0x00007ffff66828d7 in roc::VirtualGPU::submitKernelInternal(amd::NDRangeContainer const&, amd::Kernel const&, unsigned char const*, void*, unsigned int, amd::NDRangeKernelCommand*, hsa_kernel_dispatch_packet_s*) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#10 0x00007ffff6683638 in roc::VirtualGPU::submitKernel(amd::NDRangeKernelCommand&) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#11 0x00007ffff6654d1a in amd::Command::enqueue() () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#12 0x00007ffff657b723 in ihipModuleLaunchKernel(ihipModuleSymbol_t*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, ihipStream_t*, void**, void**, ihipEvent_t*, ihipEvent_t*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, unsigned int) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#13 0x00007ffff65a27a7 in ihipLaunchKernel(void const*, dim3, dim3, void**, unsigned long, ihipStream_t*, ihipEvent_t*, ihipEvent_t*, int) ()
   from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#14 0x00007ffff657b5a2 in hipLaunchKernel_common () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#15 0x00007ffff6589e12 in hipLaunchKernel () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#16 0x0000000000473597 in __device_stub__matmul(float*, float*, float*, int) ()
#17 0x0000000000473aa5 in hip_do_matmul_work (handle=0xa88530, stream=0xa1f610) at matmul.cpp:111
#18 0x00000000004730fc in .omp_outlined._debug__(int &, const char *(&)[4], int &, hipError_t &) const (.global_tid.=0x7fffffffc670,
    .bound_tid.=0x7fffffffc668, papi_errno=@0x7fffffffca6c: 0, events=..., pass_with_warning=@0x7fffffffca68: 0,
    hip_errno=@0x7fffffffca64: hipSuccess) at multi_thread_monitoring.cpp:103
#19 0x00000000004734cd in .omp_outlined.(void) const (.global_tid.=0x7fffffffc670, .bound_tid.=0x7fffffffc668, papi_errno=@0x7fffffffca6c: 0,
    events=..., pass_with_warning=@0x7fffffffca68: 0, hip_errno=@0x7fffffffca64: hipSuccess) at multi_thread_monitoring.cpp:63
#20 0x00007ffff7fcbf43 in __kmp_invoke_microtask () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#21 0x00007ffff7f5177f in __kmp_invoke_task_func () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#22 0x00007ffff7f4b85b in __kmp_fork_call () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#23 0x00007ffff7f3cb05 in __kmpc_fork_call () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#24 0x0000000000472da2 in multi_thread (argc=1, argv=0x7fffffffcb88) at multi_thread_monitoring.cpp:63
#25 0x00000000003eaf80 in main (argc=1, argv=0x7fffffffcb88) at intercept_multi_thread_monitoring.cpp:13
```

The problem was reproduced with ROCm 5.5.0 RC5. The test was run on two MI210 GPUs.

gcongiu commented 1 year ago

I verified this with the ROCm 5.5.0 stable release, and the problem is present there as well.

gcongiu commented 1 year ago

@ammarwa, any update on this?

gcongiu commented 1 year ago

Added a reproducer: issue-113.tar.gz