ROCm / ROCgdb

This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
https://rocm.docs.amd.com/projects/ROCgdb/en/latest/
GNU General Public License v2.0

fatal error: KFD_IOC_DBG_TRAP_GET_VERSION failed && How to set a breakpoint into the kernel function? #23

Open yuanyuanxia opened 1 year ago

yuanyuanxia commented 1 year ago

The GPU is a Vega 20, ROCm is 5.1.0, and rocgdb reports itself as GNU gdb (rocm-rel-5.1-36) 11.2.

rocm-dbgapi is installed:

> apt search rocm-dbgapi
> Sorting... Done
> Full Text Search... Done
> rocm-dbgapi/Ubuntu,now 0.64.0.50100-36 amd64 [installed]
> Library to provide AMD GPU debugger API
> 
> rocm-dbgapi5.1.0/Ubuntu 0.64.0.50100-36 amd64
> Library to provide AMD GPU debugger API

Compile:

CXXFLAGS = -g -O0 -ggdb

Run:

rocgdb ./MatrixTranspose

I'm getting an error message during execution:

> (gdb) set debug amdgpu log-level verbose
> amd-dbgapi: amd_dbgapi_set_log_level (LOG_LEVEL_VERBOSE) {
> amd-dbgapi: } = void
> (gdb) run
> Starting program: /home/xyy/test_rocgdb/0_MatrixTranspose/MatrixTranspose
> amd-dbgapi: amd_dbgapi_process_attach (client_process_id=0x557aa4e8f5e0, process_id=0x557aa5109078) {
> amd-dbgapi:    callback: get_os_pid (pid=0x7fffbfc5aaec) {
> amd-dbgapi:    callback: } = STATUS_SUCCESS, *pid=1495950
> amd-dbgapi:    attaching process_1 to OS process 1495950
> amd-dbgapi:    detached process_1
> amd-dbgapi:    linux_driver_t statistics (pid 1495950): 0 reads (0), 0 writes (0)
> amd-dbgapi: fatal error: KFD_IOC_DBG_TRAP_GET_VERSION failed
> Backtrace:
>     ……
> amd-dbgapi: } = STATUS_FATAL
> Could not attach to process 1495950 (rc=-2)

Even with the error above, I can still set breakpoints on host functions, but not inside the kernel function. A breakpoint on a line inside the kernel stays pending. For example:

> (gdb) b main
> Breakpoint 1 at 0x216262: file MatrixTranspose.cpp, line 37.
> (gdb) b MatrixTranspose.cpp:12
> No compiled code for line 12 in file "MatrixTranspose.cpp".
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 2 (MatrixTranspose.cpp:12) pending.
> (gdb) i b
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x0000000000216262 in main() at MatrixTranspose.cpp:37
> 2       breakpoint     keep y   <PENDING>          MatrixTranspose.cpp:12

I tried using amdgpu-install to install both the legacy and ROCr OpenCL stacks, but that did not help, and I got a new error message:

> WARNING: amdgpu dkms failed for running kernel

I want to set a breakpoint inside the kernel function and print values there. How do I do that? Thanks~
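
The failing call, KFD_IOC_DBG_TRAP_GET_VERSION, is an ioctl issued against the KFD device node, so one thing worth ruling out is a missing or inaccessible kernel driver, which would also fit the dkms warning above. The following is a minimal sketch, not part of the original report, that checks two preconditions: that the device node exists and is readable and writable, and that an `amdgpu` module is listed as loaded. The helper names `amdgpu_loaded` and `kfd_findings` are hypothetical, and the default paths `/dev/kfd` and `/proc/modules` are assumptions about a typical Linux ROCm install. It does not test whether the loaded driver actually supports the debug ioctl, and a driver built into the kernel rather than loaded as a module will not appear in `/proc/modules`.

```python
import os


def amdgpu_loaded(modules_text):
    """Return True if an 'amdgpu' entry appears in /proc/modules-style text."""
    return any(line.split()[:1] == ["amdgpu"] for line in modules_text.splitlines())


def kfd_findings(dev_path="/dev/kfd", modules_path="/proc/modules"):
    """Collect human-readable problems; an empty list means none were found."""
    findings = []

    # The debugger library needs to open the KFD device node to issue ioctls.
    if not os.path.exists(dev_path):
        findings.append(f"{dev_path} is missing")
    elif not os.access(dev_path, os.R_OK | os.W_OK):
        findings.append(f"{dev_path} is not readable and writable by this user")

    # Check whether the amdgpu kernel module is listed as loaded.
    try:
        with open(modules_path) as f:
            if not amdgpu_loaded(f.read()):
                findings.append("amdgpu module is not loaded")
    except OSError:
        findings.append(f"cannot read {modules_path}")

    return findings


if __name__ == "__main__":
    for finding in kfd_findings() or ["no obvious problem found"]:
        print(finding)
```

If this reports nothing, the remaining suspect is the driver version itself: the kernel driver that is actually running may not provide the debug interface that this rocm-dbgapi release expects.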

ppanchad-amd commented 1 month ago

@yuanyuanxia Apologies for the lack of response. Do you still need assistance with this ticket? Thanks!