Open hgtsoi opened 2 years ago
The feature doesn't appear to be present in ROCm 5.0.2 on the gfx90a architecture either. I've compiled my code using hipcc with as many debug compiler flags as I could find, like this:
-g -ggdb3 --offload-arch=gfx90a -Xarch_device -ggdb3 -Xarch_device -g -Xarch_device -O0 -Xclang -O0 -fstandalone-debug -gdwarf-5
Even with the above flags I still can't use rocgdb to print the values of private kernel variables from within a lane; it always returns <optimized out>. Is printing variables from within a kernel actually supported at the moment?
The current ROCm 5.4 release does support function local variable printing if compiled with -O0 -g. Support for local/shared address space variables should be added in an upcoming release. You do need to have the focus on a specific AMD GPU thread and lane to see the values for that lane.
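As a rough sketch of the build this reply describes, the long flag list from the earlier comment can be reduced to -g -O0 plus the target architecture. The file names kernel.cpp and a.out are placeholders, not names from this thread, and the build step is skipped when hipcc or the source file is missing.

```shell
# Placeholder names (assumptions); override them via the environment.
SRC="${SRC:-kernel.cpp}"
OUT="${OUT:-a.out}"
ARCH="${ARCH:-gfx90a}"   # architecture mentioned in the report above

# Debug info and no optimization, as recommended in the reply.
HIPCC_FLAGS="-g -O0 --offload-arch=${ARCH}"

if command -v hipcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # HIPCC_FLAGS is left unquoted on purpose so it splits into separate flags.
    hipcc $HIPCC_FLAGS "$SRC" -o "$OUT"
else
    echo "skipping build: hipcc or $SRC not found"
fi
```

Whether variables then print correctly still depends on the ROCm release and on the debugger focus being on a specific GPU thread and lane.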
Hey folks,
Really looking forward to this feature becoming available! I just tried printing private kernel variables from within a lane, using rocgdb from ROCm 5.4.3 on the officially supported GFX906 architecture. It doesn't look like it's implemented yet; however, if you use the CUDA backend with HIP, you can print private kernel variables under cuda-gdb.
Printing variables allocated in private memory has been supported for some time now if compiled with -g -O0. If you have a case where it is not, it would be good to see a reproducer we can investigate.
Hi t-tye,
Absolutely, attached is a self-contained matrix multiplication code that reproduces the problem. I have tried this workflow using ROCm 5.4.3 on GFX906 and ROCm 5.0.2 on GFX90a.
The goal is to print kernel variables i0 and i1 from within the kernel mat_mult, on line 168.
hipcc -g -O0 mat_mult_bugreport.cpp -o a.out
(I also tried -ggdb, but got the same outcome.)
rocgdb ./a.out
b mat_mult
run
On GFX906 with ROCm 5.4.3 it skips over the breakpoint for an unknown reason and finishes:
warning: Temporarily disabling breakpoints for unloaded shared library "/a.out#offset=12288&size=99280"
On GFX90a with ROCm 5.0.2 it hits the breakpoint and I can continue...
disable
info threads
thread 4
(this might be different for you)
lane 0
n
print i0
On GFX90a with ROCm 5.0.2 I get this output:
$1 = <unavailable>
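For anyone trying to reproduce this, the interactive steps above can be collected into a command file so the session is repeatable. This is only a sketch: rocgdb_cmds.txt is a made-up file name, the thread number 4 comes from my session and will likely differ on yours, and I am relying on rocgdb accepting the -batch and -x options that upstream GDB provides.

```shell
# Write the debugger commands from the steps above to a file.
# The file name is arbitrary (an assumption for this sketch).
cat > rocgdb_cmds.txt <<'EOF'
break mat_mult
run
disable
info threads
thread 4
lane 0
next
print i0
print i1
EOF

# Run the session non-interactively if rocgdb and the binary are present.
if command -v rocgdb >/dev/null 2>&1 && [ -x ./a.out ]; then
    rocgdb -batch -x rocgdb_cmds.txt ./a.out
else
    echo "skipping: rocgdb or ./a.out not found"
fi
```

Check the output of info threads first and replace thread 4 with a thread that is actually executing the mat_mult kernel before trusting the printed values.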
At present, with the versions of ROCm and architectures available to me, there seems to be no straightforward way I can get kernel variables to print. I know we can inspect registers and have a look at the assembly (whose instructions don't appear to be documented publicly at all), but this is too much of an ask for average researcher folk to delve into.
If this works at all, I am curious to know which version of ROCm and architecture it does work on. I am giving a workshop on HIP soon with a focus on supercomputing. Having this feature available (or at least giving them hope as to which version they can expect it to work in) would be so wonderful for the students/researchers.
@hgtsoi Do you still need assistance with this ticket? Thanks
Do we have symbolic debugging for GPU kernels now? I am using ROCgdb shipped with rocm-4.5.2. When checking local variables or args in rocgdb, it always shows "Optimized". I am wondering whether they were indeed optimized out, or whether rocgdb does not support symbolic debugging for kernels in rocm-4.5.2.