ROCm / ROCgdb

This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
https://rocm.docs.amd.com/projects/ROCgdb/en/latest/
GNU General Public License v2.0

Is there a limit to the number of wavefronts that ROCgdb can see via the "info threads" command? #18

Status: Closed (kumazzu closed this issue 1 year ago)

kumazzu commented 1 year ago

I compiled a HIP program with ROCm. While debugging this simple but thread-intensive program in ROCgdb, I found that the "info threads" command did not show the expected number of wavefronts. I built the ROCm toolchain with the Debug build type, and it compiles HIP programs correctly. Can someone tell me how to see wavefronts that were previously not listed once they become active during a ROCgdb session?

kumazzu commented 1 year ago

The threads of this program form about 2500 wavefronts, but "info threads" shows only 2048 of them, so I suspect there is a limit on how many wavefronts ROCgdb can display.

t-tye commented 1 year ago

There is no limit to the number of wavefront threads that ROCgdb will display. However, the GPU hardware has a limit on the number of wavefronts that can be executing on it at any one time. As wavefronts complete, new wavefronts will get created. Could this be what you are observing?
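
To make the arithmetic concrete, here is a minimal sketch in Python. The launch shape (625 blocks of 256 threads) and the device parameters (compute units, SIMDs per compute unit, wave slots per SIMD) are hypothetical values chosen only so that the numbers match the 2500 and 2048 reported above. Real limits depend on the specific GPU and on the kernel's register and LDS usage.

```python
import math

def total_wavefronts(num_blocks, threads_per_block, wave_size=64):
    """Wavefronts a launch needs: each block is split into whole wavefronts."""
    waves_per_block = math.ceil(threads_per_block / wave_size)
    return num_blocks * waves_per_block

def resident_cap(compute_units, simds_per_cu, wave_slots_per_simd):
    """Upper bound on wavefronts that can exist on the device at once."""
    return compute_units * simds_per_cu * wave_slots_per_simd

# Hypothetical launch: 625 blocks of 256 threads, 64-wide wavefronts.
needed = total_wavefronts(num_blocks=625, threads_per_block=256)

# Hypothetical device: 64 CUs, 4 SIMDs per CU, 8 wave slots per SIMD.
cap = resident_cap(compute_units=64, simds_per_cu=4, wave_slots_per_simd=8)

# A debugger listing threads at one instant can show at most this many waves.
visible_now = min(needed, cap)

print(f"needed={needed} cap={cap} visible_now={visible_now}")
```

Under these assumed numbers the launch needs more wavefronts than the device can hold at once, so the remainder can only be created after earlier wavefronts finish.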

kumazzu commented 1 year ago

Thank you for your answer. In fact, I observe that when a program uses a large number of wavefronts and some of them complete, the newly created wavefronts cannot be observed in ROCgdb immediately. I would love to understand how ROCgdb handles the creation of a new wavefront. Could you give me some advice?

jpsamaroo commented 1 year ago

I'm not an expert on the hardware, and I'm mostly just guessing, so take this with a grain of salt: ROCgdb, by its nature, shows you the state of the hardware at a specific point in time. At any point in time, there are only N wavefronts that the hardware can physically instantiate due to resource constraints. If a kernel needs more than N wavefronts, then the first N will be instantiated, and the rest will be instantiated later, as resources become available. Thus, ROCgdb can't actually see those future wavefronts until the hardware initializes them; if ROCgdb were triggered slightly later, it could catch those wavefronts instead (but the previous ones might no longer be visible).

t-tye commented 1 year ago

@jpsamaroo that description captures the hardware pretty well.

ROCgdb essentially gets a snapshot of the waves currently existing each time it lists the threads.
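
The snapshot behaviour described in this thread can be illustrated with a toy model in Python. It is only a sketch: the function name `snapshots`, the fixed `retire_per_step` rate, and the in-order retirement are all assumptions made for illustration, and real hardware schedules and retires wavefronts in no such simple order.

```python
from collections import deque

def snapshots(total_waves, cap, retire_per_step):
    """Toy model: return the list of resident wave IDs seen at each stop."""
    pending = deque(range(total_waves))  # waves not yet created
    resident = []                        # waves that currently exist
    seen = []
    while pending or resident:
        # Hardware fills free slots with new waves, up to the cap.
        while pending and len(resident) < cap:
            resident.append(pending.popleft())
        # A debugger stop here would list exactly the resident waves.
        seen.append(list(resident))
        # Some waves finish before the next stop (assumed oldest first).
        resident = resident[retire_per_step:]
    return seen

snaps = snapshots(total_waves=10, cap=4, retire_per_step=2)
for step, waves in enumerate(snaps):
    print(f"stop {step}: {waves}")
```

No single stop lists more than `cap` waves, yet across successive stops every wave shows up at some point, which matches the observation that later wavefronts only become visible after earlier ones complete.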