Closed kumazzu closed 1 year ago
The threads of this program can be organized into about 2500 wavefronts, but only 2048 wavefronts can be seen with "info threads", so I guess there is a limit to how many wavefronts ROCgdb can display.
There is no limit to the number of wavefront threads that ROCgdb will display. However, the GPU hardware has a limit on the number of wavefronts that can be executing on it at any one time. As wavefronts complete, new wavefronts will get created. Could this be what you are observing?
Thank you for your answer. In fact, I observe that when some wavefronts of a program that uses a large number of wavefronts complete, the newly created wavefronts cannot be observed immediately in ROCgdb. I would love to understand how the creation of a new wavefront is handled in ROCgdb. Could you give me some advice?
I'm not an expert on the hardware, and I'm mostly just guessing, so take this with a grain of salt: ROCgdb, by its nature, shows you the state of the hardware at a specific point in time. At any point in time, there are only N wavefronts that the hardware can physically instantiate due to resource constraints. If a kernel needs more than N wavefronts, then the first N will be instantiated, and the rest will be instantiated later, as resources become available. Thus, ROCgdb can't actually see those future wavefronts until the hardware initializes them; if ROCgdb were triggered slightly later, it could catch those wavefronts instead (but the previous ones might no longer be visible).
@jpsamaroo that description captures the hardware pretty well.
ROCgdb essentially gets a snapshot of the waves currently existing each time it lists the threads.
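To make the snapshot idea concrete, here is a toy model in plain C++. It is only a sketch under stated assumptions, not how the hardware scheduler or ROCgdb is implemented. The function names `total_wavefronts` and `snapshot` are hypothetical, the launch shape (625 blocks of 256 threads, wave size 64) is invented to match the "about 2500 wavefronts" in the question, the slot count of 2048 is simply the number seen with "info threads" in this thread, and waves are assumed to retire oldest-first, which real hardware does not guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Number of wavefronts a launch creates: each block is split into
// ceil(block_threads / wave_size) wavefronts.
std::uint64_t total_wavefronts(std::uint64_t blocks,
                               std::uint64_t block_threads,
                               std::uint64_t wave_size) {
    assert(wave_size > 0);
    return blocks * ((block_threads + wave_size - 1) / wave_size);
}

// Toy model of one "info threads" snapshot.
// Assumptions (illustrative only): the device holds at most `slots`
// resident wavefronts, waves are created in ID order, and they retire
// oldest-first, so each retired wave frees a slot for the next pending one.
// Returns the IDs of the waves resident after `retired` waves have finished.
std::vector<std::uint64_t> snapshot(std::uint64_t total,
                                    std::uint64_t slots,
                                    std::uint64_t retired) {
    retired = std::min(retired, total);
    // Waves created so far: the initial `slots`, plus one per retired wave.
    const std::uint64_t created = std::min(total, slots + retired);
    std::vector<std::uint64_t> resident;
    for (std::uint64_t id = retired; id < created; ++id) {
        resident.push_back(id);
    }
    return resident;
}
```

In this model, `snapshot(2500, 2048, 0)` lists waves 0 through 2047 and never more than 2048 entries, even though `total_wavefronts(625, 256, 64)` is 2500. A later stop, for example `snapshot(2500, 2048, 100)`, lists waves 100 through 2147: the new waves appear only once older ones have finished, and the finished ones are gone from the list.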
I used ROCm to compile a HIP program. When I ran a simple but thread-intensive HIP program under ROCgdb, the "info threads" command did not show the expected number of wavefronts. I have built the ROCm toolchain with the Debug build type, and it compiles HIP programs correctly. So, can someone tell me how to see, while debugging in ROCgdb, a wavefront that was not visible before and has since been activated?