preda closed this 8 months ago
Thanks for digging this up. I have an internal change that sets the number of signals to 64 per queue for everything, not just for profiling. I would guess that fixes your issue and also addresses the issue in ROCr.
@saleelk thanks, this is the commit with the change BTW https://github.com/ROCm/clr/commit/c157bfb2022076959c521269b27f34996c1ee730
The change is two-fold:
- Add a flag to configure the pool size used when profiling, so that the two values (profiling vs. non-profiling) can each be configured explicitly.
- Reduce the size of the profiling pool from 4096, which was too large: the kernel only provides 4094 events for DGPUs, and using two command queues in OpenCL results in the bug described here: https://github.com/ROCm/ROCR-Runtime/issues/186

The rationale for the new default of 1000 is that up to 4 queues fit within the 4094 events provided by the kernel, with a little margin.
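
To make the arithmetic behind the new default concrete, here is a rough sketch. It assumes each queue gets its own pool and that every signal in a pool consumes one kernel event; the constant and function names are made up for illustration and are not taken from the clr source.

```python
# Numbers quoted in this thread; the names are hypothetical, not clr identifiers.
KERNEL_EVENTS_DGPU = 4094  # events the kernel provides for DGPUs
OLD_PROFILE_POOL = 4096    # previous profiling pool size
NEW_PROFILE_POOL = 1000    # new default profiling pool size


def queues_that_fit(pool_size, events=KERNEL_EVENTS_DGPU):
    """Number of whole per-queue pools that fit within the event limit."""
    return events // pool_size


def margin(pool_size, queues, events=KERNEL_EVENTS_DGPU):
    """Events left over after `queues` pools of `pool_size` signals each."""
    return events - pool_size * queues


if __name__ == "__main__":
    print(queues_that_fit(OLD_PROFILE_POOL))  # 0: one full pool already exceeds the limit
    print(queues_that_fit(NEW_PROFILE_POOL))  # 4
    print(margin(NEW_PROFILE_POOL, 4))        # 94 events of margin
```

Under these assumptions, a fifth queue with a full pool of 1000 would go over the limit, which matches the "up to 4 queues" figure.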