GPUOpen-Tools / gpu_performance_api

GPU Performance API for AMD GPUs
MIT License

Error out of counter splitting if group max is zero #70

Closed: jcortell68 closed this 2 months ago

jcortell68 commented 1 year ago

A counter group whose counter max is zero is invalid and will ultimately cause a hang when we try to split counters into multiple passes. This is one of several scenarios that result in a hang during counter splitting; see

https://github.com/GPUOpen-Tools/gpu_performance_api/issues/69

This change fixes only that specific scenario. We now check whether the group max is zero, and if it is, we stop trying to split the public counter's HW counters into multiple passes and log an error.
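For readers unfamiliar with the splitter, the guard described above can be sketched roughly as follows. This is a simplified illustration, not the code from this change: `SplitStatus`, `SplitResult`, and `SplitIntoPasses` are hypothetical names, and the real splitter works on GPA's own counter and group structures rather than a plain vector of ints.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical status codes; the actual error code added by this change is not shown here.
enum class SplitStatus { kOk, kErrorGroupMaxIsZero };

struct SplitResult
{
    SplitStatus                   status;
    std::vector<std::vector<int>> passes;  // Each inner vector holds the HW counters of one pass.
};

// Splits a public counter's HW counters into passes of at most `group_max` counters each.
// A `group_max` of zero is invalid metadata: no pass could ever hold a counter, so a
// splitter that keeps opening new passes until everything fits would never finish.
SplitResult SplitIntoPasses(const std::vector<int>& hw_counters, std::size_t group_max)
{
    if (group_max == 0)
    {
        // Give up and report the bad metadata instead of hanging.
        std::fprintf(stderr, "Error: counter group max is zero; cannot split counters into passes.\n");
        return {SplitStatus::kErrorGroupMaxIsZero, {}};
    }

    SplitResult result{SplitStatus::kOk, {}};
    for (std::size_t first = 0; first < hw_counters.size(); first += group_max)
    {
        const std::size_t last = std::min(first + group_max, hw_counters.size());
        result.passes.emplace_back(hw_counters.begin() + static_cast<std::ptrdiff_t>(first),
                                   hw_counters.begin() + static_cast<std::ptrdiff_t>(last));
    }
    return result;
}
```

In this sketch, five counters with a group max of two produce three passes, while any input with a group max of zero returns the error status and no passes.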

Again, this isn't a comprehensive fix for issue 69. Other cases of bad data could still result in a hang. Issue 69 should be fixed with a cap on the number of passes, which would cover all cases. But this commit still adds value: it flags the specific invalid GPU counter metadata in addition to avoiding the hang.
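To illustrate why a pass cap would cover the remaining cases, here is a minimal sketch of the idea. It is not a proposed implementation for issue 69: `kMaxPasses`, `ScheduleStatus`, and `CountPasses` are hypothetical names, and the cap value of 64 is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical upper bound on passes; the real value would be a design decision.
constexpr std::size_t kMaxPasses = 64;

enum class ScheduleStatus { kOk, kErrorPassCapExceeded };

// Counts how many passes are needed to schedule `counter_count` counters when each
// pass can hold `per_pass_capacity` of them. The loop is bounded by kMaxPasses, so it
// terminates even if bad metadata (for example a capacity of zero) prevents progress.
ScheduleStatus CountPasses(std::size_t counter_count, std::size_t per_pass_capacity, std::size_t& pass_count)
{
    pass_count            = 0;
    std::size_t remaining = counter_count;

    while (remaining > 0)
    {
        if (pass_count >= kMaxPasses)
        {
            return ScheduleStatus::kErrorPassCapExceeded;  // Bail out instead of looping forever.
        }

        // With a capacity of zero nothing is scheduled and `remaining` never shrinks;
        // only the cap above stops the loop in that case.
        remaining -= std::min(remaining, per_pass_capacity);
        ++pass_count;
    }
    return ScheduleStatus::kOk;
}
```

The trade-off is that a cap alone only reports that splitting failed. A targeted check like the one in this change also identifies which metadata is invalid.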

Change-Id: I56d7d2043ba92c1b6088f0fdd68f5ec844e7b823

jcortell68 commented 1 year ago

Sure thing. Done. I also realized I had neglected to add a GPA_ENUM_STRING_VAL entry for the new error code.