Enabling queue profiling by default slows down kernel enqueue API calls according to VTune, at least on Intel OpenCL targeting the Intel Arc A750. Disabling profiling improved some HeCBench cases on that device:
overlay-hip: ~1.80x speed up.
floydwarshall-hip: ~1.47x speed up.
tqs-hip: ~1.13x speed up.
This patch creates queues both with and without profiling, and the non-profiling queue is used at start. The backend switches to the profiling queue when needed. Note that the transition is one-way: from the non-profiling queue to the profiling one, never back.
Also, add an environment variable for forcing queue profiling to be disabled.
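
For reviewers, the one-way switch and the override can be pictured with the minimal sketch below. It is not the code in this patch: `Queue`, `QueuePair`, `isForcedOff` and the variable name `EXAMPLE_DISABLE_QUEUE_PROFILING` are all hypothetical stand-ins, and the sketch leaves out whatever synchronization a real backend needs to order work between the two queues.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a backend queue handle (e.g. a cl_command_queue).
struct Queue {
  bool ProfilingEnabled;
};

// Hypothetical helper: treats the value "1" as "force profiling off".
// The accepted values in the actual patch may differ.
bool isForcedOff(const char *Value) {
  return Value != nullptr && std::string(Value) == "1";
}

// Reads a hypothetical variable name; the real name is defined by the patch.
bool profilingForcedOffByEnv() {
  return isForcedOff(std::getenv("EXAMPLE_DISABLE_QUEUE_PROFILING"));
}

// Holds both queues and tracks which one new work should go to.
class QueuePair {
public:
  explicit QueuePair(bool ForceNoProfiling)
      : Plain{false}, Profiling{true}, ForceNoProfiling(ForceNoProfiling) {}

  // Queue that new work is enqueued on; the non-profiling one at start.
  Queue &active() { return UseProfiling ? Profiling : Plain; }

  // Called when timing data is first needed. The switch is one-way:
  // there is deliberately no method to go back to the plain queue.
  // Returns true if the profiling queue is active afterwards.
  bool requestProfiling() {
    if (ForceNoProfiling)
      return false; // override wins; stay on the plain queue
    UseProfiling = true;
    return true;
  }

private:
  Queue Plain;
  Queue Profiling;
  bool UseProfiling = false;
  bool ForceNoProfiling;
};
```

A caller would construct `QueuePair(profilingForcedOffByEnv())` once, enqueue on `active()`, and call `requestProfiling()` the first time timing information is requested.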