Closed pvelesko closed 11 months ago
@linehill can you test on your end to see if you can reproduce OpenCL intel cpu failure on CI?
Do you mean the Unit_hipMultiThreadStreams1_AsyncSame case? It passes on Intel iGPU and Intel CPU through the OpenCL BE.
While working on event and command list recycling, I observed some instabilities and zombie kernel processes showing up again. I decided to pause those other PRs to fully investigate this issue.
Invert record operation
Since event recording enqueues work onto a queue, the operation is inverted from event->record(queue) to queue->record(event). All enqueue operations are now part of the queue class.
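To illustrate the inversion, here is a minimal sketch. The Event and Queue classes below are hypothetical stand-ins written for this example, not the project's actual classes, and the command log is only a way to make the enqueue order visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an event; not the project's real class.
class Event {
public:
  // True once some queue has enqueued a record command for this event.
  bool isRecorded() const { return Recorded_; }

private:
  friend class Queue; // only a queue may record an event
  bool Recorded_ = false;
};

// Hypothetical stand-in for a queue that owns every enqueue operation.
class Queue {
public:
  // Enqueue a kernel launch (modeled as a log entry).
  void launch(const std::string &Kernel) {
    Commands_.push_back("launch:" + Kernel);
  }

  // Inverted form: queue->record(event) instead of event->record(queue).
  void record(Event &Ev) {
    Commands_.push_back("record");
    Ev.Recorded_ = true;
  }

  const std::vector<std::string> &commands() const { return Commands_; }

private:
  std::vector<std::string> Commands_;
};

// Usage: both operations go through the queue, in submission order.
inline void exampleUsage() {
  Queue Q;
  Event Ev;
  Q.launch("hypotheticalKernel");
  Q.record(Ev);
  assert(Ev.isRecorded());
  assert(Q.commands().size() == 2);
}
```

With this shape, anything that touches the command stream lives in one class, so ordering and locking decisions only have to be made in the queue.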
Optimize event recording in Level Zero backend
The previous implementation used two global barriers, which was inefficient; it now uses a dependency chain instead. Furthermore, a couple of tests do a lot of event recording, and when the number of parallel unit tests was pushed past ~8, the total runtime of these tests could sometimes explode, causing event timeouts. It appears to me that these timeouts sometimes resulted in the zombie processes we've observed, which would then cause other tests to time out (creating more such processes).
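The difference between the two recording strategies can be sketched with a toy model. This is not Level Zero code and not the PR's implementation: ModelQueue, Command, and the barrier/marker/barrier layout are assumptions made purely to contrast a record that synchronizes on everything with one that waits only on its predecessor.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a queued command; names are not from Level Zero.
struct Command {
  bool GlobalBarrier = false;         // waits for everything submitted so far
  std::optional<std::size_t> WaitsOn; // index of the one command it depends on
};

class ModelQueue {
public:
  // Ordinary work, chained to the previous command if there is one.
  std::size_t enqueueWork() { return push(Command{false, last()}); }

  // Assumed barrier-style record: barrier, marker, barrier.
  std::size_t recordWithBarriers() {
    push(Command{true, std::nullopt});
    std::size_t Marker = push(Command{false, last()});
    push(Command{true, std::nullopt});
    return Marker;
  }

  // Chain-style record: one marker that waits only on its predecessor.
  std::size_t recordWithChain() { return push(Command{false, last()}); }

  // Number of global barriers currently in the queue.
  std::size_t barrierCount() const {
    std::size_t N = 0;
    for (const Command &C : Commands_)
      if (C.GlobalBarrier)
        ++N;
    return N;
  }

  const Command &at(std::size_t I) const {
    assert(I < Commands_.size());
    return Commands_[I];
  }

private:
  // Index of the most recently pushed command, if any.
  std::optional<std::size_t> last() const {
    if (Commands_.empty())
      return std::nullopt;
    return Commands_.size() - 1;
  }

  std::size_t push(Command C) {
    Commands_.push_back(C);
    return Commands_.size() - 1;
  }

  std::vector<Command> Commands_;
};
```

In this model, every barrier-style record adds two full synchronization points, while the chain-style record adds none, which is one plausible reason record-heavy tests would scale poorly under load.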
Max out parallel unit tests
Tests now appear to run very consistently, and I haven't observed any more hung kernel processes.
The hipEventRecord test was crippled (a long time ago) to exclude the kernel launch, since it was preventing the test from compiling. Quick fix for that (and a fair amount of wasted debugging...).
Note: In this PR I am using regular command lists for event recording. This part can be refactored later to only use ICL but I would like to merge this right away and get other outstanding PRs in.
[x] Merge #675
[x] Investigate OpenCL negative time