linehill closed 2 months ago
> The OpenCL backend does not need this, as the driver releases the objects when they are not needed by the application or by the driver for internal in-progress tasks.
but the chipStar runtime might need these events to stay alive so that we can use them for syncing queues.
Aren't the events referenced via `shared_ptr`s, which keep them alive as long as needed? Or do you mean that some objects in the chipStar runtime might reference the events via raw pointers?
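To make the distinction in this question concrete, here is a minimal sketch of the two ownership styles. `ToyEvent` and `ToyQueue` are hypothetical stand-ins invented for illustration, not the chipStar classes, and the sketch assumes nothing about how chipStar actually stores its events.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a runtime event (NOT chipstar::Event).
struct ToyEvent {
  int Id;
  explicit ToyEvent(int I) : Id(I) {}
};

// Hypothetical queue that remembers its last event for later syncing.
struct ToyQueue {
  std::shared_ptr<ToyEvent> LastEvent; // owning reference
};

// The queue holds a shared_ptr, so the event outlives the scope that
// created it. Returns true if the event is still alive afterwards.
bool eventSurvivesViaQueue() {
  ToyQueue Q;
  std::weak_ptr<ToyEvent> Observer; // observes without owning
  {
    auto Ev = std::make_shared<ToyEvent>(1);
    Observer = Ev;
    Q.LastEvent = Ev; // second owner
  } // Ev goes out of scope; Q.LastEvent still owns the event
  return !Observer.expired();
}

// Only a raw pointer is kept, which does not extend the lifetime.
// Returns true if the event was destroyed at the end of the inner scope.
bool eventDiesWithOnlyRawPointer() {
  std::weak_ptr<ToyEvent> Observer;
  ToyEvent *Raw = nullptr;
  {
    auto Ev = std::make_shared<ToyEvent>(2);
    Observer = Ev;
    Raw = Ev.get(); // non-owning; dangling once Ev is gone
  }
  (void)Raw; // must not be dereferenced here
  return Observer.expired();
}
```

In this model an event stays alive exactly as long as some `shared_ptr` refers to it; a holder that keeps only a raw pointer would need some other mechanism to extend the lifetime.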
The OpenCL backend called `chipstar::Event::addDependency()` without ever calling `chipstar::Event::releaseDependencies()` during the HIP application's lifetime, and this had two possible outcomes:
1) Crash due to an out-of-memory error caused by the unreleased event objects. This occurred after a HIP program had streamed enough commands, e.g. more than 30000 kernels.
2) Crash due to stack overflow at program exit / chipStar uninitialization. Because the event dependencies were never released, a very long event dependency chain built up. At uninitialization, destroying a Queue's last event destroyed its dependent events, which in turn destroyed their dependent events, and so on, until the nested destructor calls overflowed the stack.
Both cases are fixed by removing the `addDependency()` call. AFAIK, the event dependency system is meant for timing the safe release of backend driver objects (`cl_event`s in this case). The OpenCL backend does not need this, as the driver releases the objects when they are not needed by the application or by the driver for internal in-progress tasks.
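For readers unfamiliar with the stack overflow in outcome 2, the mechanism can be reproduced with a toy model. Everything below is an assumption made for illustration: `ToyEvent`, `buildChain`, and `releaseChainIteratively` are hypothetical names, and the iterative release shown here is a generic technique, not the fix applied in this PR (which simply removes the call).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Counts live toy events so the effect of a release can be observed.
static std::size_t AliveEvents = 0;

// Hypothetical model of an event that owns its dependencies.
struct ToyEvent {
  std::vector<std::shared_ptr<ToyEvent>> Deps;
  ToyEvent() { ++AliveEvents; }
  ~ToyEvent() { --AliveEvents; }
  void addDependency(std::shared_ptr<ToyEvent> Dep) {
    Deps.push_back(std::move(Dep));
  }
};

// Builds a chain where each new event depends on the previous one,
// mimicking dependencies that are added but never released.
std::shared_ptr<ToyEvent> buildChain(std::size_t N) {
  std::shared_ptr<ToyEvent> Last;
  for (std::size_t I = 0; I < N; ++I) {
    auto Ev = std::make_shared<ToyEvent>();
    if (Last)
      Ev->addDependency(std::move(Last));
    Last = std::move(Ev);
  }
  return Last;
}

// Dropping the head of a long chain normally nests one destructor call
// per link. This helper instead detaches the dependencies onto an
// explicit worklist, so every destructor runs at shallow stack depth.
// The caller must pass its last reference with std::move.
void releaseChainIteratively(std::shared_ptr<ToyEvent> Head) {
  std::vector<std::shared_ptr<ToyEvent>> Work;
  Work.push_back(std::move(Head));
  while (!Work.empty()) {
    std::shared_ptr<ToyEvent> Ev = std::move(Work.back());
    Work.pop_back();
    if (!Ev || Ev.use_count() != 1)
      continue; // empty, or still owned elsewhere: leave it alone
    for (auto &Dep : Ev->Deps)
      Work.push_back(std::move(Dep));
    Ev->Deps.clear();
    // Ev is destroyed at the end of this iteration with no dependencies.
  }
}
```

In this model, letting the head of a chain with several hundred thousand links simply go out of scope recurses once per link and can exhaust the stack, whereas `releaseChainIteratively(std::move(Head))` keeps the call depth constant at the cost of a heap-allocated worklist.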