CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

OpenCL: Fix memory leak / OoM and stack overflow #837

Closed linehill closed 2 months ago

linehill commented 2 months ago

The OpenCL backend called `chipstar::Event::addDependency()` without ever calling `chipstar::Event::releaseDependencies()` during a HIP application's lifetime, and this had two possible outcomes:

1) Crash due to an out-of-memory error caused by the unreleased event objects. This occurred after a HIP program had streamed enough commands, e.g. more than 30000 kernels.

2) Crash due to stack overflow at program exit / chipStar uninitialization. Because the event dependencies were never released, a very long event dependency chain built up. At uninitialization, destroying a Queue's last event triggered the destruction of the events it depended on, which in turn triggered the destruction of their dependencies, and so on, until the accumulated call frames overflowed the stack.

Both cases are fixed by removing the `addDependency()` call. AFAIK, the event dependency system is meant for timing the safe release of the backend driver objects (cl_events in this case). The OpenCL backend does not need this as the driver releases the objects when they are not needed by the application or by the driver for internal in-progress tasks.
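
For readers who want to see the two failure modes in isolation, here is a minimal sketch. It assumes a hypothetical `Event` type whose dependencies are held through `std::shared_ptr`; the names `liveEvents`, `enqueue`, and `destroyIteratively` are made up for illustration and are not chipStar's actual code. It shows how a chain that is never released keeps every event alive, and how such a chain could be torn down without deep recursion. Note that the PR itself takes a different route and simply removes the `addDependency()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime event; NOT chipStar's actual class.
static std::size_t liveEvents = 0;  // number of Event objects currently alive

struct Event {
  std::vector<std::shared_ptr<Event>> deps;  // events this one keeps alive

  Event() { ++liveEvents; }
  ~Event() { --liveEvents; }

  void addDependency(std::shared_ptr<Event> e) { deps.push_back(std::move(e)); }
  void releaseDependencies() { deps.clear(); }
};

// Simulates a queue streaming n commands. Each new event depends on the
// previous one. With release == false nothing ever drops the dependencies,
// so the queue's last event keeps the whole history alive.
std::shared_ptr<Event> enqueue(std::size_t n, bool release) {
  std::shared_ptr<Event> last;
  for (std::size_t i = 0; i < n; ++i) {
    auto ev = std::make_shared<Event>();
    if (last)
      ev->addDependency(last);
    if (release)
      ev->releaseDependencies();  // pretend the command completed right away
    last = ev;
  }
  return last;
}

// Tears down a dependency chain with an explicit work list, so the call
// depth stays constant no matter how long the chain is. Destroying the head
// directly would instead nest one destructor call per chained event.
void destroyIteratively(std::shared_ptr<Event> head) {
  std::vector<std::shared_ptr<Event>> pending;
  pending.push_back(std::move(head));
  while (!pending.empty()) {
    std::shared_ptr<Event> ev = std::move(pending.back());
    pending.pop_back();
    if (ev && ev.use_count() == 1) {  // we hold the last reference
      for (auto &d : ev->deps)
        pending.push_back(std::move(d));
      ev->deps.clear();
    }
    // ev is destroyed here with an empty deps list, so no recursion occurs.
  }
}
```

In this sketch, `enqueue(n, false)` leaves `n` events alive behind a single handle, which mirrors the unbounded memory growth in case 1, and letting that handle go out of scope normally would recurse once per event, which mirrors case 2.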

pvelesko commented 2 months ago

> The OpenCL backend does not need this as the driver releases the objects when they are not needed by the application or by the driver for internal in-progress tasks.

But the chipStar runtime might need these events to stay alive so that we can use them for syncing queues.

linehill commented 2 months ago

> but the chipStar runtime might need these events to stay alive so that we can use them for syncing queues.

Aren't the events referenced via shared_ptrs, which keep them alive as long as needed? Or do you mean that some objects in the chipStar runtime might reference the events via raw pointers?
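
The distinction raised in this question can be sketched as follows. This is an illustration only: `Event`, `Queue`, and `submit` are hypothetical names, not chipStar's types. An owning `std::shared_ptr` copy keeps an event alive after the queue moves on to a newer event, while a non-owning observer does not. `std::weak_ptr` stands in for a raw pointer here because its expiry can be checked safely, whereas dereferencing a dangling raw pointer would be undefined behavior.

```cpp
#include <cassert>
#include <memory>

// Hypothetical types for illustration; NOT chipStar's actual classes.
struct Event {
  int id;
};

struct Queue {
  std::shared_ptr<Event> lastEvent;
  // Submitting a new command replaces the queue's reference to the old event.
  void submit(int id) { lastEvent = std::make_shared<Event>(Event{id}); }
};

// A party that copied the shared_ptr (e.g. another queue waiting on the
// event) keeps the event alive even after the owning queue has moved on.
bool sharedHolderKeepsEventAlive() {
  Queue q;
  q.submit(1);
  std::shared_ptr<Event> waiter = q.lastEvent;  // owning reference
  std::weak_ptr<Event> watch = q.lastEvent;     // non-owning observer
  q.submit(2);                                  // queue drops event 1
  return !watch.expired() && waiter->id == 1;
}

// With only a non-owning observer, the event is destroyed as soon as the
// queue drops it. A raw pointer in the same position would now dangle.
bool nonOwningObserverLosesEvent() {
  Queue q;
  q.submit(1);
  std::weak_ptr<Event> watch = q.lastEvent;
  q.submit(2);
  return watch.expired();
}
```

Under these assumptions, removing a dependency list is safe for any code path that holds its own `shared_ptr` to the event, and only raw-pointer (or otherwise non-owning) references would be at risk.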