Closed bentsherman closed 5 years ago
Added in commit 5288880262b22baf53cdd9cdd82185d2a26f71b6. You can do the honors of closing this if the added method is satisfactory to you.
Works great! It also cut the runtime of similarity opencl nearly in half for my small test. :O I wish I could profile the opencl code to see where the difference lies. Probably comes from the fact that the kernel launches and memory transfers can now be overlapped safely as in CUDA.
Based on a discussion I had with @4ctrl-alt-del, CUDA provides a function (`cudaStreamSynchronize()`) to wait on a stream, which is equivalent to waiting on all events emitted by the stream. Apparently OpenCL can do the same thing by emitting an event which waits for all events in the command queue. It would be useful to have this feature through something like `OpenCL::CommandQueue::wait()`. Refer to `Similarity::OpenCL::Worker` and `Similarity::CUDA::Worker` in KINC for an example of how the code is simplified by waiting on a stream instead of waiting on every event.
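To illustrate the idea, here is a rough sketch in plain C++ of a queue-level wait built from a single marker event. It does not use the real OpenCL API or the library's actual classes: `CommandQueue`, `Event`, `enqueue()`, and `sum_with_single_wait()` are hypothetical stand-ins, and the in-order queue is simulated with a worker thread. In real OpenCL, the marker would presumably correspond to `clEnqueueMarkerWithWaitList()` with an empty wait list (or simply `clFinish()`), but treat that mapping as an assumption rather than a description of what the commit does.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an in-order command queue (not the real OpenCL API).
// Commands run one at a time, in submission order, on a background thread.
class CommandQueue {
public:
    using Event = std::shared_future<void>;

    CommandQueue() : worker_([this] { run(); }) {}

    ~CommandQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains any remaining commands first
    }

    // Submit a command and return an event that completes when it finishes.
    Event enqueue(std::function<void()> command) {
        std::packaged_task<void()> task(std::move(command));
        Event event = task.get_future().share();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
        return event;
    }

    // Wait on the whole queue: enqueue an empty "marker" command and block on
    // its event. Because the queue is in-order, the marker completes only
    // after every previously enqueued command has completed.
    void wait() { enqueue([] {}).wait(); }

private:
    void run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutting down and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

// Enqueue several commands, ignore their individual events, and wait once.
int sum_with_single_wait() {
    CommandQueue queue;
    std::vector<int> results(4, 0);
    for (int i = 0; i < 4; ++i) {
        queue.enqueue([&results, i] { results[i] = i + 1; });
    }
    queue.wait();  // one call instead of one wait per event

    int sum = 0;
    for (int r : results) sum += r;
    return sum;
}
```

The caller no longer has to keep every event around just to synchronize at the end, which is the simplification the worker classes would benefit from. Note that this only holds for an in-order queue; an out-of-order queue would need the marker to explicitly depend on the earlier commands.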