boostorg / compute

A C++ GPU Computing Library for OpenCL
http://boostorg.github.io/compute/
Boost Software License 1.0

Added wait_list parameter to "copy" functions #797

Closed: antonmyagkov closed this pull request 5 years ago.

antonmyagkov commented 5 years ago

The current "copy" functions cannot be used with a command queue created with the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property, because they have no wait_list parameter.
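To illustrate why the missing wait_list matters, here is a toy, standard-library-only model of an out-of-order queue. It is not OpenCL or Boost.Compute code: `Command` and `run_out_of_order` are hypothetical names, and the scheduler simply picks the most recently enqueued ready command to simulate the worst case an out-of-order queue is allowed to produce.

```cpp
// Toy model of an out-of-order command queue. This is NOT the OpenCL or
// Boost.Compute API; Command and run_out_of_order are hypothetical names.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Command {
    std::string name;
    std::vector<std::size_t> wait_list; // indices of commands that must finish first
};

// Executes commands in an order an out-of-order queue is allowed to choose.
// To model the worst case, it always runs the most recently enqueued command
// whose wait_list is already satisfied.
std::vector<std::string> run_out_of_order(const std::vector<Command>& cmds) {
    std::vector<bool> done(cmds.size(), false);
    std::vector<std::string> order;
    while (order.size() < cmds.size()) {
        bool progressed = false;
        for (std::size_t i = cmds.size(); i-- > 0;) {
            if (done[i]) continue;
            bool ready = true;
            for (std::size_t dep : cmds[i].wait_list) {
                assert(dep < cmds.size()); // wait lists must name enqueued commands
                if (!done[dep]) { ready = false; break; }
            }
            if (ready) {
                done[i] = true;
                order.push_back(cmds[i].name);
                progressed = true;
                break;
            }
        }
        if (!progressed) break; // cyclic wait lists: stop instead of spinning
    }
    return order;
}
```

With `{{"kernel", {}}, {"copy", {}}}` the model is free to run the copy before the kernel that produces its data; giving the copy the wait list `{0}` forces the order `kernel`, then `copy`.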

keryell commented 5 years ago

Quite pervasive, but it looks good.

jszuppe commented 5 years ago

Wouldn't it be better to do this only for copy_async? Honestly, what's the point of using copy, which does not give you an event, with an out-of-order queue?

Ulfgard commented 5 years ago

It still helps. If you have an OoO queue, you might still want a blocking write operation ("copy back to device after you are done computing"), but it has to wait until all relevant operations are done. This is especially useful in a multi-threaded context where each thread independently enqueues work into the same queue to ensure that the device always has enough to work on. In that case, explicitly synchronizing the whole queue is harmful to performance; a synchronous copy is not.
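The performance argument can be sketched with another toy model (again not Boost.Compute or OpenCL code; `DepGraph`, `must_finish_before_copy`, and `must_finish_before_queue_sync` are hypothetical). It counts which enqueued commands must complete before a blocking call returns: a blocking copy with an explicit wait list waits only for the transitive dependencies it names, while synchronizing the whole queue waits for everything, including work other threads enqueued independently.

```cpp
// Toy model (not OpenCL/Boost.Compute): which enqueued commands must finish
// before a blocking call can return? All names here are hypothetical.
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// deps[i] is the wait list of command i (indices of other enqueued commands).
using DepGraph = std::vector<std::vector<std::size_t>>;

// A blocking copy with an explicit wait list only has to wait for the
// commands it names, plus whatever those commands themselves wait on.
std::set<std::size_t> must_finish_before_copy(const DepGraph& deps,
                                              const std::vector<std::size_t>& wait_list) {
    std::set<std::size_t> needed;
    std::vector<std::size_t> stack(wait_list.begin(), wait_list.end());
    while (!stack.empty()) {
        std::size_t c = stack.back();
        stack.pop_back();
        assert(c < deps.size());
        if (!needed.insert(c).second) continue; // already visited
        for (std::size_t d : deps[c]) stack.push_back(d);
    }
    return needed;
}

// Synchronizing the whole queue (e.g. a finish() call) waits for every
// command enqueued so far, regardless of which thread submitted it.
std::size_t must_finish_before_queue_sync(const DepGraph& deps) {
    return deps.size();
}
```

For example, if thread A enqueues a write (0) and a kernel waiting on it (1), while thread B enqueues a write (2) and two chained kernels (3, 4), a blocking copy with wait list `{1}` waits for commands 0 and 1 only, whereas a queue-wide sync waits for all five.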


(Posted by email in reply to Jakub Szuppe's comment of Wednesday, October 24, 2018, 6:51 AM.)

jszuppe commented 5 years ago

Right, I just don't want to end up with multiple different ways of doing async APIs in Boost.Compute. That's why I'm wondering whether it would be better, for now, to have only the *_async functions work with OoO queues. However, I'm not saying 'no' to this change. @kylelutz, what do you think?

Anyway, I would need to revive our CI and test this on a few platforms (I think full coverage is Mac, NV, AMD, Intel, and POCL; IMHO it can be limited to OpenCL 1.2) to make sure it all works. If anyone can help with that, I'll be grateful.

Ulfgard commented 5 years ago

Without a wait_list, synchronous operations cannot be used safely in an OoO context. In OpenCL, all operations except blocking copies are asynchronous, and synchronous copies have to wait until those operations are done. In an in-order queue this detail is abstracted away, because there is an implicit ordering of events (i.e., wait_list = all currently unfinished operations). In an OoO queue this ordering must be made explicit; otherwise the copy takes place as soon as the memory bus is free.
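The in-order/out-of-order distinction described above can be captured in a small single-queue toy model (not the OpenCL or Boost.Compute API; `QueueMode` and `effective_wait_list` are hypothetical names). In-order mode treats every earlier command as an implicit dependency; out-of-order mode honors only the explicit wait list, which is exactly what the copy functions were missing.

```cpp
// Toy single-queue model (not OpenCL/Boost.Compute): an in-order queue behaves
// like an out-of-order queue in which every command implicitly waits on all
// commands enqueued before it. QueueMode and effective_wait_list are hypothetical.
#include <cassert>
#include <cstddef>
#include <vector>

enum class QueueMode { in_order, out_of_order };

// Returns the commands that the command at position `index` must wait for.
std::vector<std::size_t> effective_wait_list(QueueMode mode, std::size_t index,
                                             const std::vector<std::size_t>& explicit_wait_list) {
    for (std::size_t e : explicit_wait_list) {
        assert(e < index); // a wait list can only name previously enqueued commands
    }
    if (mode == QueueMode::out_of_order) {
        // Only the explicitly listed events order this command.
        return explicit_wait_list;
    }
    // In-order: every earlier command is an implicit dependency.
    std::vector<std::size_t> all_earlier;
    for (std::size_t i = 0; i < index; ++i) all_earlier.push_back(i);
    return all_earlier;
}
```

In this model, a copy enqueued as the fourth command (index 3) with an empty wait list depends on commands 0, 1, and 2 in an in-order queue, but on nothing at all in an out-of-order queue.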

coveralls commented 5 years ago


Coverage increased (+0.004%) to 84.024% when pulling 3de4bbaa0be5720d8383985434b9ecd2d0f65f6d on u-s:feature/copy-with-wait-list into 924ed68562b71a2e229a2aca0cb06f691294d203 on boostorg:develop.