Closed fwyzard closed 2 years ago
Synchronise the queues and wait for the host-device copies to complete only when using the TBB async backend, without the caching allocator.
This recovers some of the performance when using the CUDA backend with a large number of threads.
Thanks to @waredjeb for pointing this out!
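For illustration, the condition described here can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the `Backend` enum, the `Queue` stand-in, `needsExplicitSync` and `finishCopies` are all hypothetical names, and the real change presumably selects the behaviour through the project's own backend and allocator configuration.

```cpp
#include <cassert>

// Hypothetical backend tags, for illustration only (not the project's real types).
enum class Backend { SerialSync, TbbAsync, Cuda };

// The explicit wait is needed in exactly one configuration:
// the TBB async backend with the caching allocator disabled.
constexpr bool needsExplicitSync(Backend backend, bool cachingAllocatorEnabled) {
  return backend == Backend::TbbAsync && !cachingAllocatorEnabled;
}

// Hypothetical stand-in for a device queue; it only counts how often wait() is called.
struct Queue {
  int waits = 0;
  void wait() { ++waits; }
};

// Block on the queue only when the configuration requires it,
// so that every other configuration skips the blocking wait.
inline void finishCopies(Queue& queue, Backend backend, bool cachingAllocatorEnabled) {
  if (needsExplicitSync(backend, cachingAllocatorEnabled)) {
    queue.wait();
  }
}

// Compile-time checks of the condition.
static_assert(needsExplicitSync(Backend::TbbAsync, false), "TBB async without caching must sync");
static_assert(!needsExplicitSync(Backend::TbbAsync, true), "TBB async with caching must not sync");
static_assert(!needsExplicitSync(Backend::Cuda, false), "CUDA must not sync");
```

In this sketch, calling `finishCopies(queue, Backend::Cuda, false)` returns without blocking, which is the case where the performance is recovered.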