eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License

Unify async and non-async operations using optional_ref<const stream_t> optional arguments #689

Open eyalroz opened 1 month ago

eyalroz commented 1 month ago

In the comments on issue #641, I sketched out an approach to unifying the copy functions: taking an optional_ref<const stream_t> parameter, so that the same wrapper API call can be used for both the async and the non-async variants of CUDA's own APIs. The same approach applies to the other operations in the cuda::memory::async namespace: set(), zero(), allocate(), free(). If we adopt it for copying, let's adopt it for these operations as well.
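
To make the idea concrete, here is a minimal host-only sketch of what a unified set() / zero() pair might look like. Everything in it is an assumption for illustration: the `sketch` namespace, the `optional_ref` template and the mock `stream_t` (a plain queue of deferred tasks) are stand-ins, not the library's actual classes, and `std::memset` stands in for where a real wrapper would presumably call `cudaMemset()` or `cudaMemsetAsync()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for optional_ref: a nullable, non-owning reference.
template <typename T>
class optional_ref {
public:
    optional_ref() noexcept = default;             // empty: "no stream given"
    optional_ref(T& ref) noexcept : ptr_(&ref) {}  // implicit, so a stream can be passed as-is
    bool has_value() const noexcept { return ptr_ != nullptr; }
    explicit operator bool() const noexcept { return has_value(); }
    T& value() const { assert(ptr_ != nullptr); return *ptr_; }
private:
    T* ptr_ = nullptr;
};

// Mock stream: work is queued on enqueue() and only runs on synchronize().
class stream_t {
public:
    void enqueue(std::function<void()> task) const { pending_.push_back(std::move(task)); }
    void synchronize() const {
        for (auto& task : pending_) { task(); }
        pending_.clear();
    }
    std::size_t num_pending() const { return pending_.size(); }
private:
    mutable std::vector<std::function<void()>> pending_;
};

namespace memory {

// One entry point for both variants: no stream means the synchronous path,
// a stream means the work is scheduled on that stream.
inline void set(void* start, int byte_value, std::size_t num_bytes,
                optional_ref<const stream_t> stream = {})
{
    if (stream) {
        // A real wrapper would call cudaMemsetAsync() here (assumption).
        stream.value().enqueue([=] { std::memset(start, byte_value, num_bytes); });
    } else {
        // A real wrapper would call cudaMemset() here (assumption).
        std::memset(start, byte_value, num_bytes);
    }
}

// zero() simply forwards the optional stream to set().
inline void zero(void* start, std::size_t num_bytes,
                 optional_ref<const stream_t> stream = {})
{
    set(start, 0, num_bytes, stream);
}

} // namespace memory
} // namespace sketch
```

With this shape, `sketch::memory::zero(ptr, n)` takes effect immediately, while `sketch::memory::zero(ptr, n, my_stream)` only takes effect once `my_stream` has been synchronized; the caller chooses the variant by passing or omitting the stream, rather than by picking a different namespace.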