eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License
790 stars 80 forks

Support scheduling actual batches of "batch memory operations" on a stream #454

Open eyalroz opened 1 year ago

eyalroz commented 1 year ago

We currently support individual "batch memory operations" (see also #452), but we don't support scheduling actual batches of them. Let's add that support. We'll need to provide some sort of poor man's variant class for these operations, so that the user doesn't have to manipulate the raw union of batch ops directly.
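
To make the "poor man's variant" idea concrete, here is a minimal sketch of one possible shape for it. It is not the library's API: every name in it (`raw_op_t`, `raw_kind_t`, `mem_op_t`, `to_raw_batch`) is hypothetical, and `raw_op_t` is a simplified stand-in for the driver's `CUstreamBatchMemOpParams` union so that the sketch compiles without the CUDA headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the driver's raw union of batch ops.
// Every alternative starts with the same `kind` field, which acts as the tag.
enum class raw_kind_t : unsigned { wait_32 = 1, write_32 = 2, flush_remote_writes = 3 };

struct raw_header_t   { raw_kind_t kind; };
struct raw_value_op_t { raw_kind_t kind; std::uint64_t address; std::uint32_t value; };
struct raw_flush_op_t { raw_kind_t kind; unsigned flags; };

union raw_op_t {
    raw_header_t   header;
    raw_value_op_t wait;
    raw_value_op_t write;
    raw_flush_op_t flush;
};

// The "poor man's variant": it owns one raw union value, can only be built
// through named factories, and exposes checked accessors.
class mem_op_t {
public:
    static mem_op_t wait_32(std::uint64_t address, std::uint32_t value) {
        mem_op_t op;
        op.raw_.wait = raw_value_op_t{ raw_kind_t::wait_32, address, value };
        return op;
    }
    static mem_op_t write_32(std::uint64_t address, std::uint32_t value) {
        mem_op_t op;
        op.raw_.write = raw_value_op_t{ raw_kind_t::write_32, address, value };
        return op;
    }
    static mem_op_t flush_remote_writes() {
        mem_op_t op;
        op.raw_.flush = raw_flush_op_t{ raw_kind_t::flush_remote_writes, 0u };
        return op;
    }

    // All alternatives share the leading `kind` field, so the tag can be
    // read through the header regardless of which alternative was written.
    raw_kind_t kind() const { return raw_.header.kind; }

    // Only meaningful for the wait and write alternatives.
    std::uint32_t value() const {
        assert(kind() != raw_kind_t::flush_remote_writes);
        return raw_.wait.value;
    }

    const raw_op_t& raw() const { return raw_; }

private:
    mem_op_t() : raw_{} {}
    raw_op_t raw_;
};

// Flatten a batch of wrapped ops into the contiguous array of raw unions
// which a batch-scheduling call would take. With the real driver types, this
// array would be what gets passed to
// cuStreamBatchMemOp(stream, count, paramArray, flags).
std::vector<raw_op_t> to_raw_batch(const std::vector<mem_op_t>& ops) {
    std::vector<raw_op_t> raw;
    raw.reserve(ops.size());
    for (const auto& op : ops) { raw.push_back(op.raw()); }
    return raw;
}
```

With something like this, the user would build a batch as, say, `{ mem_op_t::write_32(addr, 7), mem_op_t::wait_32(addr, 7) }` and never touch the union members; the stream-side method would do the flattening and make the single driver call.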