eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License

Add explicit support for pitched linear memory #138

Open eyalroz opened 4 years ago

eyalroz commented 4 years ago

Now that we support CUDA arrays, and already deal with pitch parameters in some CUDA Runtime API calls as a matter of course, it's probably time we expanded that into proper support for pitched memory.

Pitched memory is regular, "linear" memory, except that it is allocated with a "pitch": along the innermost dimension, each row is followed by a gap of padding, so that every row starts at some conveniently-aligned position. This means different allocation calls and different Runtime API calls for copying, both when CUDA arrays are involved and when they aren't.
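To make the layout concrete, here is a minimal, host-only C++ sketch of the addressing that a pitch implies. It is not code from this library and does not call CUDA. The names `pitch_for`, `row_start` and `copy_2d` are hypothetical, and the alignment value is an assumption: on a real device the pitch is chosen by the runtime (e.g. it is returned by `cudaMallocPitch`), not computed by the caller. The parameter order of `copy_2d` loosely mirrors that of `cudaMemcpy2D`, with pitches and width given in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: round a row width (in bytes) up to a multiple of
// `alignment`. This only imitates what an allocator might do; the actual
// pitch on a device is decided by the CUDA runtime.
inline std::size_t pitch_for(std::size_t width_in_bytes, std::size_t alignment)
{
    assert(alignment > 0);
    return (width_in_bytes + alignment - 1) / alignment * alignment;
}

// Start of a given row in a pitched region: rows are `pitch` bytes apart,
// even though only the first `width_in_bytes` of each row hold data.
inline unsigned char* row_start(void* base, std::size_t pitch, std::size_t row)
{
    return static_cast<unsigned char*>(base) + row * pitch;
}

// Host-only imitation of a pitched 2D copy: copy `height` rows of
// `width_in_bytes` each, skipping the padding at the end of every row
// on both the source and the destination side.
inline void copy_2d(
    void* dst, std::size_t dst_pitch,
    const void* src, std::size_t src_pitch,
    std::size_t width_in_bytes, std::size_t height)
{
    assert(width_in_bytes <= dst_pitch && width_in_bytes <= src_pitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dst_pitch, s + row * src_pitch, width_in_bytes);
    }
}
```

For instance, with 3 rows of 4 bytes each and an assumed 8-byte alignment, `pitch_for(4, 8)` yields 8, so row 2 starts 16 bytes past the base, and bytes 4 through 7 of every destination row are padding that the copy leaves untouched. A tightly-packed buffer is simply the special case where the pitch equals the row width.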

This will also increase coverage of the CUDA runtime APIs.

eyalroz commented 2 years ago

From the dupe issue:

The CUDA driver's 2D and 3D copy functions support "pitched" arrays, in which the minor dimension is padded by a certain amount to potentially improve copying/reading/caching performance. We currently do not account for this when copying to/from cuda::arrays_t's - and we should.