Open eyalroz opened 4 years ago
From the dupe issue:
The CUDA driver's 2D and 3D copying supports "pitched" arrays, where the minor dimension is padded by a certain number of elements to potentially improve copying/reading/caching performance. We currently do not support accounting for this when copying to/from `cuda::arrays_t`'s, and we should.
Now that we support CUDA arrays, and do some matter-of-fact dealing with pitched CUDA Runtime API calls, it's probably time we properly expanded that to pitched memory support.
Pitched memory is regular, "linear" memory, except that it is allocated with a "pitch": in the innermost dimension, each row is followed by a gap, so that every row starts at a conveniently-aligned position. This means a different allocation call and different runtime API calls for copying, both when arrays are involved and when they aren't.
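To make the layout concrete, here is a minimal host-only sketch of what "pitch" means for addressing and copying. It is not the wrappers' API and does not call CUDA at all: `pitch_for`, `row_start` and `copy_2d` are hypothetical names made up for illustration, and the alignment is a parameter because the real pitch is chosen by the driver (it is returned by `cudaMallocPitch`). The parameter order of `copy_2d` is meant to mirror `cudaMemcpy2D` (destination, destination pitch, source, source pitch, width in bytes, height), minus the copy-kind argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: round a row width (in bytes) up to a multiple of
// `alignment`. With real pitched allocations the driver decides the pitch;
// this only illustrates the idea of padding each row.
inline std::size_t pitch_for(std::size_t width_bytes, std::size_t alignment)
{
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Address of the first byte of row `row` in a pitched region: rows are
// `pitch` bytes apart, regardless of how many bytes of each row are used.
inline unsigned char* row_start(unsigned char* base, std::size_t pitch, std::size_t row)
{
    return base + row * pitch;
}

// Host-side analogue of a pitched 2D copy: copies `height` rows of
// `width_bytes` bytes each, leaving the padding of both regions untouched.
inline void copy_2d(
    unsigned char*       dst, std::size_t dst_pitch,
    const unsigned char* src, std::size_t src_pitch,
    std::size_t width_bytes, std::size_t height)
{
    // A row cannot be wider than the distance between consecutive rows.
    assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * dst_pitch, src + row * src_pitch, width_bytes);
    }
}
```

A densely-packed buffer is simply the special case where the pitch equals the row width, which is why the same copy routine covers pitched-to-pitched, pitched-to-dense and dense-to-pitched transfers.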
This will also increase coverage of the CUDA runtime APIs.