Open ivandrodri opened 4 years ago
Hello @ivandrodri. This is a good question. I have personally never tried this, but here are my thoughts after looking at the code just now. I am sure @psychocoderHPC knows the answer, but he is currently on vacation.
I believe alpaka, and therefore cupla, already attaches every memory buffer to a device. In cupla, when you create a buffer with cuplaMalloc, the buffer is attached to the device that is active at that moment (which can be set via cuplaSetDevice).
Then the copy functions like cuplaMemcpyAsync just take two buffers, internally check whether they are on the same device, and handle the copy accordingly. For the CUDA backend, this function should just call either cudaMemcpyAsync or cudaMemcpyPeerAsync. I think you are right that cudaDeviceEnablePeerAccess is handled automatically, so calling it yourself should not be needed.
So in your case, I think just creating buffers while the right devices are active, and then using the universal copy functions should work.
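To make the reasoning above concrete, here is a small host-only toy model of that dispatch. It is only a sketch of the behaviour as described in this thread, not cupla or alpaka code: the namespace model, Buffer, setDevice, allocate, and copy are all hypothetical names, plain host memory stands in for device memory, and the returned string merely names the CUDA call the backend would be expected to pick.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-only model; none of these names are cupla or alpaka APIs.
namespace model {

// Stands in for the device selected via cuplaSetDevice.
int g_activeDevice = 0;

void setDevice(int id) { g_activeDevice = id; }

struct Buffer {
    int device;                        // device that was active at allocation time
    std::vector<unsigned char> bytes;  // host memory standing in for device memory
};

// Mirrors the description of cuplaMalloc: the buffer remembers the active device.
Buffer allocate(std::size_t n) {
    return Buffer{g_activeDevice, std::vector<unsigned char>(n)};
}

// "Universal" copy: compare the devices of both buffers and choose the path.
// Returns the name of the CUDA call the described backend would be expected to use.
std::string copy(Buffer& dst, Buffer const& src, std::size_t n) {
    assert(n <= dst.bytes.size() && n <= src.bytes.size());
    std::copy_n(src.bytes.begin(), n, dst.bytes.begin());
    return dst.device == src.device ? "cudaMemcpyAsync" : "cudaMemcpyPeerAsync";
}

}  // namespace model
```

In this model, allocating one buffer after setDevice(0) and another after setDevice(1) and then copying between them takes the peer path, while a copy between two buffers allocated under the same active device takes the ordinary path. Whether the real cuplaMemcpyAsync behaves this way for your setup is something to verify on actual hardware.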
Regarding the streams and asynchronous part (which is orthogonal to peer-to-peer), it mirrors CUDA: create streams via cuplaStreamCreate, store them as type cuplaStream_t, and pass such a variable as the last parameter to cuplaMemcpyAsync.
Hello @sbastrakov, thanks a lot for the help! I'll try it and let you know.
@ivandrodri Sorry for the late response.
Have you solved this issue already?
I have never tried peer memory copies, but alpaka should do the job transparently for you. A simple cuplaMemcpyAsync should be enough.
Hi, I would like to convert my CUDA code to CUPLA, and I have some issues with multi-GPU/single-CPU P2P async copies. This is the error:
error: argument of type "cuplaStream_t" is incompatible with parameter of type "cudaStream_t"
This is a typical P2P copy in my code:
So how do I write this in CUPLA?
In order to give access to P2P copy, this is what I am doing in CUDA:
It seems in CUPLA cudaDeviceEnablePeerAccess is done automatically and cudaDeviceCanAccessPeer disappears, so I think the function enableP2P is not necessary anymore, right?
Thanks for any help!
[edited by psychocoderHPC: fixed formatting]