eth-cscs / DLA-Future

DLA-Future
https://eth-cscs.github.io/DLA-Future/master/
BSD 3-Clause "New" or "Revised" License
64 stars 14 forks source link

Add option to force use of contiguous buffers when using `DLAF_WITH_MPI_GPU_SUPPORT` #1096

Closed msimberg closed 8 months ago

msimberg commented 8 months ago

Based on #1088.

Introduces another dependent CMake option DLAF_WITH_MPI_GPU_SUPPORT_FORCE_CONTIGUOUS which I propose to default to ON when DLAF_WITH_MPI_GPU_SUPPORT is enabled, given that it seems to always perform better than not forcing contiguous buffers (small improvement with HIP, huge improvement with CUDA). I'd say long-term we can consider if it should just always be used without a CMake option, but I think at this early stage when we're still figuring out what works where I think it's nice to have an option for it.

Independent of the option we could also consider making both DLAF_WITH_MPI_GPU_SUPPORT and DLAF_MPI_GPU_SUPPORT_FORCE_CONTIGUOUS runtime options, but that would require larger changes and is a separate discussion.

In this PR I'm only forcing GPU buffers to be contiguous if the option is enabled. CPU buffers are left as before.

I've currently forced the use of contiguous buffers at the call-site of withTemporaryTile. This could also be done inside withTemporaryTile but at least in theory withTemporaryTile is not meant only for communication so I don't know if it makes sense. Communication is, however, the only use case for it right now. I'm open to thoughts on what you think is cleaner.

msimberg commented 8 months ago

cscs-ci run

msimberg commented 8 months ago

cscs-ci run

rasolca commented 8 months ago

cscs-ci run