JuliaParallel / MPI.jl

MPI wrappers for Julia
https://juliaparallel.org/MPI.jl/
The Unlicense

Investigate need for `JULIA_CUDA_MEMORY_POOL=none` #843

Open vchuravy opened 3 months ago

vchuravy commented 3 months ago

One datapoint is that locally on OpenMPI 5, the test ran fine on one GPU.

There was a discussion elsewhere (maybe @maleadt remembers) about whether that flag is still needed, and which MPI versions can now handle the new memory interface.

luraess commented 3 months ago

Tim Besard: Could somebody who understands CUDA + OpenMPI re-evaluate https://github.com/JuliaParallel/MPI.jl/pull/537? IIUC, the fact that UCX now supports the CUDA stream-ordered allocator (https://github.com/openucx/ucx/blob/04897a079ac88713842f7209c5e82430d095444e/NEWS#L63) means that this workaround shouldn't be suggested anymore.

The reason being that it is pretty costly, performance-wise, and I see it set all the time in HPC users' environments (presumably provided by the system config).

One could (and should) test, but isn't UCX just one PML among several? If so, there may be no guarantee that it will just work on clusters that rely not on UCX but on, e.g., libfabric.
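
A minimal sketch of what such a test could look like, assuming a CUDA-aware MPI build, MPI.jl 0.20 or later (for the keyword form of `MPI.Sendrecv!`), and a working CUDA.jl install. The script name `gpu_check.jl` and the buffer size are hypothetical, and this is not the test from the MPI.jl test suite. The idea would be to run it twice per MPI stack (UCX, libfabric, ...), once with `JULIA_CUDA_MEMORY_POOL=none` exported and once without, and compare.

```julia
# gpu_check.jl (hypothetical name): ring exchange of device buffers.
# Launch with something like `mpiexec -n 2 julia --project gpu_check.jl`.
using MPI
using CUDA

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Report what the MPI library claims and which pool setting is in effect.
# Note: MPI.has_cuda() only reflects what the MPI build reports about itself.
if rank == 0
    println("MPI.has_cuda() = ", MPI.has_cuda())
    println("JULIA_CUDA_MEMORY_POOL = ", get(ENV, "JULIA_CUDA_MEMORY_POOL", "(unset)"))
end

# Each rank sends to its right neighbour and receives from its left one.
dst = mod(rank + 1, nranks)
src = mod(rank - 1, nranks)

N = 1024  # arbitrary buffer length, chosen for illustration
send_buf = CUDA.fill(Float64(rank), N)
recv_buf = CUDA.zeros(Float64, N)
CUDA.synchronize()  # ensure the device buffers are initialized before MPI touches them

# Pass the device arrays directly; this is the step that depends on the MPI
# library handling memory from CUDA's stream-ordered allocator.
MPI.Sendrecv!(send_buf, recv_buf, comm; dest=dst, source=src)

# Copy back to the host and check that we got the left neighbour's rank.
ok = all(Array(recv_buf) .== Float64(src))
println("rank $rank: received from rank $src, ok = $ok")

MPI.Finalize()
```

A crash or wrong data only when the variable is unset would suggest the workaround is still needed on that particular stack; a clean run in both configurations would be one more datapoint, not proof.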