Open Chiil opened 2 years ago
In the discussion on Discourse, somebody suggested using export JULIA_CUDA_MEMORY_POOL=none, and this solves the problem. I do not know, though, whether this is a bug, because it would be great if the memory pool and CUDA-aware MPI could be combined.
Yes, I tried that as well. Only export JULIA_CUDA_MEMORY_POOL=none solves my problem.
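For anyone landing here, a minimal sketch of how the workaround can be applied in a job script. It assumes the variable has to be in the environment before Julia starts, so that CUDA.jl sees it when it initializes. The launch line is left as a comment because the script name alltoall.jl and the rank count are hypothetical placeholders, not taken from this issue.

```shell
# Disable CUDA.jl's memory pool for everything started from this shell.
export JULIA_CUDA_MEMORY_POOL=none

# Confirm that the variable is exported to child processes such as julia.
printenv JULIA_CUDA_MEMORY_POOL

# Then launch as usual, for example (hypothetical script name and rank count):
#   mpiexec -n 2 julia --project alltoall.jl
```

Note that this disables the pool for the whole process, so allocation-heavy GPU code may run slower until the pool and CUDA-aware MPI can be used together.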
Ah ok, upstream issue is https://github.com/openucx/ucx/issues/7110
I am not sure whether this is an MPI.jl issue or something specific to our local supercomputer, but I have a failing Alltoall in my Julia code, whereas the identical code in C++ works, showing that the problem does not lie in our MPI or CUDA installation. I do not really know how to proceed from here. I got excellent help in making sure that the libraries are set up correctly at https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060, but the problem remains.
The error is:
The code that triggers this error is:
The equivalent working C++ code is: