Closed: beckermr closed this issue 1 year ago
Who from NERSC wrote this? lol It makes 0 sense... mpi4py is CUDA-unaware. Only the underlying MPI library matters.
Do we think there is a way to enable this in the conda-forge build?
btw it's already done. If you create a fresh conda env
conda create -n my_env python mpi4py openmpi
and follow the on-screen instructions, CUDA awareness can be kicked off. As I said, it's done through the underlying MPI (Open MPI, in this case), not by mpi4py.
See the release notes here: https://github.com/mpi4py/mpi4py/releases/tag/3.1.0
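If it helps future readers, the on-screen instruction boils down to Open MPI's runtime CUDA toggle; a sketch of the whole flow (the MCA variable is Open MPI's standard one, and `my_gpu_script.py` is just a placeholder name):

```shell
# Create the environment as above
conda create -n my_env python mpi4py openmpi

# conda-forge's Open MPI ships with CUDA support compiled in but off by
# default; the post-install message explains how to turn it on at run
# time via an MCA parameter, e.g. as an environment variable:
export OMPI_MCA_opal_cuda_support=true
mpiexec -n 2 python my_gpu_script.py
```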
@dalcinl is active in conda-forge and is an mpi4py maintainer.
I wrote the mpi4py support with @dalcinl, and I enabled the CUDA awareness support on conda-forge. I am not sure what you're trying to get at.
We note, in particular, that mpi4py is by itself CUDA-unaware.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9439927
Ah ok. I did not realize. Thank you for the help!
No problem @beckermr! btw I am reaching out to our NERSC support persons to get that doc fixed, but if you're already ahead of me, just let me know 🙂
I have not reached out to them. AFAIK then, there is no need to rebuild mpi4py with cuda support. We can link the conda-forge package directly to the NERSC mpi libraries.
Yes, they just need to use the "external" MPI packages. I know for certain MPICH would work on NERSC.
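Concretely, that would look something like this (the `external_*` build string is how conda-forge tags those dummy MPI builds; the exact MPICH version to pin depends on the ABI the system library provides):

```shell
# Install mpi4py against a dummy "external" MPICH; at run time the
# system's ABI-compatible libmpi (e.g. cray-mpich-abi) is used instead.
conda create -n my_env python mpi4py "mpich=*=external_*"
```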
Right. BTW, we don't have mpich builds with cuda support in conda-forge, right?
Nope, unfortunately MPICH requires the CUDA support to be built in at compile time, and last time I checked with the MPICH devs there's no launch-time/run-time protection if CUDA is absent (link). So, unless the core devs agree to special-case MPICH (I am looking at you Matt 😉), I don't think it's appropriate to build MPICH for each different CUDA major.minor.
Got it. I don't think I was involved much (at all?) with the previous openmpi discussions, so I won't comment on the run-time support issue. :)
We might be able to improve on openmpi's CUDA support. Please see issue https://github.com/conda-forge/openmpi-feedstock/issues/119 and the linked PR for more context. It still needs a bit more work, but maybe with a few more people looking at it we can sort out the remaining issues 😉
@leofang Maybe some of the wording you used here is not appropriate? You said mpi4py is CUDA-unaware... Well, that's a bit confusing. Perhaps the best way to say it is that mpi4py inherits the GPU-awareness of the MPI backend library, and that mpi4py fully supports the DLPack and CAI protocols.
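To illustrate the "inherits GPU-awareness" point: mpi4py itself only needs to duck-type these buffer protocols and hand the raw pointer to the MPI backend. A stdlib-only sketch of that kind of protocol detection (a simplified illustration I made up, not mpi4py's actual code):

```python
def classify_buffer(obj):
    """Return which buffer protocol a library like mpi4py could use to
    extract a raw pointer from `obj` (simplified illustration only)."""
    if hasattr(obj, "__cuda_array_interface__"):
        return "cuda_array_interface"   # e.g. CuPy/Numba device arrays
    if hasattr(obj, "__dlpack__"):
        return "dlpack"                 # DLPack-capable arrays
    try:
        memoryview(obj)
        return "buffer_protocol"        # ordinary host buffers
    except TypeError:
        return "unsupported"

class FakeDeviceArray:
    """Stand-in for a GPU array exposing the CUDA Array Interface."""
    __cuda_array_interface__ = {"shape": (4,), "typestr": "<f4",
                                "data": (0, False), "version": 3}

print(classify_buffer(b"host bytes"))      # buffer_protocol
print(classify_buffer(FakeDeviceArray()))  # cuda_array_interface
```

The point being: none of this requires mpi4py to link against CUDA; the backend MPI library does the device work.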
Well, the ship has sailed, and I can't believe I need to quote my (our) paper twice in a day 😂
We note, in particular, that mpi4py is by itself CUDA-unaware.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9439927
Well, within the context of the surrounding text, that comment in the paper makes clear what you meant. But outside that context, saying CUDA-unaware can easily be misunderstood as mpi4py not supporting CUDA-aware MPIs.
Right, if we are to have a pedantic discussion, let me put my CUDA hat back on 🙂 When we say a package/project/whatever is CUDA-aware, one or more of the following conditions should hold true:
Clearly, none of these applies to mpi4py (despite the significant effort we put into supporting it!). I could have said "after a careful design and collaboration with the community, mpi4py is able to support Python GPU libraries without being aware of CUDA," but this is just awfully mouthy. From the packaging perspective, it's a lot easier if we just say mpi4py is CUDA-unaware. Pedantic discussions should be left to a design issue or a paper, IMHO.
Just wanted to reach out concerning CUDA on NERSC Perlmutter GPU nodes. I see this comment which gives me hope. I create a very simple conda environment, include the "external" mpich package, install mpi4py from conda-forge (and that's it!) and the NERSC system mpich library (cray-mpich-abi) is available and seems to run correctly on Perlmutter's CPU-only compute nodes. However, repeating the same test of the conda environment on Perlmutter's GPU nodes fails with an error about the GTL library. I have tried this with both NERSC's cudatoolkit module (11.7) and the cudatoolkit (11.7.0) installed from conda-forge - as one of those needs to be available. NERSC has a variety of comments about this specific issue with the GTL library and notes some env variables that need to be set - and I have made sure they are.
Meanwhile, I can set up another simple conda environment, follow NERSC's instructions, skip the external mpich package from conda-forge, do this special pip install of mpi4py, and it works on the Perlmutter GPU nodes. Something is different, whether or not mpi4py is CUDA-aware. I'd really prefer to use the conda-forge external mpich library, though, and avoid this pip install step of mpi4py if possible. I'm probably missing something very basic; is there an example where the conda-forge mpich external package and mpi4py are set up in a conda environment and this works on the Perlmutter GPU compute nodes?
Hi @heather999, sorry to hear about your frustration. Unfortunately I left DOE a while ago and lost access to NERSC, so I can't test it myself right away, but it should be working based on my (distant) past experience and other users' feedback.
Judging from this statement
I create a very simple conda environment, include the "external" mpich package, install mpi4py from conda-forge (and that's it!) and the NERSC system mpich library (cray-mpich-abi) is available and seems to run correctly on Perlmutter's CPU-only compute nodes.
it doesn't seem to be an ABI compatibility issue, since the empty mpich + mpi4py from CF + Cray MPI works on CPU-only workloads. This statement alone is enough to say mpi4py from CF is not the problem.
Now, regarding

and I have made sure they are

would you confirm that both of these are set but not working?

export MPICH_GPU_SUPPORT_ENABLED=1
export CRAY_ACCEL_TARGET=nvidia80
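(A trivial stdlib check, using the two names from the exports above, can confirm the variables actually survive into the Python process under the job launcher; the helper itself is just something I made up for debugging.)

```python
import os

# The two variables the Cray toolchain expects (values from this thread)
REQUIRED = {
    "MPICH_GPU_SUPPORT_ENABLED": "1",
    "CRAY_ACCEL_TARGET": "nvidia80",
}

def missing_cray_gpu_vars(env=None):
    """Return the names of required variables that are unset or wrong."""
    env = os.environ if env is None else env
    return [k for k, v in REQUIRED.items() if env.get(k) != v]

# With a clean environment, both variables are reported missing:
print(missing_cray_gpu_vars(env={}))
```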
If so, I guess I might have a theory. It seems setting CRAY_ACCEL_TARGET would add a linker flag to cc so that it knows which shared library to link to, but if you use mpi4py from CF, it's not linked to that. I think this is a design issue in Cray MPI: they should have linked libmpi.so to the transport library for the user, or done a dlopen to load the library at runtime if the env var is set. Otherwise, they put the burden on users, and you wouldn't be able to use prebuilt binary packages (such as CF's mpi4py).
I would suggest asking NERSC support which shared library to load, and trying LD_PRELOAD to force loading it. I believe this would fix the issue.
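For concreteness, such a test might look like the following; the GTL library path below is my guess at the Cray CUDA transport library, so please substitute whatever NERSC support points you to:

```shell
export MPICH_GPU_SUPPORT_ENABLED=1
export CRAY_ACCEL_TARGET=nvidia80

# Hypothetical path -- ask NERSC support for the actual GTL library
export LD_PRELOAD=/opt/cray/pe/mpich/default/gtl/lib/libmpi_gtl_cuda.so

srun -n 2 python my_mpi_test.py
```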
One idea occurred to me here. We might build our own copy of the mpi4py package in a local channel at NERSC with the correct linkages. Then we can tuck this into a higher priority channel so it gets pulled in first.
This is a strategy that has worked well in different contexts.
One thing to keep in mind is how channels/labels get set up. Namely, if there are other channels in use for, say, NERSC or DOE products, I would recommend keeping them separate from modified conda-forge packages. Doing a little upfront work to set things up right can be a bit of a drag, but it beats trying to fix things later when people depend on them. Just something to keep in mind 😉
Ahhhh pro tip! Thanks! If you all think of anything else, let me know.
We might build our own copy of the mpi4py ... with the correct linkages.
@beckermr What exactly do you mean by this? What linking is incorrect? All that should be needed is for libmpi.so.12 to be found in LD_LIBRARY_PATH. Am I missing something?
The HPC center I work at has special compiler flags for linking mpich with CUDA awareness. They link some libs directly to mpi4py instead of to libmpi. So we would try to build a package there where we have access to the libs and can link things properly.
The HPC center I work at has special compiler flags for linking mpich with CUDA awareness.
Is this information available somewhere?
They link some libs directly to mpi4py instead of to libmpi.
Awful.
So we would try to build a package there where we have access to the libs and can link things properly.
Maybe there is a way to create a libmpi.so.12 file that links to all the other MPI+CUDA stuff (as is done currently with mpi4py), and then you point LD_LIBRARY_PATH to it.
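One way to realize that idea might be a thin shim whose NEEDED entries pull in both the real Cray libmpi and the CUDA transport library; all paths and library names below are illustrative guesses, not a tested recipe:

```shell
# Empty shared object that only carries DT_NEEDED entries
echo 'void cray_mpi_cuda_shim(void) {}' > shim.c
gcc -shared -fPIC -o "$SHIM_DIR/libmpi.so.12" shim.c \
    -Wl,--no-as-needed \
    -L/opt/cray/pe/mpich/default/lib -lmpi \
    -L/opt/cray/pe/mpich/default/gtl/lib -lmpi_gtl_cuda

# A module file would then prepend this directory to the search path:
export LD_LIBRARY_PATH="$SHIM_DIR:$LD_LIBRARY_PATH"
```

(`-Wl,--no-as-needed` is there so the linker records the transport library as NEEDED even though the shim references no symbols from it.)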
Right. We have to do something custom. A recipe and local channel seems less painful than special libs+linking, but I am not 100% sure.
A recipe and local channel seems less painful than special libs+linking
IMHO, creating a specially crafted lib and a module file appending to LD_LIBRARY_PATH for users to module load is less painful (for users) than having to use special channels. Of course, I'm talking without knowing the specific details of the system.
Hmmmmmm. I had not considered using the module system. Maybe the right thing is to ask the admins to do the extra linkage for us if it works. I think Leo mentioned this. Thanks for the input!
Yeah, agree with Lisandro. The downside of adding a package is that it now needs to be maintained indefinitely (and who maintains it?). Adding some local machine configuration (module load or otherwise) only needs to be maintained on that machine (and by the people who do that maintenance). Something to consider.
Apparently, you need a particular compiler & pip invocation to get CUDA support for mpi4py.
See this page: https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#mpi4py-on-perlmutter
Do we think there is a way to enable this in the conda-forge build?
cc @jakirkham @leofang
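For reference, the invocation on that NERSC page is roughly of this shape (from memory; the linked doc is authoritative for current module names and flags):

```shell
# Build mpi4py from source against the Cray compiler wrapper so the
# Cray/CUDA link flags are picked up (mpi4py's build honors MPICC):
module load cudatoolkit
MPICC="cc -shared" pip install --force-reinstall --no-cache-dir --no-binary=mpi4py mpi4py
```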