conda-forge / gpaw-feedstock

A conda-smithy repository for gpaw.
BSD 3-Clause "New" or "Revised" License

Could you please add UCX? #31

Closed by vladislavivanistsev 2 years ago

vladislavivanistsev commented 2 years ago

Comment:

By default, openmpi failed to run with RoCE (RDMA over Converged Ethernet). Installing Unified Communication X (UCX) and adding "--mca btl_openib_rroce_enable 1" to the mpirun command fixes the problem. How about adding UCX to the GPAW feedstock?

gdonval commented 2 years ago

I understand how frustrating it can be to not have gpaw fully working from the get-go.

Choosing to add a dependency out of convenience in a subproject is bound to create problems: there are reasons why openmpi does not pull ucx as a hard dependency and I think we should honour that for the very same reasons.

In this instance, I would suggest getting in touch with the openmpi-feedstock crowd to fully understand their choice (also, ucx is their dependency, not gpaw's!). If they don't agree to add ucx as a hard dependency, it might be worthwhile to update gpaw's documentation to indicate that ucx might be needed.

Just to be clear: this also affects me for the very same reason so don't think I am simply being dismissive.

vladislavivanistsev commented 2 years ago

@gdonval Agree with the explanation. In fact, an informative message appears when installing openmpi:

In addition, the UCX support is also built but disabled by default. To enable it, first install UCX (conda install -c conda-forge ucx). Then, set the environment variables OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes. Equivalently, you can set the MCA parameters in the command line: mpiexec --mca pml ucx --mca osc ucx ... Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX. Please consult UCX's documentation for detail.
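
For reference, the steps in that message could be sketched roughly as below. This is only an illustration under assumptions: it presumes UCX has already been installed with the conda command quoted above, and the launch line (process count, `gpaw python`, and the script name `my_script.py`) is a hypothetical placeholder that is printed rather than executed.

```shell
# Sketch only: assumes UCX was installed beforehand with
#   conda install -c conda-forge ucx

# Enable the UCX components that the conda-forge openmpi build leaves disabled.
export OMPI_MCA_pml="ucx"
export OMPI_MCA_osc="ucx"

# Optional, per the openmpi message: may be needed for CUDA awareness via UCX.
# export UCX_MEMTYPE_CACHE=n

# Hypothetical launch line; the process count and script name are placeholders.
LAUNCH_CMD="mpiexec -n 4 gpaw python my_script.py"
echo "Would run: ${LAUNCH_CMD}"
```

Setting the variables in the environment should be equivalent to passing `--mca pml ucx --mca osc ucx` on the mpiexec command line, as the message states.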

vladislavivanistsev commented 2 years ago

Here is a discussion about UCX in regard to HPC: https://github.com/conda-forge/openmpi-feedstock/pull/87