Closed: shankar1729 closed this PR 3 years ago
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (`recipe`) and found it was in an excellent condition.
Edited OP to note issues fixed. Hope that is ok :)
Also I think we need the `--with-ucx` flag in `build.sh`'s call to `./configure`.
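If UCX were enabled explicitly, the relevant line in `build.sh` might look roughly like this (a sketch only: the script's other configure flags are omitted, and as the next reply notes, the PR ends up relying on configure's UCX auto-detection instead):

```shell
# Hedged sketch of an explicit opt-in; --with-ucx is a real OpenMPI
# configure option, and $PREFIX is the conda build prefix where the
# ucx headers and libraries land.
./configure --prefix="$PREFIX" --with-ucx="$PREFIX"
```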
Thanks Ravishankar! 😀
Great! Hope it's useful. OpenMPI's configure seems to enable UCX automatically if it finds the headers, so I didn't need to add `--with-ucx` since its headers and libs get pulled into the standard path in the build environment.
Seems fine to me: I'm new to conda packaging though, so I'm not yet familiar with how to test these properly.
No worries and thanks again for working on this 🙂
@shankar1729 @jakirkham Thanks. I didn't know UCX is available on conda-forge. A couple of questions:

- Do we still need to set `opal_cuda_support` to turn it on?
- I am testing this locally and can only get CUDA awareness to work by setting `UCX_MEMTYPE_CACHE=n`:

```
UCX_MEMTYPE_CACHE=n mpirun -n 4 ./mpi_hello_world_cuda
```

Seems to be documented here.
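For reference, the workaround above can be applied either by exporting the variable or by forwarding it through `mpirun` (a sketch; `-x` is Open MPI's environment-forwarding option, and the binary name is taken from the example above):

```shell
# Set the UCX workaround for everything launched from this shell:
export UCX_MEMTYPE_CACHE=n
mpirun -n 4 ./mpi_hello_world_cuda

# Or forward the variable only to the MPI processes:
mpirun -x UCX_MEMTYPE_CACHE=n -n 4 ./mpi_hello_world_cuda
```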
I am in favor of getting the UCX support in, but not enabling it by default. @shankar1729 Do you mind if I push to your branch? I'll make the necessary changes (setting default MCA parameters, adding user instructions, etc.) following the same treatment as for plain (non-UCX) CUDA awareness.
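For context on "setting default MCA parameters": Open MPI reads system-wide defaults from `<prefix>/etc/openmpi-mca-params.conf`, so keeping UCX opt-in could be staged roughly like this (a sketch; the parameter choice is illustrative, and a demo directory stands in for the conda prefix):

```shell
# In the real package the prefix would be $PREFIX from conda-build;
# a local demo directory is used here for illustration.
prefix="./demo-prefix"
mkdir -p "$prefix/etc"
cat >> "$prefix/etc/openmpi-mca-params.conf" <<'EOF'
# Keep ob1 as the default pml; users opt into ucx with --mca pml ucx
pml = ob1
EOF
grep 'pml = ob1' "$prefix/etc/openmpi-mca-params.conf"   # prints: pml = ob1
```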
Responded in the issue (https://github.com/conda-forge/ucx-split-feedstock/issues/66), though it will require some more investigation.
Should add that I am in the middle of other work, so it might take a bit before I have the bandwidth to look into this.
@leofang Yes, please do use my branch to stage these changes.
Also, with respect to the CUDA builds of UCX, it seems the build recipe has been modified to support them, but the released versions on conda-forge are without CUDA.
Consequently, in my local tests I need to bypass the UCX pml when using GPU memory. I have always run my GPU codes with `--mca pml ob1` when using openmpi (otherwise I get a segfault in `libucp.so`), so ignore my response to point 3 above. With the currently released ucx package linked to openmpi, openmpi bypasses ucx because I specified this switch in my tests.
Best, Shankar
Pending conda-forge/ucx-split-feedstock#89.
It's just in, but I'll likely have to resume on Friday or later.
Debugging the dependency conflict using mambabuild...
@conda-forge-admin, please restart ci
Why does this need multiple builds? Can't this be one single build as it used to be?
> Why does this need multiple builds? Can't this be one single build as it used to be?
So for CUDA it needs two builds: 9.2 and 10.0+. It is because UCX changes the internal code for CUDA 10.0+, see the discussion around https://github.com/conda-forge/ucx-split-feedstock/issues/66#issuecomment-813944211.
As for the additional pure CPU build, my reasoning is that we use `ucx-proc=*=gpu` as one of the host dependencies, and I am not sure whether it could cause problems at runtime (say, because we build with GPU-enabled UCX but run with GPU-disabled UCX), so I wanted to stay on the safe side until https://github.com/conda-forge/ucx-split-feedstock/issues/66 is fixed. But perhaps I worry too much.
> It is because UCX changes the internal code for CUDA 10.0+, see the discussion around conda-forge/ucx-split-feedstock#66 (comment).

That is internal code and the public interface is the same, so I don't see any reason to have multiple builds per CUDA version.
> As for the additional pure CPU build, my reasoning is that we use `ucx-proc=*=gpu` as one of the host dependencies, and I am not sure whether it could cause problems at runtime (say, because we build with GPU-enabled UCX but run with GPU-disabled UCX), so I wanted to stay on the safe side until conda-forge/ucx-split-feedstock#66 is fixed. But perhaps I worry too much.

That just shows that the public interface is the same, so there's no need for openmpi to depend on GPU-enabled or GPU-disabled UCX.
> That is internal code and the public interface is the same, so I don't see any reason to have multiple builds per CUDA version.

I don't think that is true. The CUDA runtime must support the new `cudaLaunchHostFunc` API.
> I don't think that is true. The CUDA runtime must support the new `cudaLaunchHostFunc` API.

That's an internal detail of UCX and openmpi doesn't care.
@jakirkham @pentschev I think @isuruf made a convincing argument here. I think we are guarded by `ucx` handling the CUDA dependency for us. I am under the impression that the handling of CUDA GPU buffers in Open MPI is mutually exclusive: it goes solely through either `ucx` or `smcuda`, so it should not cause issues if at build time Open MPI sees a different CUDA version from the one UCX was built with. I will restore the single build as Isuru suggested.
Copying @jsquyres in case we missed something 😅
I confirmed that for all `ucx` builds (CPU & GPU, with different CUDA versions) the headers are identical, so `ucx` didn't do any code-generation magic that writes build-time info (say, `CUDA_VERSION`) into its headers. We should be good.
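The header comparison described above can be sketched like this (the paths are fabricated demo directories; in practice they would point at the extracted CPU and GPU `ucx` package prefixes):

```shell
# Fabricate two tiny "package prefixes" for demonstration; substitute
# the real extracted ucx cpu/gpu packages to reproduce the actual check.
mkdir -p demo/ucx-cpu/include demo/ucx-gpu/include
printf '#define UCP_API_MAJOR 1\n' > demo/ucx-cpu/include/ucp.h
printf '#define UCP_API_MAJOR 1\n' > demo/ucx-gpu/include/ucp.h

# diff -rq reports only which files differ, recursively.
if diff -rq demo/ucx-cpu/include demo/ucx-gpu/include >/dev/null; then
  echo "headers identical"
else
  echo "headers differ"
fi
# prints: headers identical
```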
Yeah, I agree this is a UCX implementation detail, therefore the OpenMPI package doesn't need to worry about that.
@conda-forge/openmpi this is ready!
FYI -- the long-awaited DDT fix came in for v4.1.x last night (https://github.com/open-mpi/ompi/pull/8837) -- we're making a new 4.1.1RC right now, and may well release 4.1.1 as early as tomorrow. Just in case this influences your thinking...
Thanks for sharing the good news, Jeff! Yeah, we will get this rebuild (of v4.1.0) in, and then build v4.1.1 afterwards so it continues to have `ucx` optionally enabled.
Let's get it in and see how it goes! Thanks @shankar1729 @jakirkham @pentschev @isuruf @jsquyres!
I have verified locally with a very simple MPI code that the package is working. The following combinations were tested:

1. CPU (`mpirun -n 2 ./my_test`)
2. CPU via UCX (`mpirun -n 2 --mca pml ucx --mca osc ucx ./my_test`)
3. GPU (`mpirun -n 2 --mca opal_cuda_support 1 ./my_gpu_test`)
4. GPU via UCX (`mpirun -n 2 --mca pml ucx --mca osc ucx ./my_gpu_test`)

For 4, unlike a local build that I tested with, for some reason it just works out of the box without setting `UCX_MEMTYPE_CACHE=n`.
tl;dr: we are cool
Fixes #42. Fixes #38 (missing openib support). Instead of enabling ibverbs, which is marked as deprecated within openmpi anyway, enabling ucx support is much easier and cleaner since the `ucx` package is already available on conda-forge.
Adding `ucx` as a dependency (during build and run) automatically gets openmpi to include support for a number of high-speed interconnects (InfiniBand, Omni-Path, etc.). No change to the build scripts beyond the dependency in `meta.yaml`!