conda-forge / ucx-split-feedstock

A conda-smithy repository for ucx-split.
BSD 3-Clause "New" or "Revised" License

ucx + cupy-core unexpectedly pulls in cuda-cudart #172

Closed dmargala closed 5 months ago

dmargala commented 5 months ago

Solution to issue cannot be found in the documentation.

Issue

I'm trying to create a conda environment with dask and cupy-core. I'm relying on a site installation to provide the CUDA libraries needed at runtime, so I want to avoid having cuda-cudart installed in the conda environment. cupy-core has a dependency on cuda-version, which seems fine, but when I add dask to the environment it ends up pulling in cuda-cudart from ucx via libarrow-flight/pyarrow/dask.

I think the following is sufficient to reproduce my issue (although the thing I actually care about is dask + cupy-core):

conda create -p /tmp/env -c conda-forge ucx cuda-version

Installing ucx on its own does not pull in cuda-cudart, so it seems like the presence of the cuda-version package is triggering this behavior. I tried to specify a CPU-only build of ucx with "ucx==cpu", but the resolved ucx version (1.6.1) is a lot older than the latest available version, so I suspect that build variant is outdated.

(I'm not sure this is a ucx problem, I suppose it could be related to how cupy-core or libarrow conda dependencies are specified but I figured I'd start here.)

Installed packages

# packages in environment at /tmp/env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
cuda-cudart               12.4.127             hd3aeb46_0    conda-forge
cuda-cudart_linux-64      12.4.127             h59595ed_0    conda-forge
cuda-version              12.4                 h3060b56_3    conda-forge
libgcc-ng                 13.2.0               hc881cc4_6    conda-forge
libgomp                   13.2.0               hc881cc4_6    conda-forge
libnl                     3.9.0                hd590300_0    conda-forge
libstdcxx-ng              13.2.0               h95c4c6d_6    conda-forge
rdma-core                 51.0                 hd3aeb46_0    conda-forge
ucx                       1.16.0               h555b365_1    conda-forge

Environment info

active environment : /tmp/env
    active env location : /tmp/env
            shell level : 1
       user config file : /global/homes/d/dmargala/.condarc
 populated config files : /global/common/software/nersc/pe/conda/24.1.0/Miniconda3-py311_23.11.0-2/.condarc
                          /global/homes/d/dmargala/.condarc
          conda version : 23.11.0
    conda-build version : not installed
         python version : 3.11.5.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=zen3
                          __conda=23.11.0=0
                          __cuda=12.2=0
                          __glibc=2.31=0
                          __linux=5.14.21=0
                          __unix=0=0
       base environment : /global/common/software/nersc/pe/conda/24.1.0/Miniconda3-py311_23.11.0-2  (read only)
      conda av data dir : /global/common/software/nersc/pe/conda/24.1.0/Miniconda3-py311_23.11.0-2/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /tmp/tmp.H40EfbHuEO
       envs directories : /global/homes/d/dmargala/.conda/envs
                          /global/common/software/nersc/pe/conda/24.1.0/Miniconda3-py311_23.11.0-2/envs
               platform : linux-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.11.5 Linux/5.14.21-150400.24.81_12.0.87-cray_shasta_c sles/15.4 glibc/2.31 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.6
                UID:GID : 52610:52610
             netrc file : /global/homes/d/dmargala/.netrc
           offline mode : False
leofang commented 5 months ago

I can reproduce with

conda create -p /tmp/env -c conda-forge ucx cuda-version

I think something is wrong in ucx because cuda-cudart is listed as a dependency (screenshot of the dependency listing omitted). Most likely we forgot to add an ignore_run_exports_from: cuda-cudart-dev, since we have it explicitly listed in host: https://github.com/conda-forge/ucx-split-feedstock/blob/c7e9896aee697e7b4bd8c84b1f5903013b24ea7e/recipe/meta.yaml#L33
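
For reference, a minimal sketch of what that suggested addition could look like in the recipe's build section; the surrounding keys and the selector are illustrative assumptions, not the feedstock's actual meta.yaml:

build:
  ignore_run_exports_from:
    # sketch only: stop cuda-cudart-dev's run_exports from injecting a hard
    # cuda-cudart run dependency into the ucx outputs
    - cuda-cudart-dev  # [cuda_compiler_version != "None"]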

dmargala commented 5 months ago

Thanks for confirming!

I notice there are two builds (for linux-64):

linux-64/ucx-1.16.0-h624969c_1 -> constrains cuda-version >=11.2,<12, does not depend on cuda-cudart
linux-64/ucx-1.16.0-h555b365_1 -> constrains cuda-version >=12,<13, depends on cuda-cudart

In case it's helpful, it looks like there was a change in behavior in #117:

Prior to that, this seems to work (does not pull in cuda-cudart):

conda create -p /tmp/env -c conda-forge cuda-version "ucx=1.14.1=*_1"

After that PR, things turn bad (pulls in cuda-cudart):

conda create -p /tmp/env -c conda-forge cuda-version "ucx=1.14.1=*_2"
leofang commented 5 months ago

Yes, because cuda-cudart only exists starting with CUDA 12. Let's get it fixed.

@conda-forge-admin, please rerender

conda-forge-webservices[bot] commented 5 months ago

Hi! This is the friendly automated conda-forge-webservice.

I just wanted to let you know that I started rerendering the recipe in conda-forge/ucx-split-feedstock#173.

leofang commented 5 months ago

Forgot to ask, @pentschev any chance you know if UCX by default links to cudart statically or dynamically?

leofang commented 5 months ago

nvm, answering myself based on the CI log: @dmargala, this is expected because UCX dynamically links to CUDART. This is why in the CUDA 11 pipeline we see

...
2024-04-20T04:49:43.6422403Z WARNING (ucx,lib/ucx/libucx_perftest_cuda.so.0.0.0): $RPATH/libcudart.so.11.0 not found in packages, sysroot(s) nor the missing_dso_whitelist.
2024-04-20T04:49:43.6423726Z .. is this binary repackaging?
...
2024-04-20T04:49:43.7154660Z WARNING (ucx,lib/ucx/libuct_cuda.so.0.0.0): $RPATH/libcudart.so.11.0 not found in packages, sysroot(s) nor the missing_dso_whitelist.
2024-04-20T04:49:43.7155422Z .. is this binary repackaging?
...

and in the CUDA 12 pipeline:

...
2024-04-20T04:48:26.5229533Z    INFO (ucx,lib/ucx/libuct_cuda.so.0.0.0): Needed DSO lib/libcudart.so.12 found in conda-forge/linux-64::cuda-cudart==12.0.107=hd3aeb46_8
...
2024-04-20T04:48:26.6609348Z    INFO (ucx,lib/ucx/libucx_perftest_cuda.so.0.0.0): Needed DSO lib/libcudart.so.12 found in conda-forge/linux-64::cuda-cudart==12.0.107=hd3aeb46_8
...

In CUDA 11, cudart comes from the cudatoolkit package, whereas in CUDA 12 it comes from the cuda-cudart package. So I think there is probably no need to make any change. Of course we can discuss whether we should switch to static linking, but that's a separate discussion. Let me know if this answers your question.

dmargala commented 5 months ago

Hmm, the behavior between CUDA 11 and CUDA 12 still seems inconsistent, since the CUDA 11 package does not add cudatoolkit as a dependency to provide cudart.

In any case, the thing I really want is a conda environment with dask and cupy but without cudatoolkit/cuda-cudart; I would like to use a site installation outside of my conda env to provide the CUDA dependencies. I can ask for cupy-core, but I'm not sure what to do about the dask -> pyarrow -> libarrow -> ucx chain of dependencies to prevent cuda-cudart from getting pulled in.
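
For concreteness, the environment being targeted could be written as an environment file along these lines (the name is made up; the CUDA libraries are expected to come from the site installation at runtime):

name: dask-cupy-site-cuda
channels:
  - conda-forge
dependencies:
  - dask
  - cupy-core  # CUDA runtime libraries provided outside conda by the site installation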

Testing with:

conda create -p /tmp/env -c conda-forge pyarrow cuda-version

I can see I am getting "cpu" builds of pyarrow but that is still pulling in an undesired cuda-cudart via ucx:

...
  cuda-cudart        conda-forge/linux-64::cuda-cudart-12.4.127-hd3aeb46_0
  cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-64-12.4.127-h59595ed_0
  cuda-version       conda-forge/noarch::cuda-version-12.4-h3060b56_3
...
  libarrow           conda-forge/linux-64::libarrow-15.0.2-h07fc4ce_5_cpu
  libarrow-acero     conda-forge/linux-64::libarrow-acero-15.0.2-hbabe93e_5_cpu
  libarrow-dataset   conda-forge/linux-64::libarrow-dataset-15.0.2-hbabe93e_5_cpu
  libarrow-flight    conda-forge/linux-64::libarrow-flight-15.0.2-hc4f8a93_5_cpu
  libarrow-flight-s~ conda-forge/linux-64::libarrow-flight-sql-15.0.2-he4f5ca8_5_cpu
  libarrow-gandiva   conda-forge/linux-64::libarrow-gandiva-15.0.2-hc1954e9_5_cpu
  libarrow-substrait conda-forge/linux-64::libarrow-substrait-15.0.2-he4f5ca8_5_cpu
...
  pyarrow            conda-forge/linux-64::pyarrow-15.0.2-py312h3f82784_5_cpu
...
  ucx                conda-forge/linux-64::ucx-1.15.0-hda83522_8
...
leofang commented 5 months ago

You're right. I missed that we didn't add cudatoolkit as a dependency on CUDA 11 (which was why warnings were raised by conda-build as shown above). So changes are still needed.

I see two ways out

  1. Statically link to cudart, which is what you want (doesn't bring in anything CUDA)
  2. Dynamically link to cudart, which requires adding cudatoolkit as a dependency on CUDA 11

I am in favor of 1 too, but I'd like to ask our UCX expert @pentschev and @conda-forge/ucx-split as well.

jakirkham commented 5 months ago

IIRC UCX uses dlopen with all of the transports. So if they can't be resolved, it disables them. This is also true of the CUDA transports

We softened cudatoolkit as it was a large dependency that was causing issues for users ( https://github.com/conda-forge/ucx-split-feedstock/issues/115 ). As most users have cudatoolkit or know to install it for CUDA 11, we assumed that users could handle this themselves.

Potentially we could soften the cuda-cudart dependency. The one wrinkle is that most packages statically link cuda-cudart, so users are less likely to already have it. Plus they may not know to install it themselves. So if we do this, we may want to communicate it to users somehow

Static linking in the past led to some unpleasant issues. Believe these got fixed with PR ( https://github.com/openucx/ucx/pull/6038 ). That said, @pentschev would likely know the implications of static linking here better than I

pentschev commented 5 months ago
  • Statically link to cudart, which is what you want (doesn't bring in anything CUDA)

Just for the sake of completeness, UCX does not support static linkage to cudart.

IIRC UCX uses dlopen with all of the transports. So if they can't be resolved, it disables them. This is also true of the CUDA transports

This is correct.

Potentially we could soften the cuda-cudart dependency. The one wrinkle is that most packages statically link cuda-cudart, so users are less likely to already have it. Plus they may not know to install it themselves. So if we do this, we may want to communicate it to users somehow

Making cuda-cudart optional is the only way to resolve this issue AFAIU. The implication, as John mentions, is that if the user forgets to install cuda-cudart then UCX will have GPU capabilities disabled and will fail when trying to transfer GPU objects.

Static linking in the past led to some unpleasant issues. Believe these got fixed with PR ( openucx/ucx#6038 ). That said, @pentschev would likely know the implications of static linking here better than I

We recently asked about this offline and the answer we got from UCX devs is that this isn't supported, so we shouldn't be doing that.
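
To make the opt-in implication above concrete: if cuda-cudart stops being a hard run dependency of ucx, a GPU user would need to request it explicitly. A hypothetical environment file (name and pins are illustrative) might look like:

name: ucx-gpu
channels:
  - conda-forge
dependencies:
  - ucx
  - cuda-version=12.*  # match the CUDA major series available on the system
  - cuda-cudart        # explicit opt-in so UCX can dlopen libcudart at runtime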

dmargala commented 5 months ago

Potentially we could soften the cuda-cudart dependency. The one wrinkle is that most packages statically link cuda-cudart, so users are less likely to already have it. Plus they may not know to install it themselves. So if we do this, we may want to communicate it to users somehow

I would certainly appreciate the flexibility to opt out. I agree that many users would likely miss the opportunity to opt in. Could it make sense to provide a CPU-only variant of ucx?

Making cuda-cudart optional is the only way to resolve this issue AFAIU. The implication, as John mentions, is that if the user forgets to install cuda-cudart then UCX will have GPU capabilities disabled and will fail when trying to transfer GPU objects.

FWIW, I don't think I can do much with the GPU capabilities in UCX on a system that does not support infiniband/ibverbs.

I would also guess, given the existence of "_cpu" and "_cuda" builds of pyarrow, that the "_cpu" build isn't using the GPU capabilities in UCX.

pentschev commented 5 months ago

FWIW, I don't think I can do much with the GPU capabilities in UCX on a system that does not support infiniband/ibverbs.

For any CUDA memory transfers over UCX, including over TCP or shared memory, UCX still needs to be able to dlopen cudart; this is not limited to specialized hardware such as InfiniBand or NVLink.

leofang commented 5 months ago

Potentially we could soften the cuda-cudart dependency.

If by this @jakirkham meant adding cuda-cudart (CUDA 12) and cudatoolkit (CUDA 11) to run_constrained, I think this is the way to go based on what @pentschev shared, since static linking is a dead end but we still need to make improvements to serve both CPU and GPU users.
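
A rough sketch of what such run_constrained entries could look like in the recipe's requirements section; the pins are illustrative, and in the real recipe each line would be gated on the CUDA major version with a selector:

requirements:
  run_constrained:
    # sketch only: constrain the CUDA runtime when it is installed, without
    # requiring it, so CPU-only environments never pull it in
    - cuda-cudart >=12.0      # CUDA 12 builds (illustrative pin)
    - cudatoolkit >=11.2,<12  # CUDA 11 builds (illustrative pin)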

jakirkham commented 5 months ago

Yep exactly

jakirkham commented 5 months ago

As to CUDA and CPU variants of the package, we could do this and have done this in the past. It is a bit of a mixed bag. At the end of the day a user still needs to know to install the CUDA variant

Would lean towards keeping one package and softening the dependency

In terms of communication, maybe we can add an install time message and a README in the repo

leofang commented 5 months ago

We can additionally add a post-link message like what we did in Open MPI: https://github.com/conda-forge/openmpi-feedstock/blob/main/recipe/post-link-cuda.sh

leofang commented 5 months ago

Interesting, I missed the link on the Prelink Message File. Didn't know it existed.

jakirkham commented 5 months ago

Yeah a post-link script is also an option. So just a question of when the message is emitted and whether a script is used

Right, prelink_message was added to supersede pre-link scripts used for this purpose. It can also be used for messages in general

No strong feelings among these. There is an interest in cutting down the number of scripts in packages that run at install time, and prelink_message meets this. That said, we can do whichever best fits our needs

leofang commented 5 months ago

Thanks all, I'll integrate all this into #173 and ping you for review tonight or tomorrow.

leofang commented 5 months ago

(I wonder why there's no postlink_message?)

jakirkham commented 5 months ago

prelink_message was added for a few reasons:

  1. A common need is for packages to tell users something before install (like deprecating a package, warning about particular issues, providing additional instructions, clarifying usage terms, etc.)
  2. Due to the issues above there can be a need to confirm it is ok to continue (or provide the option for the user to stop and do something different)
  3. Providing a path to deprecate and remove support for pre-link scripts (which run arbitrary code before the package is fully installed)

More details in issue: https://github.com/conda/conda/issues/10118

Once a package is installed, the horse is out of the barn

Potentially there are other cases where it may make sense to have a message afterwards. Though having a message at the end implies this is not a big deal and users can figure things out if needed

jakirkham commented 5 months ago

Looks like packages are up, but may still be mirroring to CDN

Please let us know how things go

dmargala commented 5 months ago

Thanks all, I appreciate the help sorting this out.

A quick look at the env with my simplified test looks good to me:

conda create -p /tmp/env -c conda-forge ucx cuda-version
...
The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
  cuda-version       conda-forge/noarch::cuda-version-12.4-h3060b56_3
  libgcc-ng          conda-forge/linux-64::libgcc-ng-13.2.0-hc881cc4_6
  libgomp            conda-forge/linux-64::libgomp-13.2.0-hc881cc4_6
  libnl              conda-forge/linux-64::libnl-3.9.0-hd590300_0
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-13.2.0-h95c4c6d_6
  rdma-core          conda-forge/linux-64::rdma-core-51.0-hd3aeb46_0
  ucx                conda-forge/linux-64::ucx-1.16.0-h555b365_2
...

I also looked at the actual env that I wanted, which specifies dask and cupy-core. That now seems to side-step the issue thanks to recent activity in the cupy-core packaging, which no longer pulls in cuda-version. The libarrow-flight package in that env seems to be pinned to an earlier ucx version (1.15.0 [required: >=1.15.0,<1.16.0a0]) anyway, so I may have to wait a bit for this to propagate through.

jakirkham commented 5 months ago

Yeah something weird is happening there

Raised upstream issue: https://github.com/regro/cf-scripts/issues/2519