conda-forge / cuda-feedstock

A conda-smithy repository for cuda.
BSD 3-Clause "New" or "Revised" License

Fix for CUDA Toolkit packages containing incorrect RPATH #10

Open jakirkham opened 9 months ago

jakirkham commented 9 months ago

Introduction

We recently became aware of an issue in the cuda-nvtx-feedstock where the RPATHs of the libraries in the package were incorrect ( https://github.com/conda-forge/cuda-nvtx-feedstock/issues/2 ). These incorrect RPATHs are a result of the directory layout used for CUDA packages. All distributions of CUDA place their contents in a top-level targets directory, with subdirectories for different architectures (for example, targets/x86_64-linux) to better support cross-compilation. The CUDA packages on conda-forge mimic this structure, but to support standard runtime library use cases, the library contents of CUDA packages are symlinked into the top-level lib directory. The problem lies in how $ORIGIN is handled for symlinks. At build time, the RPATHs are set relative to the true library location in targets. At runtime, however, $ORIGIN is the directory containing the symlink ($PREFIX/lib) rather than the true library location, so the RPATHs cause the dynamic loader to search for libraries outside of the environment.
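To make the failure concrete, this short Python sketch performs the $ORIGIN substitution on a single RPATH entry and normalizes the result. It assumes, for illustration only, that the build-time entry was $ORIGIN/../../../lib (the relative path from targets/x86_64-linux/lib back to $PREFIX/lib) and uses a hypothetical prefix /opt/conda/envs/myenv; the exact entries in the affected packages may differ. It models only the string substitution, not the loader's full search order.

```python
import posixpath


def expand_origin(rpath_entry: str, origin: str) -> str:
    """Substitute $ORIGIN in one RPATH entry and normalize the resulting path."""
    return posixpath.normpath(rpath_entry.replace("$ORIGIN", origin))


# Hypothetical environment prefix, used only for illustration.
prefix = "/opt/conda/envs/myenv"

# Assumed build-time RPATH: relative path from the real library directory
# (targets/x86_64-linux/lib) back to $PREFIX/lib.
build_time_rpath = "$ORIGIN/../../../lib"

real_dir = posixpath.join(prefix, "targets", "x86_64-linux", "lib")
symlink_dir = posixpath.join(prefix, "lib")

# Loaded from its real location, the entry points back into the environment.
print(expand_origin(build_time_rpath, real_dir))     # /opt/conda/envs/myenv/lib
# Loaded through the symlink in lib, $ORIGIN is $PREFIX/lib and the same
# entry escapes the environment.
print(expand_origin(build_time_rpath, symlink_dir))  # /opt/conda/lib
```

The same entry that is correct for the real file points at the parent installation's lib directory once the library is reached through its symlink.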

We would like to maintain the targets layout because it matches how CUDA is provided in other distributions. This also means we want to keep the real libraries in the targets directory rather than placing them directly in lib. We would also like to avoid ballooning the package size or adding any RPATHs that point outside the environment, since such RPATHs are broken at best and dangerous at worst. To satisfy all of these constraints, our proposed solution is to use patchelf during the conda package build step to manually set the RPATH of every library in the targets directory to $ORIGIN. At runtime, when a library is loaded through its symlink in $PREFIX/lib, the RPATH setting of $ORIGIN resolves to $PREFIX/lib, producing the desired behavior. There are some potential caveats to how this may work within the context of conda-build, as we discuss below, but we have verified that this produces the desired runtime results.
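A minimal sketch of the proposed build step, written in Python for illustration rather than taken from the actual feedstock recipes. It assumes the real libraries live in $PREFIX/targets/<arch>/lib and that patchelf is on PATH when not running in dry-run mode. The helper names (is_elf, rpath_commands, set_rpaths) are hypothetical; only patchelf's --set-rpath option is a real interface.

```python
import shlex
import subprocess
from pathlib import Path


def is_elf(path: Path) -> bool:
    """Return True if the file starts with the ELF magic number."""
    with path.open("rb") as f:
        return f.read(4) == b"\x7fELF"


def rpath_commands(prefix: Path) -> list[list[str]]:
    """Build one patchelf command per real shared library under targets/*/lib."""
    commands = []
    for lib_dir in sorted(prefix.glob("targets/*/lib")):
        for path in sorted(lib_dir.iterdir()):
            # Skip symlinks (e.g. libfoo.so -> libfoo.so.1) so each real file
            # is patched exactly once, and skip anything that is not a file.
            if path.is_symlink() or not path.is_file():
                continue
            # Only touch ELF shared objects.
            if ".so" not in path.name or not is_elf(path):
                continue
            # "$ORIGIN" is passed literally; no shell is involved, so it is
            # not expanded as an environment variable.
            commands.append(["patchelf", "--set-rpath", "$ORIGIN", str(path)])
    return commands


def set_rpaths(prefix: Path, dry_run: bool = True) -> None:
    """Print each patchelf command and, unless dry_run, execute it."""
    for cmd in rpath_commands(prefix):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

In a build script this might be called as set_rpaths(Path(os.environ["PREFIX"]), dry_run=False). Note that recent patchelf versions write the entry as DT_RUNPATH unless --force-rpath is given; $ORIGIN is substituted the same way in either case.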

Problem Statement

With the incorrect RPATHs, an environment can still be functional if either all of the following hold:

  1. The environment is not the base environment
  2. The environment is contained within the base environment’s envs folder
  3. The base environment contains compatible libraries

Or:

  1. The compatible libraries are accessible via LD_LIBRARY_PATH or standard ld.so search paths.

If neither of those cases is met, the environment will not be functional.
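The first set of conditions follows from where the broken entry lands once $ORIGIN is $PREFIX/lib. Assuming again, for illustration, that the build-time entry was $ORIGIN/../../../lib, it always resolves to the lib directory two levels above the prefix. The prefixes below are hypothetical examples.

```python
import posixpath

# Assumed build-time RPATH entry, for illustration only.
BROKEN_RPATH = "$ORIGIN/../../../lib"


def fallback_lib_dir(prefix: str) -> str:
    """Directory the broken entry points to when $ORIGIN is $PREFIX/lib."""
    origin = posixpath.join(prefix, "lib")
    return posixpath.normpath(BROKEN_RPATH.replace("$ORIGIN", origin))


for prefix in (
    "/opt/conda",             # the base environment itself
    "/opt/conda/envs/myenv",  # an environment in the base environment's envs folder
    "/home/user/envs/myenv",  # an environment created somewhere else
):
    print(prefix, "->", fallback_lib_dir(prefix))
```

For the base environment the entry resolves to /lib, a system directory outside any environment. For an environment in the base environment's envs folder it resolves to /opt/conda/lib, the base environment's own lib, which only helps if compatible libraries are installed there. For an environment created elsewhere it resolves to /home/user/lib.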

Our Solution

Justification

This approach aligns more closely with how the CUDA Toolkit is distributed outside of conda than the alternatives considered below. It also avoids unnecessarily bloating the package.

Considered Alternatives

Reverse symlink direction

Comments

Placing the real libraries in lib and symlinking them into targets would result in a different CUDA Toolkit layout in conda compared to other distributions. Alignment across CUDA Toolkit distributions is important so that libraries using CUDA have similar expectations and behaviors both inside and outside of conda environments.

Duplicating library in both locations

Comments

The cuda metapackage assumes that both build-time and runtime components are provided. If the libraries were duplicated between the -dev and runtime packages, the effective size of the cuda metapackage would roughly double. This is prohibitive. Additionally, providing -dev and -runtime variants of the metapackage is not desirable, because it would differ from other ways of distributing CUDA.

jakirkham commented 9 months ago

All fixes have been merged. Closing as completed.

jakirkham commented 1 month ago

Reopening to look at bin, where it appears similar work may be needed.

jakirkham commented 1 month ago

cc @billysuh7 (to look at doing the same thing for binaries in a couple weeks)