jakirkham opened 11 months ago
All fixes have been merged. Closing as completed.
Reopening to look at bin, where it appears similar work may be needed.
cc @billysuh7 (to look at doing the same thing for binaries in a couple weeks)
Billy looked through the feedstocks and found the following ones still need RPATH fixes:
- cuda-cuobjdump: https://github.com/conda-forge/cuda-cuobjdump-feedstock/pull/15
- cuda-cuxxfilt: https://github.com/conda-forge/cuda-cuxxfilt-feedstock/pull/14
- cuda-gdb: https://github.com/conda-forge/cuda-gdb-feedstock/pull/17
- cuda-nvcc-impl: https://github.com/conda-forge/cuda-nvcc-impl-feedstock/pull/29
- cuda-nvdisasm: https://github.com/conda-forge/cuda-nvdisasm-feedstock/pull/16
- cuda-nvml-dev: https://github.com/conda-forge/cuda-nvml-dev-feedstock/pull/16
- cuda-nvprune: https://github.com/conda-forge/cuda-nvprune-feedstock/pull/14
- cuda-nvvp: https://github.com/conda-forge/cuda-nvvp-feedstock/pull/14
- cuda-opencl: (decided this is unneeded)
- cuda-sanitizer-api: https://github.com/conda-forge/cuda-sanitizer-api-feedstock/pull/18
- gds-tools: https://github.com/conda-forge/libcufile-feedstock/pull/22
- nsight-compute
Introduction
We recently became aware of an issue in the cuda-nvtx-feedstock where the RPATHs in the libraries in the package were incorrect (https://github.com/conda-forge/cuda-nvtx-feedstock/issues/2). These incorrect RPATHs are the result of the directory layout used for CUDA packages. All distributions of CUDA place their contents in a top-level targets directory with various subdirectories for different architectures to better support cross-compilation. The CUDA packages on conda-forge mimic this structure, but to support standard runtime library use cases, the library contents of CUDA packages are symlinked into the top-level lib directory. The problem lies in how $ORIGIN is handled for symlinks. The RPATHs are set relative to the real library location at build time, but at runtime $ORIGIN is the directory of the symlink rather than the directory of the real library. As a result, at runtime the RPATHs make the loader search for libraries outside of the environment.

We would like to maintain the targets layout because it matches how CUDA is provided in other distributions. This also means we want to keep the real libraries in the targets directory rather than placing them directly in lib. We would also like to avoid ballooning the package size or adding any RPATHs that point outside the environment, since that is broken at best and dangerous at worst. To satisfy all of these constraints, our proposed solution is to manually set the RPATH to $ORIGIN with patchelf during the conda package build step on all the libraries in the targets directory. At runtime, when a library is loaded through its symlink in $PREFIX/lib, the RPATH setting of $ORIGIN will resolve to $PREFIX/lib, producing the desired behavior. There are some potential caveats to how this may work within the context of conda-build, as we discuss below, but we have verified that this produces the desired runtime results.

Problem Statement
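- Each CUDA component is split into a runtime package and a -dev package.
  - For example, cuda-nvtx-dev & cuda-nvtx.
  - The runtime package, cuda-nvtx, contains the libraries.
  - The -dev package has a dependency on the runtime package so that these libraries are available at build time.
- The real libraries are installed in $PREFIX/targets/<arch>/*.so*.
- Symlinks to them are placed in $PREFIX/lib/*.so*.
- At build time, conda-build finds the real .so files in the deeper folder, $PREFIX/targets/<arch>/lib.
  - conda-build therefore sets the RPATH to be $ORIGIN/../../../lib.
  - Relative to the real file, this evaluates to $PREFIX/lib.
- At runtime, the symlink in $PREFIX/lib is followed and the library in $PREFIX/targets/<arch>/lib is loaded.
  - The RPATH is still $ORIGIN/../../../lib.
  - However, $ORIGIN is considered to be $PREFIX/lib, the directory of the symlink, so the RPATH resolves to the lib directory two levels above $PREFIX, outside the environment.

This can result in a functioning environment if either:
- the environment lives in the envs folder of a base installation whose own lib directory happens to provide the needed libraries,
Or:
- the needed libraries can be found through LD_LIBRARY_PATH or the standard ld.so search paths.

If neither of those cases holds, the environment will not be functional.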
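The following short Python sketch reproduces the path arithmetic behind this failure. It only mimics how $ORIGIN is substituted into an RPATH entry and does not call the dynamic loader. The prefix /opt/conda/envs/myenv, the x86_64-linux value for <arch>, and the helper name expand_origin are hypothetical choices made for illustration.

```python
import posixpath


def expand_origin(rpath: str, origin: str) -> str:
    """Substitute $ORIGIN in one RPATH entry and normalize the result.

    Path arithmetic only; this does not query the real dynamic loader.
    """
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


PREFIX = "/opt/conda/envs/myenv"  # hypothetical environment prefix
ARCH = "x86_64-linux"             # example <arch> value
RPATH = "$ORIGIN/../../../lib"    # RPATH written for files in targets/<arch>/lib

# $ORIGIN as conda-build assumes it: the directory of the real file.
real_dir = f"{PREFIX}/targets/{ARCH}/lib"
# $ORIGIN as seen at runtime: the directory of the symlink in $PREFIX/lib.
symlink_dir = f"{PREFIX}/lib"

print(expand_origin(RPATH, real_dir))         # /opt/conda/envs/myenv/lib
print(expand_origin(RPATH, symlink_dir))      # /opt/conda/lib
# An RPATH of plain $ORIGIN, seen through the symlink, stays in the environment.
print(expand_origin("$ORIGIN", symlink_dir))  # /opt/conda/envs/myenv/lib
```

The second result is the lib directory of the base installation, two levels above $PREFIX. This explains why an environment under the envs folder may still appear to work.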
Our Solution
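- Keep the real library files in targets/…/lib.
- Create a symlink in lib for each CUDA library that points to the library in ../targets/<arch>/…
- Use patchelf to set the RPATH to $ORIGIN for the libraries in $PREFIX/targets/<arch>/*.so*.
- Set build: binary_relocation: false so that conda-build doesn't otherwise change the RPATHs of these libraries.
- At runtime, a library loaded through its symlink in $PREFIX/lib will look for libraries adjacent to the symlink in $PREFIX/lib.
  - For example, it will pick up the environment's libstdc++ instead of the system libstdc++.
- Caveat: a library loaded directly from the targets/…/lib folder resolves $ORIGIN to that folder, so its RPATH will not point at dependencies that live only in $PREFIX/lib.

Justification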
This approach aligns more closely with how the CUDA Toolkit is distributed outside of conda than the alternatives we considered below. It also avoids unnecessarily bloating the package.
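The first three steps of the solution could be scripted roughly as follows. This is a hypothetical Python rendering and not the feedstocks' actual build script. The function name link_and_patch, its dry_run parameter, and the choice to patch only real files (skipping version symlinks such as libfoo.so) are assumptions. The sketch relies only on the patchelf --set-rpath command. The recipe would still need build: binary_relocation: false so that conda-build leaves these RPATHs alone.

```python
import subprocess
from pathlib import Path


def link_and_patch(prefix: str, arch: str, dry_run: bool = True) -> list:
    """Hypothetical sketch of the solution's build steps.

    1. Symlink every targets/<arch>/lib/*.so* entry into <prefix>/lib.
    2. Collect patchelf commands that set the RPATH of each real
       (non-symlink) library file to $ORIGIN.
    The commands are only executed when dry_run is False.
    """
    prefix_path = Path(prefix)
    target_lib = prefix_path / "targets" / arch / "lib"
    lib = prefix_path / "lib"
    lib.mkdir(parents=True, exist_ok=True)

    commands = []
    for entry in sorted(target_lib.glob("*.so*")):
        # Relative link, e.g. lib/libfoo.so.1 -> ../targets/<arch>/lib/libfoo.so.1
        link = lib / entry.name
        if not link.is_symlink() and not link.exists():
            link.symlink_to(Path("..", "targets", arch, "lib", entry.name))
        # Patch only real files; version symlinks resolve to them anyway.
        if not entry.is_symlink():
            # A list argument bypasses the shell, so "$ORIGIN" is passed literally.
            commands.append(["patchelf", "--set-rpath", "$ORIGIN", str(entry)])

    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

In a real build, the prefix would come from the PREFIX environment variable that conda-build sets, for example link_and_patch(os.environ["PREFIX"], "x86_64-linux", dry_run=False).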
Considered Alternatives
Reverse symlink direction
- Instead of placing the actual library file in targets/…/lib, the actual library file would be placed in $PREFIX/lib, and the symlink would be created in the targets/…/lib folder.
- The RPATH would be set to $ORIGIN/../lib, which resolves to the $PREFIX/lib location.

Comments
This approach would result in a different CUDA Toolkit layout in conda compared to other distributions. Alignment across CUDA Toolkit distributions is important so that libraries using CUDA can rely on the same expectations and behaviors both inside and outside of conda environments.
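A short Python sketch can check the path arithmetic for this reverse layout. In this hypothetical example, the real file sits in $PREFIX/lib with an RPATH of $ORIGIN/../lib. The prefix /opt/conda/envs/myenv, the x86_64-linux value for <arch>, and the helper expand_origin are illustrative assumptions. The helper only substitutes $ORIGIN and does not call the loader.

```python
import posixpath


def expand_origin(rpath: str, origin: str) -> str:
    """Substitute $ORIGIN in one RPATH entry and normalize (path arithmetic only)."""
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


PREFIX = "/opt/conda/envs/myenv"  # hypothetical environment prefix
ARCH = "x86_64-linux"             # example <arch> value
RPATH = "$ORIGIN/../lib"          # RPATH on the real file in $PREFIX/lib

# Loaded through $PREFIX/lib, where the real file lives in this layout.
print(expand_origin(RPATH, f"{PREFIX}/lib"))
# -> /opt/conda/envs/myenv/lib

# Loaded through the symlink in $PREFIX/targets/<arch>/lib.
print(expand_origin(RPATH, f"{PREFIX}/targets/{ARCH}/lib"))
# -> /opt/conda/envs/myenv/targets/x86_64-linux/lib
```

In both cases the lookup stays inside the environment.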
Duplicating library in both locations
- The library would be duplicated in both the *-dev and the runtime packages. It would exist in the targets/…/lib location in the *-dev package, and in $PREFIX/lib in the runtime package.
- conda-build would set the RPATH in both instances.
  - The copy in the *-dev package would have an RPATH of $ORIGIN/../../../lib, which evaluates to $PREFIX/lib. Loading of sibling libraries in the targets folder would rely on fallback to RUNPATH, which is $ORIGIN.
  - The copy in the runtime package would have an RPATH of $ORIGIN/../lib, which again evaluates to $PREFIX/lib. Sibling libraries are present in this same folder in this package, so the fallback to RUNPATH doesn't come into play.

Comments
The cuda metapackage assumes that both build-time and run-time components are provided. Because this approach duplicates the libraries between the -dev and runtime packages, the effective size of the cuda metapackage would roughly double, which is prohibitive. Additionally, having -dev and -runtime variants of a metapackage is not favorable, because it would differ from other ways of distributing CUDA.