NVIDIA / tensorflow

An Open Source Machine Learning Framework for Everyone
https://developer.nvidia.com/deep-learning-frameworks
Apache License 2.0

Tensorflow 22.02-tf2-py3 AMD64 does not have static cuda libraries (libxxx_static.a) #72

Closed by raghavduddala 1 year ago

raghavduddala commented 1 year ago

System information

Describe the problem

I am trying to build Open3D from source using the 22.02-tf2-py3 AMD64 image as the base image. Open3D-ML requires static CUDA libraries such as libcusolver_static.a and libcublas_static.a, but the current TensorFlow container only ships the dynamic CUDA libraries such as libcusolver.so and libcublas.so.

Provide the exact sequence of commands / steps that you executed before running into the problem

I checked the installed CUDA libraries and found only the dynamic .so files for most of them.

root@raghav-GS75-Stealth-9SF:/usr/local/cuda/lib64# ls
cmake                     libcudadevrt.a          libcupti.so                 liblapack_static.a       libnppif.so.11          libnppitc.so.11         libnvperf_target.so
libOpenCL.so              libcudart.so            libcupti.so.11.6            libmetis_static.a        libnppif.so.11.6.0.55   libnppitc.so.11.6.0.55  libnvrtc-builtins.so
libOpenCL.so.1            libcudart.so.11.0       libcupti.so.2022.1.0        libnppc.so               libnppig.so             libnpps.so              libnvrtc-builtins.so.11.6
libOpenCL.so.1.0          libcudart.so.11.6.55    libcurand.so                libnppc.so.11            libnppig.so.11          libnpps.so.11           libnvrtc-builtins.so.11.6.55
libOpenCL.so.1.0.0        libcudart_static.a      libcurand.so.10             libnppc.so.11.6.0.55     libnppig.so.11.6.0.55   libnpps.so.11.6.0.55    libnvrtc.so
libaccinj64.so            libcufft.so             libcurand.so.10.2.9.55      libnppial.so             libnppim.so             libnvToolsExt.so        libnvrtc.so.11.2
libaccinj64.so.11.6       libcufft.so.10          libcusolver.so              libnppial.so.11          libnppim.so.11          libnvToolsExt.so.1      libnvrtc.so.11.6.55
libaccinj64.so.11.6.55    libcufft.so.10.7.0.55   libcusolver.so.11           libnppial.so.11.6.0.55   libnppim.so.11.6.0.55   libnvToolsExt.so.1.0.0  libpcsamplingutil.so
libcheckpoint.so          libcufftw.so            libcusolver.so.11.3.2.55    libnppicc.so             libnppist.so            libnvblas.so            stubs
libcublas.so              libcufftw.so.10         libcusolverMg.so            libnppicc.so.11          libnppist.so.11         libnvblas.so.11
libcublas.so.11           libcufftw.so.10.7.0.55  libcusolverMg.so.11         libnppicc.so.11.6.0.55   libnppist.so.11.6.0.55  libnvblas.so.11.8.1.74
libcublas.so.11.8.1.74    libcuinj64.so           libcusolverMg.so.11.3.2.55  libnppidei.so            libnppisu.so            libnvjpeg.so
libcublasLt.so            libcuinj64.so.11.6      libcusparse.so              libnppidei.so.11         libnppisu.so.11         libnvjpeg.so.11
libcublasLt.so.11         libcuinj64.so.11.6.55   libcusparse.so.11           libnppidei.so.11.6.0.55  libnppisu.so.11.6.0.55  libnvjpeg.so.11.6.0.55
libcublasLt.so.11.8.1.74  libculibos.a            libcusparse.so.11.7.1.55    libnppif.so              libnppitc.so            libnvperf_host.so
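
Filtering for the `_static.a` suffix may be quicker than scanning the full listing. The sketch below assumes the libraries live in /usr/local/cuda/lib64, as in the listing above; the function names are made up for illustration.

```shell
# List every lib*_static.a archive in a directory
# (defaults to the CUDA library path shown in the listing above).
list_static_cuda_libs() {
    dir="${1:-/usr/local/cuda/lib64}"
    find "$dir" -maxdepth 1 -name 'lib*_static.a' 2>/dev/null | sort
}

# Report, per library name, whether lib<name>_static.a exists in a directory.
# Returns non-zero if at least one archive is missing.
check_static_cuda_libs() {
    dir="$1"; shift
    missing=0
    for name in "$@"; do
        if [ -e "$dir/lib${name}_static.a" ]; then
            echo "found:   lib${name}_static.a"
        else
            echo "missing: lib${name}_static.a"
            missing=1
        fi
    done
    return "$missing"
}

# Example call inside the container:
# check_static_cuda_libs /usr/local/cuda/lib64 cudart cublas cusolver cusparse
```

Going by the listing above, only libcudart_static.a, liblapack_static.a and libmetis_static.a carry the `_static.a` suffix, so the cuBLAS, cuSOLVER and cuSPARSE archives would be reported as missing.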

Any other info / logs

#6 134.1 CMake Error at 3rdparty/find_dependencies.cmake:1454 (target_link_libraries):
#6 134.1   The link interface of target "3rdparty_cublas" contains:
#6 134.1 
#6 134.1     CUDA::cusolver_static
#6 134.1 
#6 134.1   but the target was not found.  Possible reasons include:
#6 134.1 
#6 134.1     * There is a typo in the target name.
#6 134.1     * A find_package call is missing for an IMPORTED target.
#6 134.1     * An ALIAS target is missing.
#6 134.1 
#6 134.1 Call Stack (most recent call first):
#6 134.1   CMakeLists.txt:465 (include)
#6 134.1 

This is the CMake file that uses CMake's find_package to locate the static CUDA targets: https://github.com/isl-org/Open3D/blob/v0.15.1/3rdparty/find_dependencies.cmake#L1454

Is it possible to add the missing static libraries to this container, or is there a way to install them? Also, is there a reason they are available in the cudnn-devel containers but not in the TensorFlow container?

nluehr commented 1 year ago

The static libraries are intentionally removed from our NGC container images to reduce image size. There is no automatic way to install the static libraries, but the following should more or less do what you want.

First, install the NVIDIA developer Debian package repo.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt update
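
The next step asks for "dev" package versions similar to those already installed, so it can help to list the installed runtime packages first. The sketch below is only an illustration. It assumes the container's CUDA libraries were installed from .deb packages and that each runtime package such as `libcusolver-11-6` has a matching `libcusolver-dev-11-6` package. The helper name `to_dev_pins` is hypothetical.

```shell
# Turn "package version" lines for CUDA runtime packages into apt version pins
# for the corresponding "dev" packages, e.g.
#   "libcusolver-11-6 11.3.2.55-1"  ->  "libcusolver-dev-11-6=11.3.2.55-1"
# Lines that do not look like "lib<name>-<major>-<minor> <version>" are dropped.
to_dev_pins() {
    sed -nE 's/^(lib[a-z]+)-([0-9]+-[0-9]+) (.+)$/\1-dev-\2=\3/p'
}

# Possible usage inside the container (assumes dpkg knows about the CUDA libs):
# dpkg-query -W -f='${Package} ${Version}\n' 'libcu*' | to_dev_pins
#
# To see which versions of one dev package the repo actually offers:
# apt-cache madison libcusolver-dev-11-6
```

Any pins it prints are only candidates; each one should still be checked against the repository index before installing.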

Then, check https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ for "dev" package versions similar to those installed in your container and install them with apt. For 22.02-tf2, you would likely want the following.

apt install --reinstall --no-install-recommends \
    libcublas-dev-11-8=11.11.3.6-1 \
    libcudnn8-dev=8.3.2.44-1+cuda11.5 \
    libcufft-dev-11-6=10.7.0.55-1 \
    libcurand-dev-11-6=10.2.9.55-1 \
    libcusolver-dev-11-6=11.3.2.55-1 \
    libcusparse-dev-11-6=11.7.1.55-1
    # And so on for additional libraries needed.

The above will also reinstall some packages that you don't strictly need. Alternatively, if keeping the image small is important, you can download the "dev" .deb packages and extract the static libraries manually.
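
A minimal sketch of that manual route, assuming the NVIDIA repo has already been added as described above. The function name `extract_static_libs` is hypothetical, and the destination directory is an assumption based on the listing in the original report. `apt-get download` fetches a package without installing it, and `dpkg-deb -x` unpacks its files into a directory.

```shell
# Copy only the lib*_static.a archives from a .deb into a destination directory,
# without installing the package.
extract_static_libs() {
    deb="$1"
    dest="$2"
    work="$(mktemp -d)"
    dpkg-deb -x "$deb" "$work"      # unpack the package payload into a scratch dir
    mkdir -p "$dest"
    # Search the whole payload so the exact install prefix does not matter.
    find "$work" -name 'lib*_static.a' -exec cp -v {} "$dest"/ \;
    rm -rf "$work"
}

# Possible usage inside the container:
# apt-get download libcusolver-dev-11-6=11.3.2.55-1
# extract_static_libs libcusolver-dev-11-6_*.deb /usr/local/cuda/lib64
```

This leaves out the headers and other files from the dev package, so it only suits builds that need the archives alone.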

raghavduddala commented 1 year ago

Thanks @nluehr, that helped install the required CUDA static libraries.