chichivica opened this issue 5 years ago:

Hi guys, thanks for the awesome tool. Could you give an example of how to wrap nvtop in Docker?
Unfortunately this one:
Results in:
When I try to run with:
Any ideas?
@chichivica I did it in my repository and uploaded the image to my Docker Hub. You can use it with the following command:

```sh
docker run --runtime nvidia --rm -ti 69guitar1015/nvtop
```
@chichivica, you forgot to remove the stub `.so` symlinks after building in the Dockerfile.
I was able to build the current nvtop version with the following Dockerfile:
```dockerfile
FROM nvidia/cuda
RUN apt-get update && \
    apt-get install -y cmake libncurses5-dev libncursesw5-dev git && \
    rm -rf /var/lib/apt/lists/*
RUN ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/local/lib/libnvidia-ml.so && \
    ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/local/lib/libnvidia-ml.so.1 && \
    cd /tmp && \
    git clone https://github.com/Syllo/nvtop.git && \
    mkdir -p nvtop/build && cd nvtop/build && \
    cmake .. && \
    make && \
    make install && \
    cd / && \
    rm -r /tmp/nvtop && \
    rm /usr/local/lib/libnvidia-ml.so && \
    rm /usr/local/lib/libnvidia-ml.so.1
ENTRYPOINT ["/usr/local/bin/nvtop"]
```
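A minimal build-and-run sketch for the above (the `nvtop` image tag is arbitrary; this assumes the NVIDIA runtime is installed on the host):

```sh
# Build the image from the Dockerfile above, then run it with the
# NVIDIA runtime so the host driver's libnvidia-ml.so.1 is mounted in.
docker build -t nvtop .
docker run --runtime nvidia --rm -it nvtop
```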
Thanks @RuRo. It worked!
I'm trying to do this in conjunction with the TensorFlow Dockerfile and it isn't working.
The problem seems to be that libnvidia-ml is in a different location:

```
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.430.50
```

I tried modifying the Dockerfile as follows, but no luck.
```dockerfile
FROM tensorflow/tensorflow:2.2.0rc3-gpu
RUN apt-get update && apt-get install -y --no-install-recommends \
    bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 \
    libsox-fmt-all sox libsox-dev \
    tmux zsh vim wget git \
    nano google-perftools \
    cmake libncurses5-dev libncursesw5-dev
RUN ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/local/lib/libnvidia-ml.so && \
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/local/lib/libnvidia-ml.so.1 && \
    cd /tmp && \
    git clone https://github.com/Syllo/nvtop.git && \
    mkdir -p nvtop/build && cd nvtop/build && \
    cmake .. -DNVML_RETRIEVE_HEADER_ONLINE=True && \
    make && \
    make install && \
    cd / && \
    rm -r /tmp/nvtop && \
    rm /usr/local/lib/libnvidia-ml.so && \
    rm /usr/local/lib/libnvidia-ml.so.1
```
Without `-DNVML_RETRIEVE_HEADER_ONLINE=True`, I get:

```
CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find NVML (missing: NVML_INCLUDE_DIRS)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/FindNVML.cmake:52 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  CMakeLists.txt:31 (find_package)
```
If I add the option `-DNVML_RETRIEVE_HEADER_ONLINE=True`, I get:

```
make[2]: *** No rule to make target '/usr/local/lib/libnvidia-ml.so', needed by 'src/nvtop'. Stop.
make[1]: *** [src/CMakeFiles/nvtop.dir/all] Error 2
```
Any ideas?
@lminer you don't need the real `libnvidia-ml.so` file, you need the stubs. AFAIK, attempting to use the actual Nvidia shared objects during `docker build` will always fail, because the shared objects shouldn't actually be inside the container. Instead, they are mounted by the Nvidia Runtime from the host (you can tell by the driver version `430.50` in the .so filename). `docker build` doesn't use the Nvidia Runtime by default, so the actual .so files won't be there during the build.
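A quick way to see this for yourself (the `nvidia/cuda:10.1-base` tag is just an example):

```sh
# Without the NVIDIA runtime, no driver libraries are present:
docker run --rm nvidia/cuda:10.1-base \
    sh -c 'ls /usr/lib/x86_64-linux-gnu/ | grep nvidia-ml || echo "not found"'

# With the NVIDIA runtime, the host driver's libraries show up:
docker run --runtime nvidia --rm nvidia/cuda:10.1-base \
    sh -c 'ls /usr/lib/x86_64-linux-gnu/ | grep nvidia-ml'
# libnvidia-ml.so.1
# libnvidia-ml.so.430.50   <- version matches the host driver
```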
It seems that the tensorflow folks decided that they will use the `nvidia/cuda:*-base-*` images, which only have the bare minimum required to use GPUs, and that they will provide every build dependency themselves. The `base` and `runtime` images don't have any stubs, so you are out of luck.

You'll either have to build tensorflow on your own with `nvidia/cuda:*-devel-*` as a base image or provide your own stub .so files. Well, maybe I am missing some third option, but eh.
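You can check which image flavors ship the stub (tags are illustrative; the exact path depends on the CUDA version):

```sh
# devel images include the NVML stub:
docker run --rm nvidia/cuda:10.1-devel \
    ls /usr/local/cuda/targets/x86_64-linux/lib/stubs/
# base/runtime images do not:
docker run --rm nvidia/cuda:10.1-base \
    ls /usr/local/cuda/targets/x86_64-linux/lib/stubs/
# ls: cannot access ...: No such file or directory
```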
@RuRo Thanks for such a comprehensive explainer. I'll give that a shot!
@RuRo

> I was able to build the current nvtop version with the following Dockerfile: […]

Thank you for providing your Dockerfile. I changed the base image from `nvidia/cuda` to `nvidia/cuda:10.1-devel-ubuntu16.04`, successfully built the image, and when I run it I get the following error:
```
/usr/local/bin/nvtop: error while loading shared libraries: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
```
Edit: Oops. Forgot about `--runtime nvidia`.
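If you hit the same error, a quick way to confirm it's the missing runtime mount (`<your-image>` is whatever you tagged the build as):

```sh
# Without the NVIDIA runtime the dependency is unresolved:
docker run --rm --entrypoint ldd <your-image> /usr/local/bin/nvtop
#   libnvidia-ml.so.1 => not found
# With it, the loader finds the host-mounted library:
docker run --runtime nvidia --rm --entrypoint ldd <your-image> /usr/local/bin/nvtop
#   libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
```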
Trying this with CUDA 11.0 and running into issues again. Now the stub files aren't present. Is there something that I should be installing that I haven't installed? Basically `/usr/local/cuda-11.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so` doesn't exist and I get:
```
CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find NVML (missing: NVML_INCLUDE_DIRS)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/FindNVML.cmake:52 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  CMakeLists.txt:31 (find_package)
```
Here's the Dockerfile:
```dockerfile
ARG UBUNTU_VERSION=18.04
ARG ARCH=
ARG CUDA=11.0
FROM nvidia/cuda${ARCH:+-$ARCH}:${CUDA}-base-ubuntu${UBUNTU_VERSION} as base

# ARCH and CUDA are specified again because the FROM directive resets ARGs
# (but their default value is retained if set previously)
ARG ARCH
ARG CUDA
ARG CUDNN=8.0.4.30-1
ARG CUDNN_MAJOR_VERSION=8
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=7.1.3-1
ARG LIBNVINFER_MAJOR_VERSION=7

# Needed for string substitution
SHELL ["/bin/bash", "-c"]

RUN apt-get update && apt-get install -y --no-install-recommends \
    apt-utils \
    build-essential \
    cuda-command-line-tools-${CUDA/./-} \
    libcublas-${CUDA/./-} \
    cuda-nvrtc-${CUDA/./-} \
    libcufft-${CUDA/./-} \
    libcurand-${CUDA/./-} \
    libcusolver-${CUDA/./-} \
    libcusparse-${CUDA/./-} \
    curl \
    libcudnn8=${CUDNN}+cuda${CUDA} \
    libfreetype6-dev \
    libhdf5-serial-dev \
    libzmq3-dev \
    pkg-config \
    software-properties-common \
    unzip

# Install TensorRT if not building for PowerPC
RUN [[ "${ARCH}" = "ppc64le" ]] || { apt-get update && \
    apt-get install -y --no-install-recommends libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
    libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda${CUDA} \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*; }

# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Link the libcuda stub to the location where tensorflow is searching for it and reconfigure
# dynamic linker run-time bindings
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
    && echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
    && ldconfig

RUN apt-get update && apt-get install -y --no-install-recommends \
    bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 \
    libsox-fmt-all sox libsox-dev htop python3 \
    tmux zsh vim wget git git-lfs \
    nano google-perftools unzip \
    cmake libncurses5-dev libncursesw5-dev python3-dev

# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8

SHELL ["/usr/bin/zsh", "-c"]

# install nvtop
RUN ln -s /usr/local/cuda-11.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/local/lib/libnvidia-ml.so && \
    ln -s /usr/local/cuda-11.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/local/lib/libnvidia-ml.so.1 && \
    cd /tmp && \
    git clone https://github.com/Syllo/nvtop.git && \
    mkdir -p nvtop/build && cd nvtop/build && \
    cmake .. && \
    make && \
    make install && \
    cd / && \
    rm -r /tmp/nvtop && \
    rm /usr/local/lib/libnvidia-ml.so && \
    rm /usr/local/lib/libnvidia-ml.so.1
```
@lminer As I already mentioned, `nvidia/cuda:*-base-*` images don't have stubs. You'll have to build with `nvidia/cuda:*-devel-*` or manually add stubs to the base image.
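The latter could look something like this hypothetical multi-stage build (the stub path matches the CUDA 11.0 layout discussed above and may differ for other versions; tags are illustrative):

```dockerfile
# Donor stage: only used to pull the NVML stub out of a devel image.
FROM nvidia/cuda:11.0-devel-ubuntu18.04 AS stub-donor

FROM nvidia/cuda:11.0-base-ubuntu18.04
# Copy the stub into the base image and give it the .so.1 name the linker expects.
COPY --from=stub-donor \
     /usr/local/cuda-11.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so \
     /usr/local/lib/libnvidia-ml.so
RUN ln -s /usr/local/lib/libnvidia-ml.so /usr/local/lib/libnvidia-ml.so.1
# ...build nvtop as in the Dockerfiles above, then delete both files so the
# real library mounted by the NVIDIA runtime is used at run time.
```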
Wow, you're right. Sorry about that. Thanks for being so patient with me.
Now that this repository contains a pre-made Dockerfile, this should probably be closed.