Closed: iripatx closed this issue 3 years ago
So after more searching I found out this is a duplicate of https://github.com/ceccocats/tkDNN/issues/127.
I'm closing the issue myself. Sorry for the confusion.
I'll document my solution just in case anyone searches for the same problem.
I ended up using the NVIDIA L4T ML Docker image. I extended it a bit to add some tools (make, cmake, yaml, ...) and ran it using JetPack's container runtime. You can also install ROS if you need it.
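For reference, a minimal sketch of what that extension could look like (the l4t-ml tag and package list here are illustrative assumptions, not the exact ones I used; pick the tag matching your JetPack/L4T release):
FROM nvcr.io/nvidia/l4t-ml:r32.4.4-py3
# add the extra build tools on top of the ML image
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get -y install --no-install-recommends \
    build-essential make cmake git libyaml-cpp-dev \
 && rm -rf /var/lib/apt/lists/*
Then run it through JetPack's container runtime, e.g. docker run -it --rm --runtime nvidia <image>.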
I'm aware that this image contains many tools that are not needed. I picked it to do some quick tests, though it would be better to take the L4T base image and extend it.
I do use l4t-base; the Dockerfile looks like this:
ARG BUILD_IMAGE=nvcr.io/nvidia/l4t-base:r32.4.4
ARG BASE_IMAGE=${BUILD_IMAGE}
FROM ${BUILD_IMAGE} as builder
RUN apt-get update \
&& export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends \
build-essential cmake git ninja-build \
libgtk-3-dev python3-dev python3-numpy \
ca-certificates file \
libeigen3-dev libyaml-cpp-dev libssl-dev \
#
# Clean up
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/*
# CMAKE
WORKDIR /usr/local/src
ARG CTAG=v3.18.4
RUN git clone --depth 1 --branch ${CTAG} https://github.com/Kitware/CMake.git \
&& mkdir cmake_build
WORKDIR /usr/local/src/cmake_build
RUN cmake \
-G Ninja \
/usr/local/src/CMake
RUN ninja -j$(nproc) \
&& ninja install -j$(nproc)
# OPENCV
# https://docs.opencv.org/master/d2/de6/tutorial_py_setup_in_ubuntu.html
WORKDIR /usr/local/src
ARG CVTAG=4.5.0
RUN git clone --depth 1 --branch ${CVTAG} https://github.com/opencv/opencv.git \
&& git clone --depth 1 --branch ${CVTAG} https://github.com/opencv/opencv_contrib.git \
&& mkdir opencv_build
WORKDIR /usr/local/src/opencv_build
RUN cmake \
-G Ninja \
-D WITH_CUDA=ON \
-D CUDA_ARCH_BIN='5.3 7.2' \
-D CUDA_FAST_MATH=ON \
-D OPENCV_DNN_CUDA=ON \
-D OPENCV_EXTRA_MODULES_PATH=/usr/local/src/opencv_contrib/modules \
/usr/local/src/opencv
RUN ninja -j$(nproc) \
&& ninja install -j$(nproc) \
&& ninja package -j$(nproc)
# TKDNN
WORKDIR /usr/local/src
ARG TTAG=master
RUN git clone --depth 1 --branch ${TTAG} https://github.com/ceccocats/tkDNN.git \
&& mkdir tkdnn_build
WORKDIR /usr/local/src/tkdnn_build
RUN cmake \
-G Ninja \
-D CMAKE_INSTALL_PREFIX=/usr/local/tkdnn \
/usr/local/src/tkDNN
RUN ninja -j$(nproc) \
&& ninja install -j$(nproc)
# FINAL IMAGE
FROM ${BASE_IMAGE}
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends \
libyaml-cpp0.5v5 python3-numpy \
#
# Clean up
&& apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/*
# install opencv
COPY --from=builder /usr/local/src/opencv_build/OpenCV-*-aarch64.sh /tmp/
RUN /tmp/OpenCV-*-aarch64.sh --skip-license --prefix=/usr/local \
&& rm /tmp/OpenCV-*-aarch64.sh
# install tkdnn
# COPY --from=builder /usr/local/tkdnn /usr/local/tkdnn
# RUN echo "/usr/local/tkdnn/lib" > /etc/ld.so.conf.d/tkdnn.conf \
# && ldconfig
# ENV PATH=$PATH:/usr/local/tkdnn/bin
COPY --from=builder /usr/local/tkdnn/bin /usr/local/bin
COPY --from=builder /usr/local/tkdnn/lib /usr/local/lib
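A build/run sketch for the image above (the tag name is just an example; adjust BUILD_IMAGE to your L4T release):
sudo docker build -t tkdnn:r32.4.4 --build-arg BUILD_IMAGE=nvcr.io/nvidia/l4t-base:r32.4.4 .
sudo docker run -it --rm --runtime nvidia tkdnn:r32.4.4
As discussed further down the thread, the build stage only sees CUDA/cuDNN if Docker is configured so that builds also go through the NVIDIA runtime.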
Thank you very much for sharing! :)
How is this container going to compile if it doesn't have cudnn? What am I not seeing here?
Well, L4T containers are meant to be used on their Jetson products; their OS contains a modified nvidia-container-runtime which mounts the CUDA and cuDNN libraries from the OS into the container at run time. As much as I hate it, their reason is to keep container images smaller.
But as far as I know, that container only mounts CUDA, not cuDNN or TensorRT. In fact, I've tested it, and it doesn't detect cuDNN when compiling OpenCV. Maybe I am doing something wrong, like the build command. Does the build need to specify the NVIDIA runtime, like the run command does?
Then let me extend your knowledge ;)
JetPack has the following packages:
# - nvidia-container-csv-cuda
# - nvidia-container-csv-cudnn
# - nvidia-container-csv-tensorrt
They depend on nvidia-cuda, nvidia-cudnn8, and nvidia-tensorrt.
When installed, the following files are deployed:
lzzii@jtsna-2109beta1:~$ ls /etc/nvidia-container-runtime/host-files-for-container.d/
cuda.csv cudnn.csv l4t.csv tensorrt.csv visionworks.csv
These instruct nvidia-container-runtime to mount the CUDA/cuDNN/TensorRT libraries inside the l4t-base image.
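A quick way to see those mounts in action (a sketch; the library path is the standard aarch64 location on JetPack, and the tag should match your L4T release):
sudo docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.4.4 \
    sh -c 'ls /usr/lib/aarch64-linux-gnu/ | grep -i cudnn'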
You're welcome ;)
Thank you very much. OK, now I'm starting to see how this works. My JetPack 4.4 installed everything as expected:
libnvidia-container-tools/stable,now 0.9.0~beta.1 arm64 [installed]
libnvidia-container0/stable,now 0.9.0~beta.1 arm64 [installed]
nvidia-container/stable 4.4.1-b50 arm64
nvidia-container-csv-cuda/stable 10.2.89-1 arm64 [upgradable from: 10.2.89-1]
nvidia-container-csv-cudnn/stable,now 8.0.0.180-1+cuda10.2 arm64 [installed]
nvidia-container-csv-tensorrt/stable,now 7.1.3.0-1+cuda10.2 arm64 [installed]
nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
nvidia-container-runtime/stable,now 3.1.0-1 arm64 [installed]
nvidia-container-toolkit/stable,now 1.0.1-1 arm64 [installed]
and:
ls /etc/nvidia-container-runtime/host-files-for-container.d/
cuda.csv cudnn.csv l4t.csv tensorrt.csv visionworks.csv
But your Dockerfile works when I copy-paste it inside a container run with --runtime nvidia, while it doesn't if I use it in a docker build. So I guess I am not pointing to the NVIDIA runtime when building. Am I right?
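For what it's worth, the usual way to make docker build go through the NVIDIA runtime on Jetson is to set it as Docker's default runtime; a sketch (back up your existing /etc/docker/daemon.json first; this config assumes a standard JetPack install and is not taken from this thread):
# make the NVIDIA runtime the default so docker build also uses it
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker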
Thanks for the Dockerfile! I am still having an issue where it crashes at runtime when initializing a Yolo3Detector instance. The specific place it crashes is https://github.com/ceccocats/tkDNN/blob/master/src/NetworkRT.cpp#L37, when calling builderRT->platformHasFastFp16().
Any tips for fixing this?
Edit: did more debugging; when I run the tkDNN tests, they error with the message:
CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED
I want to say sorry in advance if this is more of a theoretical misunderstanding on my part, since I'm only starting to work with Jetson devices.
We want to run some tests using tkDNN in a Docker container on the Jetson. I tried using the pre-built Docker image, but I receive the following error:
standard_init_linux.go:211: exec user process caused "exec format error"
After some research, I found that the error might be caused by the image's architecture (amd64) not matching the Jetson's CPU architecture (arm64). I'm wondering if I'm missing something in the configuration, or if the Docker images are simply made to test the library on x86 devices.
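For reference, a quick way to confirm such a mismatch (a sketch; the image name below is hypothetical, use the one you pulled):
uname -m                                                               # prints aarch64 on a Jetson
docker image inspect --format '{{.Architecture}}' tkdnn-image:latest   # prints amd64 for an x86 image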
Thank you for your time.