OE4T / tegra-demo-distro

Reference/demonstration distro for meta-tegra
MIT License

cmake fails: version `GLIBC_2.29' not found for libm.so.6 #257

Closed ervgan closed 1 year ago

ervgan commented 1 year ago

Hello again guys,

I stumbled on another issue that I could not resolve when trying to use cmake within an Nvidia container on top of Yocto. I get these messages:

```
cmake: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /usr/lib/libxml2.so.2)
cmake: /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /usr/lib/libxml2.so.2)
cmake: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by /usr/lib/libp11-kit.so.0)
```

When I run `strings /lib/libc.so.6 | grep GLIBC_` in Yocto, I get:

```
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_2.24
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_2.28
GLIBC_2.29
GLIBC_2.30
GLIBC_PRIVATE
```

but running `strings /lib/aarch64-linux-gnu/libc.so.6 | grep GLIBC_` in the Nvidia container, I get:

```
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_2.24
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_PRIVATE
```

Does that mean that the version of libm.so.6 in Yocto is more recent than the one in the Nvidia container? And if so, why is that the case? I am still on the dunfell branch at R32.7.2 and my image is the TensorRT 8.2.1 one that comes with JetPack 4.6.2, so I shouldn't have a compatibility mismatch, right?

Thanks again

madisongh commented 1 year ago

Where is your cmake binary coming from? I'd expect to see something like this if you are trying to export it (and its dependencies, like the libxml2 and libp11-kit mentioned there) from the host OS into the container. You really cannot do this, for the very reason you mention: glibc is provided by the container image, and it is a much older version than the one in your host OS.
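(As a side note, a minimal way to confirm the mismatch described above, assuming `objdump` from binutils is available wherever you run it: compare the version symbols the failing library requires against the ones the container's glibc actually provides. The paths are the ones quoted in the error messages earlier in this thread.)

```
# Versions required by the library named in the error message
objdump -T /usr/lib/libxml2.so.2 | grep -o 'GLIBC_[0-9.]*' | sort -u

# Versions the container's glibc provides
strings /lib/aarch64-linux-gnu/libc.so.6 | grep GLIBC_
```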

ervgan commented 1 year ago

I see, this is actually what I am doing. I mounted the cmake binary and other libraries from the Yocto base image into the Nvidia container, the reason being that the Nvidia container seems to be missing quite a few libraries I need in order to use cmake to build my TensorRT code. I wasn't sure how to bring all the missing libraries (I am also using glog, for example) into the container without having to use a custom version of an Nvidia Dockerfile (as suggested by Dan), especially since the Dockerfile for the Nvidia TensorRT image is not available on GitLab.

Do you have any suggestions? Or is the only proper way to create my own image based on the Nvidia image and just add cmake, glog, etc. and everything else that I need at the Nvidia container level?

dwalkes commented 1 year ago

> Or is the only proper way to create my own image based on the Nvidia image and just add cmake, glog, etc. and everything else that I need at the Nvidia container level?

This is typically what you'd do, yes.

> especially since the Dockerfile for the Nvidia TensorRT image is not available on GitLab.

You shouldn't strictly need this; you can instead start with their image and add to it as needed. My guess is it's probably related to the Dockerfiles in https://github.com/NVIDIA/TensorRT/tree/release/8.6/docker, though, and they are just missing a link on the NGC site. You could ask about this on the NVIDIA developer forum.
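(For concreteness, a minimal sketch of the "start with their image and add to it" approach described here; the package list is illustrative, and the reporter posts his actual Dockerfile later in this thread.)

```
FROM nvcr.io/nvidia/l4t-tensorrt:r8.2.1-runtime
# Add the build tooling the runtime image does not ship with
RUN apt-get update && apt-get install -y cmake build-essential libgoogle-glog-dev
```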

ervgan commented 1 year ago

Thanks again! I will try working on that.

ervgan commented 1 year ago

I managed to run cmake on my code by installing cmake, build-essential and glog at the container level. However, to build my program into an executable I need some header files from TensorRT, such as NvInfer.h, which are not available in the Nvidia L4T containers; for now Nvidia only provides runtime containers for L4T, so lots of header files are missing.

I'm sure someone else has run into this issue, but I could not find a clear solution. I see some people recommending deleting the tensorrt.csv file from the host (in this case it would be tensorrt-core.csv) so it does not get passed to the container, then manually installing the whole TensorRT library at the container level (not just the runtime).

What do you guys think?

dwalkes commented 1 year ago

Based on https://github.com/dusty-nv/jetson-inference/issues/281#issuecomment-938728673 I'd try installing libnvinfer-dev inside the container.
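(In Dockerfile terms, the suggestion amounts to something like the line below; whether apt can actually locate the package depends on the repositories configured in the chosen base image, as the follow-up comments show.)

```
# Pull the TensorRT development headers (NvInfer.h and friends) into the container
RUN apt-get update && apt-get install -y libnvinfer-dev
```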

ervgan commented 1 year ago

Thanks Dan! I will try this out.

ervgan commented 1 year ago

I tried lots of things but still no luck compiling my TensorRT code with NvInfer.h. Basically, the libnvinfer-dev package is not available for the base images corresponding to L4T 32.7.1; I think the install only works with a CUDA 11.4 image (which is not compatible with my Yocto base image on CUDA 10.2). How did you guys compile the TensorRT samples present in the Yocto base image? I see that they also rely on NvInfer.h.

My second issue is that, using the L4T 32.7.1 image, I get an error telling me that the CUDA compiler is broken and cuda_runtime.h is not found, but in the base image I see it located here:

```
/data/docker/overlay2//diff/usr/local/cuda-10.2/targets/aarch64-linux/include/cuda_runtime.h
```

Why is it not in the include folder under /usr/local/cuda-10.2/include? Under /etc/nvidia-container-runtime/host-files-for-container.d/, cuda-nvcc.csv shows "dir, /usr/local/cuda-10.2", but that folder is obviously missing the header file in include. For this issue, should I replace the cuda.bbclass from the dunfell branch I am using with the one in master? I saw that Matt made some updates to this file regarding a similar issue.

Thanks!
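(A small sanity check on the Yocto host, assuming the paths quoted above: print what cuda-nvcc.csv tells the container runtime to mount, and search that prefix for the header.)

```
cat /etc/nvidia-container-runtime/host-files-for-container.d/cuda-nvcc.csv
find /usr/local/cuda-10.2 -name cuda_runtime.h 2>/dev/null
```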

dwalkes commented 1 year ago

> How did you guys compile the TensorRT samples present in the Yocto base image? I see that they also rely on NvInfer.h.

I think the relevant recipe is at https://github.com/OE4T/meta-tegra/blob/dunfell/recipes-devtools/gie/tensorrt-samples.inc

I'm having a little trouble understanding exactly what you are trying to do. Could you share an environment I could use to reproduce? Perhaps a Dockerfile I could use to reproduce what you are seeing on the dunfell branch with demo-image-full or similar?

I think the thing to demonstrate would be that a Dockerfile build of whatever you are attempting to build succeeds when you run on Stock Jetpack but fails on meta-tegra.

ervgan commented 1 year ago

I basically have a deep learning TensorRT application in C++ that I want to run in an Nvidia container on top of a Yocto base image. To compile my code, I installed glog, build-essential and cmake at the container level using the following Dockerfile:

```
FROM nvcr.io/nvidia/l4t-tensorrt:r8.2.1-runtime

RUN apt-get update && apt-get install -y \
    libgoogle-glog-dev \
    build-essential

RUN cd /tmp && \
    wget --no-check-certificate https://github.com/Kitware/CMake/releases/download/v3.21.4/cmake-3.21.4-linux-aarch64.sh && \
    chmod +x cmake-3.21.4-linux-aarch64.sh && \
    ./cmake-3.21.4-linux-aarch64.sh --prefix=/usr/local --exclude-subdir --skip-license && \
    rm ./cmake-3.21.4-linux-aarch64.sh

ENV PATH="/usr/local/bin:${PATH}"
```

Here I start from the L4T TensorRT 8.2.1 image, but I also tried the L4T CUDA 10.2 image, which gives the same "NvInfer.h missing" error, and when using the l4t-base image the error message is "cuda_runtime.h missing".
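(For reference, an image built from the Dockerfile above would typically be built and run on the Yocto host along these lines; the tag name is just a placeholder.)

```
docker build -t trt-build .
docker run -it --rm --runtime nvidia trt-build /bin/bash
```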

When I try to install libnvinfer-dev in this Dockerfile, I get the message "cannot locate libnvinfer-dev package", but if I try with an L4T CUDA 11.4 image, I can indeed install libnvinfer-dev.

So basically, I am stuck on compiling my C++ code at this point because it relies on the NvInfer.h header, which is part of the dev package and not the runtime package. That's why I was curious how you guys compiled the samples to get the executable binaries under samples/bin.

I will try running my Dockerfile on Jetpack now.

Do you know what I'm doing wrong? I am sure you guys have some experience compiling TensorRT code in Nvidia containers on top of Yocto.

ervgan commented 1 year ago

What I did now is basically compile my code on Jetpack 4.6.2, since I cannot compile at the container level, and then I managed to run the code in the nvcr.io/nvidia/l4t-tensorrt:r8.2.1-runtime container. But when I take that compiled binary and try to run it in Yocto, the file is reported as non-executable; do you know why?

dwalkes commented 1 year ago

> But when I take that compiled binary and try to run it in Yocto, the file is reported as non-executable; do you know why?

For the same reasons discussed in https://github.com/OE4T/tegra-demo-distro/issues/257#issuecomment-1497708668 I believe - the container runtime dependencies and stock Jetpack runtime dependencies are likely to match, but will not match the Yocto runtime glibc and friends.
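(Two quick checks on the Yocto image can usually pin down this kind of mismatch, assuming the file and ldd utilities are present there; `my_trt_app` below is just a placeholder for the Jetpack-built binary.)

```
file ./my_trt_app   # shows the ELF type and which dynamic loader the binary expects
ldd ./my_trt_app    # lists the shared libraries it needs and flags any that cannot be found
```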

Were you able to get a docker environment and a set of code (tensorrt examples?) which build on stock jetpack but not on meta-tegra? If so, share the full dockerfile and I'll take a look. The one you've got above doesn't reference any examples.

ervgan commented 1 year ago

Actually, I did not manage to build my code in a container on stock Jetpack either, but I managed to compile it directly on Jetpack, and I'm using that executable in an l4t-base container on top of Yocto, and it's working. Thanks a lot for all your help, Matt and Dan :)