NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

TensorRT 8.4.2 #2218

Open sc199505 opened 2 years ago

sc199505 commented 2 years ago

Description

TensorRT 8.4.0.6 does not have this problem, but TensorRT 8.4.2.4 fails with:

6: [libLoader.h::DynamicLibrary::50] Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.4.2)

Environment

TensorRT Version: TensorRT-8.4.2.4
NVIDIA GPU: RTX-5000
NVIDIA Driver Version: 460.73.01
CUDA Version: 11.1
CUDNN Version: 8.2.1
Operating System: Ubuntu 18.04.1

zerollzeng commented 2 years ago

Can you check your LD_LIBRARY_PATH? This file should be under /usr/lib/x86_64-linux-gnu/ or a similar directory.

sc199505 commented 2 years ago
zerollzeng commented 2 years ago

Can you run export LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.2.4/lib:$LD_LIBRARY_PATH and try again?
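LD_LIBRARY_PATH entries are searched in order, so putting the TensorRT lib directory first makes it win over any other copy. Below is a minimal C++17 sketch that reports which LD_LIBRARY_PATH entry would supply a given file first. The helper names are hypothetical and not part of TensorRT. The sketch only models the LD_LIBRARY_PATH step. The real loader also consults rpath/runpath, the ld.so cache and the default directories.

```cpp
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a colon-separated search path (as in
// LD_LIBRARY_PATH) into directories. Empty entries are skipped in this sketch.
std::vector<std::string> split_search_path(const std::string& value) {
  std::vector<std::string> dirs;
  std::stringstream stream(value);
  std::string entry;
  while (std::getline(stream, entry, ':')) {
    if (!entry.empty()) dirs.push_back(entry);
  }
  return dirs;
}

// Hypothetical helper: return the first directory, in search order, that
// contains `filename`, or std::nullopt if none does.
std::optional<std::string> first_match(const std::string& search_path,
                                       const std::string& filename) {
  for (const auto& dir : split_search_path(search_path)) {
    std::error_code ec;  // Unreadable directories are treated as "not found".
    if (std::filesystem::exists(std::filesystem::path(dir) / filename, ec)) {
      return dir;
    }
  }
  return std::nullopt;
}
```

For example, pass the value of std::getenv("LD_LIBRARY_PATH") (when it is set) together with "libnvinfer_builder_resource.so.8.4.2". An empty result means no entry contains the file.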

sc199505 commented 2 years ago

So the .so file can be found, but it cannot be loaded.

zerollzeng commented 2 years ago

I would guess you have more than one TensorRT version installed on your system. Can you try export LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.2.4/lib (without appending the existing $LD_LIBRARY_PATH) and try again?
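One way to test the multiple-installs theory is to list every libnvinfer.so.* file in the candidate directories. Below is a minimal C++17 sketch. The function name and the choice of directories are illustrative assumptions.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: collect every entry named "libnvinfer.so.<suffix>"
// directly inside the given directories, sorted by full path. Finding more
// than one version suffix suggests several TensorRT installs are visible.
std::vector<std::string> find_nvinfer_copies(const std::vector<std::string>& dirs) {
  const std::string prefix = "libnvinfer.so.";
  std::vector<std::string> found;
  for (const auto& dir : dirs) {
    std::error_code ec;
    std::filesystem::directory_iterator it(dir, ec);
    if (ec) continue;  // Skip directories that do not exist or cannot be read.
    for (const auto& entry : it) {
      const std::string name = entry.path().filename().string();
      if (name.rfind(prefix, 0) == 0) {  // name starts with prefix
        found.push_back(entry.path().string());
      }
    }
  }
  std::sort(found.begin(), found.end());
  return found;
}
```

For example, you could pass the LD_LIBRARY_PATH entries plus /usr/lib/x86_64-linux-gnu and /usr/local/lib. Seeing both libnvinfer.so.8.4.0 and libnvinfer.so.8.4.2 would support the guess. The list also includes libnvinfer.so.8-style symlinks.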

sc199505 commented 2 years ago

OK, I'll try it.

WelY1 commented 2 years ago

> I would guess you have more than 1 tensorrt version installed on your system. Can you try export LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.2.4/lib and try again?

Hi, I installed TensorRT with python3 -m pip install --upgrade nvidia-tensorrt. Where can I find the path to put in my LD_LIBRARY_PATH? I can't find the library in /usr/lib/x86_64-linux-gnu. Thanks!

jakepoz commented 2 years ago

Hey, I ran into the same error today. Exporting LD_LIBRARY_PATH can work, but on Ubuntu you should really create a file such as /etc/ld.so.conf.d/tensorrt.conf containing the path of your install location. However, because of some bug, that does not work. There is likely an error in how TensorRT searches for its dependent shared objects.

Anas-liu commented 2 years ago

Hi, I had the same problem, but simply running the command without sudo (./yolov5 -s yolov5s.wts yolov5s.engine s) works! Why?
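One plausible explanation is that sudo resets the environment by default (the sudoers env_reset option), so an LD_LIBRARY_PATH exported in the user's shell may never reach the command. This is an assumption and is not confirmed in this thread. Below is a small C++ diagnostic sketch with a hypothetical function name. Running a program built around it with and without sudo shows what the process actually sees.

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical diagnostic: describe the loader-relevant environment the
// process sees. Call it as describe_loader_env(std::getenv("LD_LIBRARY_PATH"), geteuid()).
std::string describe_loader_env(const char* ld_library_path, uid_t euid) {
  std::string report = euid == 0 ? "running as root; " : "running as non-root; ";
  if (ld_library_path == nullptr) return report + "LD_LIBRARY_PATH is not set";
  return report + "LD_LIBRARY_PATH=" + ld_library_path;
}
```

If the sudo run reports that LD_LIBRARY_PATH is not set, the loader has no way to find a library that is reachable only through that variable.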

dev0x13 commented 2 years ago

For those affected: apparently libnvinfer uses a dlopen call to load the libnvinfer_builder_resource library. However, libnvinfer does not have its rpath attribute set, so dlopen only looks for the library in system folders, even though libnvinfer_builder_resource is located next to libnvinfer in the same folder. To make things work without setting LD_LIBRARY_PATH, one can set libnvinfer's rpath to $ORIGIN. @zerollzeng Perhaps this should be done by default in the TRT distribution?
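The loader replaces $ORIGIN (or ${ORIGIN}) in an rpath or runpath with the directory that contains the object carrying it (see ld.so(8)). That is why an rpath of $ORIGIN on libnvinfer would point at the directory holding libnvinfer_builder_resource. Below is a simplified C++ sketch of that substitution. The helper is hypothetical, and the real loader applies additional rules that this sketch ignores.

```cpp
#include <string>

// Hypothetical helper: expand $ORIGIN / ${ORIGIN} in an rpath string,
// roughly as the dynamic loader does. The token becomes the directory part
// of `object_path`, the object whose rpath is being expanded.
std::string expand_origin(std::string rpath, const std::string& object_path) {
  const auto slash = object_path.rfind('/');
  const std::string origin =
      slash == std::string::npos ? "." : object_path.substr(0, slash);
  // Replace the braced form first so "$ORIGIN" never matches inside "${ORIGIN}".
  for (const std::string token : {"${ORIGIN}", "$ORIGIN"}) {
    std::string::size_type pos = 0;
    while ((pos = rpath.find(token, pos)) != std::string::npos) {
      rpath.replace(pos, token.size(), origin);
      pos += origin.size();
    }
  }
  return rpath;
}
```

For example, expand_origin("$ORIGIN", "/usr/local/TensorRT-8.4.2.4/lib/libnvinfer.so.8") gives "/usr/local/TensorRT-8.4.2.4/lib". When setting the rpath with patchelf, put '$ORIGIN' in single quotes so the shell does not expand it.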

zerollzeng commented 2 years ago

@kevinch-nv for visibility

mkaivs commented 2 years ago

Is there any update on this? I have the same error and was forced to use LD_LIBRARY_PATH instead of using ldconfig. Setting LD_LIBRARY_PATH is fine in development but is considered bad practice in production.

Bidski commented 1 year ago

I am also experiencing this issue with version 8.5.1

[E] [TRT] 6: [libLoader.h::DynamicLibrary::54] Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.5.1)

All of the TensorRT libraries are installed in /usr/local/lib, and I have an ldconfig config file set up to search /usr/local/lib. However, TensorRT only works if I set LD_LIBRARY_PATH=/usr/local/lib, which, as @mkaivs points out, is really poor practice in a production environment.

Bidski commented 1 year ago

Setting the rpath of libnvinfer.so.8.5.1 to the install location of libnvinfer.so.8.5.1 seems to be a good workaround. For example, if libnvinfer.so.8.5.1 is located in /usr/local/lib, run:

patchelf --set-rpath "/usr/local/lib" "/usr/local/lib/libnvinfer.so.8.5.1"
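The dlopen(3) search order explains why patching libnvinfer helps. For a name without a slash, glibc searches these locations in turn:

1. The caller's DT_RPATH, but only if the caller has no DT_RUNPATH.
2. LD_LIBRARY_PATH as it was at program start.
3. The caller's DT_RUNPATH.
4. /etc/ld.so.cache.
5. The default directories.

libnvinfer itself calls dlopen(), so a path stored in libnvinfer is consulted. Recent patchelf versions write DT_RUNPATH by default unless --force-rpath is given. Below is a simplified model of that order, not the real loader: all names are hypothetical, and hwcaps subdirectories, secure-execution mode and already-loaded objects are ignored.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the state dlopen(3) consults for a bare library name.
struct LoaderState {
  std::vector<std::string> caller_rpath;     // DT_RPATH of the calling object
  std::vector<std::string> ld_library_path;  // LD_LIBRARY_PATH at program start
  std::vector<std::string> caller_runpath;   // DT_RUNPATH of the calling object
  std::map<std::string, std::string> cache;  // ld.so.cache: name -> path
  std::vector<std::string> default_dirs;     // e.g. /lib, /usr/lib
  std::set<std::string> files;               // paths that exist on "disk"
};

std::optional<std::string> model_dlopen(const LoaderState& s, const std::string& name) {
  if (name.find('/') != std::string::npos) {  // A path is used as-is.
    return s.files.count(name) ? std::optional<std::string>(name) : std::nullopt;
  }
  auto search = [&](const std::vector<std::string>& dirs) -> std::optional<std::string> {
    for (const auto& dir : dirs) {
      const std::string candidate = dir + "/" + name;
      if (s.files.count(candidate)) return candidate;
    }
    return std::nullopt;
  };
  if (s.caller_runpath.empty()) {  // DT_RPATH is ignored when DT_RUNPATH exists.
    if (auto hit = search(s.caller_rpath)) return hit;
  }
  if (auto hit = search(s.ld_library_path)) return hit;
  if (auto hit = search(s.caller_runpath)) return hit;
  if (auto it = s.cache.find(name); it != s.cache.end() && s.files.count(it->second)) {
    return it->second;
  }
  return search(s.default_dirs);
}
```

In this model, the cache entry is keyed by the builder resource's soname (do_not_link_against_nvinfer_builder_resource, as found later in this thread), not by the requested file name. A bare-name lookup therefore succeeds only through a search path that contains the file.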

ShuaiShao93 commented 1 year ago

For some reason, setting LD_LIBRARY_PATH doesn't work for me.

I also wrote a minimal repro confirming that libnvinfer_builder_resource.so.8.5.1 can't be opened with dlopen:

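  #include <dlfcn.h>
  #include <cstdlib>
  #include <filesystem>
  #include <string>
  #include <glog/logging.h>

  int main() {
    // PATH_TO_TENSORRT: placeholder for the directory holding the TensorRT libraries.
    std::string r = PATH_TO_TENSORRT;
    // Note: dlopen() uses LD_LIBRARY_PATH as it was at program startup,
    // so this setenv() call does not change the search path used below.
    setenv("LD_LIBRARY_PATH", r.c_str(), 1);

    std::string filename = "libnvinfer.so.8";
    CHECK(std::filesystem::exists(r + "/" + filename));
    void* handle = dlopen(filename.c_str(), RTLD_NOW | RTLD_LOCAL);
    // Pass
    CHECK(handle);

    filename = "libnvinfer_builder_resource.so.8.5.1";
    CHECK(std::filesystem::exists(r + "/" + filename));
    handle = dlopen(filename.c_str(), RTLD_NOW | RTLD_LOCAL);
    // Failed
    CHECK(handle) << dlerror();
    return 0;
  }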

This fails at the last line:

F20221212 16:06:45.507669 2931454 test.cc:24] Check failed: handle libnvinfer_builder_resource.so.8.5.1: cannot open shared object file: No such file or directory

ls -l shows that the files are there:

lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvcaffe_parser.so -> libnvparsers.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvcaffe_parser.so.8 -> libnvparsers.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvcaffe_parser.so.8.5.1 -> libnvparsers.so.8.5.1
-rwxr-xr-x 1 shshao shshao  373747000 Oct 27 15:38 libnvinfer_builder_resource.so.8.5.1
lrwxrwxrwx 1 shshao shshao         26 Dec 12 15:20 libnvinfer_plugin.so -> libnvinfer_plugin.so.8.5.1
lrwxrwxrwx 1 shshao shshao         26 Dec 12 15:20 libnvinfer_plugin.so.8 -> libnvinfer_plugin.so.8.5.1
-rwxr-xr-x 1 shshao shshao   43399840 Oct 27 15:38 libnvinfer_plugin.so.8.5.1
lrwxrwxrwx 1 shshao shshao         19 Dec 12 15:20 libnvinfer.so -> libnvinfer.so.8.5.1
lrwxrwxrwx 1 shshao shshao         19 Dec 12 15:20 libnvinfer.so.8 -> libnvinfer.so.8.5.1
-rwxr-xr-x 1 shshao shshao  487512744 Oct 27 15:37 libnvinfer.so.8.5.1
lrwxrwxrwx 1 shshao shshao         20 Dec 12 15:20 libnvonnxparser.so -> libnvonnxparser.so.8
lrwxrwxrwx 1 shshao shshao         24 Dec 12 15:20 libnvonnxparser.so.8 -> libnvonnxparser.so.8.5.1
-rwxr-xr-x 1 shshao shshao    2838832 Oct 27 15:35 libnvonnxparser.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvparsers.so -> libnvparsers.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvparsers.so.8 -> libnvparsers.so.8.5.1
-rwxr-xr-x 1 shshao shshao    3424720 Oct 27 15:38 libnvparsers.so.8.5.1

gcp commented 1 year ago

> In order to make things work without setting LD_LIBRARY_PATH, one can properly set libnvinfer's rpath to $ORIGIN.

@dev0x13 and @Bidski, thanks for this suggestion. I was afraid it would not work because we can't patch in the origin processing flag (-Wl,-z,origin), but in practice it fixes the problem anyway.

I agree this is a serious regression for deploying TensorRT stuff in production, and it also affects the TensorRT 8.5 releases.

hch-baobei commented 1 year ago

I had the same problem: Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.4.3)

But after I copied this file to /usr/lib/x86_64-linux-gnu, the problem was solved. I don't know why; could someone explain? I searched the container for this file and found it only in /usr/local/src/TensorRT-8.4.3.1/targets/x86_64-linux-gnu and in /usr/lib/x86_64-linux-gnu, so there should be no environment conflict, which makes it even stranger to me.

gcp commented 1 year ago

For one, putting libraries in system dirs requires root permissions, so this doesn't really "solve" anything.

hzwhl commented 1 year ago

I also encountered this problem. Just close PyCharm and then rerun the program.

Arunass commented 1 year ago

This is still a problem in 8.5.3.

The release notes indicate that RUNPATH is no longer set as of TensorRT 8.4.1 (https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-8-4-0-EA):

> The TensorRT shared library files no longer have RUNPATH set to $ORIGIN. This setting was causing unintended behavior for some users. If you relied on this setting before you may have trouble with missing library dependencies when loading TensorRT. It is preferred that you manage your own library search path using LD_LIBRARY_PATH or a similar method.

So setting LD_LIBRARY_PATH, adding to /etc/ld.so.conf, or adding a suitable file in /etc/ld.so.conf.d are apparently the intended approaches. We have added a path to ld.so.conf.d, since we don't install cuDNN or libnvinfer in the 'default' location. Our application now runs fine until it tries to dynamically load the builder resource library. Adding a link to the installed library from /usr/lib/x86_64-linux-gnu 'fixes' it.

I kept troubleshooting, even though our application now works because the library can be found. Even after running ldconfig to rebuild the cache, the library is not in the cache:

root@HSM1:# ldconfig -p | grep build
    do_not_link_against_nvinfer_builder_resource (libc6,x86-64) => /opt/PrivateLibs/lib/do_not_link_against_nvinfer_builder_resource
    do_not_link_against_nvinfer_builder_resource (libc6,x86-64) => /lib/x86_64-linux-gnu/do_not_link_against_nvinfer_builder_resource

where, funnily, do_not_link_against_nvinfer_builder_resource is a link to libnvinfer_builder_resource.so.8.5.3, which is itself a link to /opt/PrivateLibs/lib/libnvinfer_builder_resource.so.8.5.3.

I now suspect there's something special about the builder_resource library, especially since this weird do_not_link_against link is floating around. Indeed, this explains where these strange links come from:

root@HSM1:/usr/lib/x86_64-linux-gnu# readelf -a libnvinfer_builder_resource.so.8.5.3 | grep builder
 0x000000000000000e (SONAME)             Library soname: [do_not_link_against_nvinfer_builder_resource]
  000000: Rev: 1  Flags: BASE  Index: 1  Cnt: 1  Name: do_not_link_against_nvinfer_builder_resource

I'll go out on a limb and suggest that this is what is messing things up. ldconfig is making these links and adding them to its cache, but doesn't realize that this is not the name of the library:

root@HSM1:/usr/lib/x86_64-linux-gnu# rm do_not_link_against_nvinfer_builder_resource 
root@HSM1:/usr/lib/x86_64-linux-gnu# ldconfig
root@HSM1:/usr/lib/x86_64-linux-gnu# ls *build*
do_not_link_against_nvinfer_builder_resource  libnvinfer_builder_resource.so.8.5.3

So this odd hack breaks dlopen()'s normal search methods, and the library cannot be opened unless it is in one of the default library paths.
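To check which names the cache actually records, you can parse the ldconfig -p output. Below is a minimal C++ sketch. The helper is hypothetical, and it assumes the "<name> (<flags>) => <path>" line format shown in the ldconfig -p output above.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse `ldconfig -p` output into name -> paths.
// Lines without " => " (such as the "... libs found in cache" header) are skipped.
std::map<std::string, std::vector<std::string>> parse_ldconfig_p(const std::string& text) {
  std::map<std::string, std::vector<std::string>> entries;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto arrow = line.find(" => ");
    if (arrow == std::string::npos) continue;
    const auto start = line.find_first_not_of(" \t");  // Skip leading indentation.
    const auto name_end = line.find(' ', start);       // Name ends at the first space.
    entries[line.substr(start, name_end - start)].push_back(line.substr(arrow + 4));
  }
  return entries;
}
```

For the output above, the only key is do_not_link_against_nvinfer_builder_resource. No key libnvinfer_builder_resource.so.8.5.3 appears, which fits the observation that a bare-name dlopen() gets no help from the cache.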

gcp commented 1 year ago

> So setting the LD_LIBRARY_PATH, adding to /etc/ld.so.conf, or adding a suitable file in /etc/ld.so.conf.d are apparently the intended approaches.

Thanks for finding the release note. Ugh, so NVIDIA broke this intentionally and you're supposed to use a wrapper or bootstrap executable before launching the real TensorRT application.

Again, adding to /etc is not a reasonable suggestion because it requires root permissions.

I managed to get this working by hacking the libraries to fix back the $ORIGIN/rpath, but seriously the suggested solutions are not reasonable for deployment and this is a major regression.
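Below is a sketch of the wrapper or bootstrap approach mentioned above, under stated assumptions: Linux/glibc, with hypothetical function names. The bootstrap sets LD_LIBRARY_PATH and then execv()s the real application, so the loader of the new program image reads the value at startup. Setting the variable inside the already-running TensorRT process would not help, because dlopen() uses the value from program start (dlopen(3)).

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical helper: put `dir` in front of an existing LD_LIBRARY_PATH value.
std::string prepend_library_path(const std::string& dir, const char* existing) {
  if (existing == nullptr || *existing == '\0') return dir;
  return dir + ":" + existing;
}

// Hypothetical bootstrap: extend LD_LIBRARY_PATH, then replace this process
// with the real application. Returns -1 only if setenv() or execv() fails.
int exec_with_library_path(const std::string& lib_dir, const std::string& app,
                           char* const argv[]) {
  const std::string value = prepend_library_path(lib_dir, std::getenv("LD_LIBRARY_PATH"));
  if (setenv("LD_LIBRARY_PATH", value.c_str(), 1) != 0) return -1;
  execv(app.c_str(), argv);  // Does not return on success.
  return -1;                 // errno describes the execv() failure.
}
```

A caller might run exec_with_library_path("/usr/local/TensorRT-8.4.2.4/lib", "/path/to/app", argv), where "/path/to/app" is a placeholder. This avoids root permissions, but it still depends on the environment variable that the release note recommends.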

chenxinfeng4 commented 8 months ago

> Setting the RPath of libnvinfer.so.8.5.1 to the install location of libnvinfer.so.8.5.1 seems to be a good workaround. For example, if libnvinfer.so.8.5.1 is located in /usr/local/lib then
>
> patchelf --set-rpath "/usr/local/lib" "/usr/local/lib/libnvinfer.so.8.5.1"

Really works for me. Thanks.

sherlockchou86 commented 6 months ago

> Hi, I get the same problem, but just remove sudo(./yolov5 -s yolov5s.wts yolov5s.engine s), that works! why?

Works for me: just remove sudo from the front of the command.

cx-333 commented 3 months ago

> Hi, I get the same problem, but just remove sudo(./yolov5 -s yolov5s.wts yolov5s.engine s), that works! why?

It works! Good!