NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Engine building failure of TensorRT 10.2.0 (pip install) when building a custom diffusion model on RTX 4090 #3983

Open ifeherva opened 4 months ago

ifeherva commented 4 months ago

Description

Fresh install of pip install tensorrt==10.2.0

The following engine build crashes on Ubuntu 22.04.4 LTS:

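from polygraphy.backend.trt import CreateConfig, EngineFromNetwork

# network, p (the optimization profile), timing_cache, the precision flags,
# enable_refit and extra_build_args are defined elsewhere in the build script.
engine = EngineFromNetwork(
    network,
    config=CreateConfig(
        fp16=fp16,
        tf32=tf32,
        int8=int8,
        refittable=enable_refit,
        profiles=[p],
        load_timing_cache=timing_cache,
        builder_optimization_level=3,
        **extra_build_args,
    ),
    save_timing_cache=timing_cache,
)()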

Error message:

IBuilder::buildSerializedNetwork: Error Code 6: API Usage Error (Unable to load library: libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

Build works fine on 10.1.0 and 10.0.0

Environment

TensorRT Version: 10.2.0

NVIDIA GPU: RTX 4090

NVIDIA Driver Version: 550

CUDA Version: 12.1.r12.1

CUDNN Version: 8.9.7

Operating System: Ubuntu 22.04.4 LTS

Python Version (if applicable):

PyTorch Version (if applicable): 2.3.1

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

This is the latest release.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Yes, the above command completes successfully, the ONNX file is correct.

lautaropaske commented 4 months ago

+1. Same issue: TensorRT fails because it tries to load a nonexistent Windows library on a Linux distro (libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

lanyuer commented 4 months ago

how to fix it?

thefoxfarmer commented 4 months ago

Almost exactly the same setup here, same problem. Using it via ComfyUI.

TensorRT Version: 10.2.0

NVIDIA GPU: RTX 4090

CUDA Version: 12.1.105

CUDNN Version: 8.9.2.26

Operating System: Ubuntu 22.04.3

Python Version (if applicable): 3.10

PyTorch Version (if applicable): 2.3.1+cu121

I did a little research on this and determined that the non-Windows library (libnvinfer_builder_resource.so.10.2.0) was already opened by the process, so it's a real mystery to me why it was trying to open the Windows version. The dlopen (or whatever) is happening inside the compiled tensorrt.so code, not in the Python wrapper around it, so it's hard to debug further.

I made a symlink from the proper DSO to the Windows filename, but that fixed nothing: The symbols that it then looks for inside are also suffixed with _win.
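For anyone who wants to repeat the "already opened by the process" check, here is a minimal sketch that reads the Linux memory map of the current process and lists the mapped files whose path contains a given substring. The helper name `mapped_libraries` is made up for illustration, and the approach assumes a Linux `/proc` filesystem. It only shows what is mapped, not why TensorRT asks for the `_win` file.

```python
from pathlib import Path


def mapped_libraries(maps_text, needle="libnvinfer"):
    """Return the unique file paths in /proc/<pid>/maps text that contain `needle`."""
    paths = []
    for line in maps_text.splitlines():
        # Each line has the fields: address perms offset dev inode [pathname]
        parts = line.split(None, 5)
        if len(parts) == 6:
            path = parts[5].strip()
            if needle in path and path not in paths:
                paths.append(path)
    return paths


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")
    if maps_file.exists():  # only present on Linux
        for path in mapped_libraries(maps_file.read_text()):
            print(path)
```

To be useful, this has to run in the same Python process after `import tensorrt` (and ideally after the failing build call), since the map only reflects libraries loaded so far.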

I asked in the discussion forum for the Comfy nodes... https://github.com/comfyanonymous/ComfyUI_TensorRT/discussions/49 But clearly they have nothing to do with it.

lanyuer commented 4 months ago

I also have this problem on Windows WSL2.

thefoxfarmer commented 4 months ago

10.1.0 is also working for me on the setup outlined above where 10.2.0 did not.

BuffMcBigHuge commented 4 months ago

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

online2311 commented 4 months ago

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

Solved my problem, thanks.

glenn-jocher commented 4 months ago

Resolved in the Ultralytics package by pinning tensorrt<=10.2.0, but unfortunately this does not resolve the underlying issue. https://github.com/ultralytics/ultralytics/pull/14239

RONNYKHALIL commented 4 months ago

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

Solved my problem, thanks.

same!!! thank uuuu

lix19937 commented 4 months ago

libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

can you find libnvinfer_builder_resource_win.so.10.2.0 ?

The tensorrt Python wheel files only support Python versions 3.8 to 3.12 at this time and will not work with other Python versions. Only the Linux and Windows operating systems and the x86_64 CPU architecture are currently supported. These Python wheel files are expected to work on RHEL 8 or newer, Ubuntu 20.04 or newer, and Windows 10 or newer.

ifeherva commented 4 months ago

libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

can you find libnvinfer_builder_resource_win.so.10.2.0 ?

The tensorrt Python wheel files only support Python versions 3.8 to 3.12 at this time and will not work with other Python versions. Only the Linux and Windows operating systems and the x86_64 CPU architecture are currently supported. These Python wheel files are expected to work on RHEL 8 or newer, Ubuntu 20.04 or newer, and Windows 10 or newer.

No, those _win files don't exist on Ubuntu.
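A quick way to answer the "can you find it?" question on your own machine is to list the builder resource libraries that the wheel actually installed. The sketch below assumes the pip wheel places its shared objects in an importable package directory named `tensorrt_libs`; that name is an assumption about the wheel layout, and both helper names are made up for illustration.

```python
import importlib.util
from pathlib import Path


def builder_resource_files(lib_dir):
    """List file names in lib_dir that look like TensorRT builder resource libraries."""
    return sorted(p.name for p in Path(lib_dir).glob("libnvinfer_builder_resource*"))


def find_package_dir(package="tensorrt_libs"):
    """Return the install directory of `package`, or None if it cannot be found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    lib_dir = find_package_dir()  # "tensorrt_libs" is an assumed package name
    if lib_dir is None:
        print("tensorrt_libs package not found in this environment")
    else:
        for name in builder_resource_files(lib_dir):
            print(name)
```

If the output contains only the plain `libnvinfer_builder_resource.so.*` file and no `_win` variant, that matches what is reported in this thread for Linux installs.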

zolero commented 4 months ago

For me installing tensorrt_llm==0.12.0.dev2024070200 works!

yorickvP commented 3 months ago

Upgrading to tensorrt==10.2.0.post1 fixes the problem.
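To check whether an environment has the problematic build before attempting an engine build, something like the following sketch may help. It treats only the exact 10.2.0 release as affected, which is an assumption drawn from the reports in this thread (10.0.0 and 10.1.0 are reported as working), and it assumes the distribution is installed under the name `tensorrt`. The helper names are made up for illustration.

```python
from importlib import metadata

# Assumption based on this thread: only the exact 10.2.0 wheel shows the error.
AFFECTED_VERSIONS = {"10.2.0"}


def is_affected(version):
    """Return True if the version string matches a release reported as broken."""
    return version.strip() in AFFECTED_VERSIONS


def installed_version(dist="tensorrt"):
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tensorrt is not installed")
    elif is_affected(version):
        print(f"tensorrt {version} matches the release reported as broken here")
    else:
        print(f"tensorrt {version} is not the release reported as broken here")
```

An exact string match is used on purpose so that post-releases of the same version number are not flagged.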