appearancefnp opened 2 weeks ago
Can you upload the full log from trtexec --onnx=model.onnx --verbose?
@lix19937 trtexec.log
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90
Make sure libcudnn.so loads successfully. Add its directory to LD_LIBRARY_PATH.
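A minimal sketch of that suggestion - point the dynamic loader at the directory containing the cudnn libraries before running trtexec (the path below is an assumption for a typical x86_64 container; use wherever your libcudnn.so.* files actually live):

```shell
# Prepend the assumed cudnn library directory to the loader search path.
# Adjust /usr/lib/x86_64-linux-gnu to the directory that actually holds
# your libcudnn.so.* files.
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
```

Note this only helps when a library with the soname the plugin asks for (libcudnn.so.8) exists somewhere on the search path.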
The problem is that the NVIDIA container contains cudnn 9.1.0, but the plugin is trying to load libcudnn.so.8. There is a version mismatch, not that cudnn is not available.
You should make sure your environment has exactly one cudnn installed. Also, why does your nvinfer plugin try to load cudnn 8?
This is not my plugin - this is the plugin provided in this repo - https://github.com/NVIDIA/TensorRT/tree/release/10.1/plugin/groupNormalizationPlugin
And it loads cudnn 8, not 9, because the wrong macro is defined here: https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/common/cudnnWrapper.cpp#L26
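One way to see which cudnn soname the shipped plugin library refers to is to scan its strings (the plugin path below is an assumption for the 24.05 container; adjust it to your install):

```shell
# Diagnostic sketch: list the libcudnn sonames embedded in the TensorRT
# plugin library. The path is an assumption - change it to wherever
# libnvinfer_plugin lives on your system.
PLUGIN=/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.10
if [ -f "$PLUGIN" ]; then
    strings "$PLUGIN" | grep -o 'libcudnn\.so\.[0-9]*' | sort -u
else
    echo "plugin not found at $PLUGIN"
fi
```

If this prints libcudnn.so.8 while the container only ships libcudnn.so.9, the soname mismatch described above is confirmed.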
From https://github.com/NVIDIA/TensorRT/tree/release/10.0 (TensorRT 10.0.1.6), the recommended cudnn version is listed as follows:

TensorRT GA build
- TensorRT v10.0.1.6, available from direct download links listed below

System Packages
- CUDA recommended versions: cuda-12.2.0 + cuDNN-8.9, or cuda-11.8.0 + cuDNN-8.9
- GNU make >= v4.1
- cmake >= v3.13
- python >= v3.8, <= v3.10.x
- pip >= v19.0
- Essential utilities: git, pkg-config, wget

This maps to https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/common/cudnnWrapper.cpp#L26-L42
You can try to create a soft link: ln -s libcudnn.so.9 libcudnn.so.8
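A sketch of that workaround, demonstrated in a scratch directory; inside the container the real libraries typically live under /usr/lib/x86_64-linux-gnu (an assumption - adjust the path to your install):

```shell
# Demonstration in a temporary directory; substitute the directory that
# actually holds libcudnn.so.9 in your container.
CUDNN_DIR=$(mktemp -d)                       # stand-in for the library directory
touch "$CUDNN_DIR/libcudnn.so.9"             # stand-in for the real cudnn 9 library
# Give the loader the soname the plugin asks for, resolving to cudnn 9:
ln -s "$CUDNN_DIR/libcudnn.so.9" "$CUDNN_DIR/libcudnn.so.8"
```

Be aware this only masks the soname mismatch; cudnn 9 is not ABI-compatible with cudnn 8, so symbols the plugin needs may still fail to resolve at runtime.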
Why does the container include cudnn 9 then? https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html#rel-24-06
If TensorRT doesn't work in an NVIDIA container with cudnn 9, why does it ship with it?
Description
Hey guys! I wanted to upgrade from TensorRT 8.6 to 10.0. I have an ONNX model that uses the GroupNormalization plugin. Serialization succeeds, but deserialization fails because the plugin tries to load cudnn 8 instead of cudnn 9.
Environment
Using docker: nvcr.io/nvidia/tensorrt:24.05-py3
TensorRT Version: 10.0.1
NVIDIA GPU: A4000
NVIDIA Driver Version: 550.67
CUDA Version: 12.4
CUDNN Version: 9.1 (per container documentation)
Operating System:
Python Version (if applicable): -
Tensorflow Version (if applicable): -
PyTorch Version (if applicable): -
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.05-py3
Relevant Files
Model link: https://drive.google.com/file/d/1vmGZpWJ_1sfz2ejbZoO3fFaR5udxOLTi/view?usp=sharing
Steps To Reproduce
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90
...
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +156, now: CPU 1, GPU 199 (MiB)
[06/17/2024-14:57:28] [I] Setting persistentCacheLimit to 0 bytes.
[06/17/2024-14:57:28] [I] Created execution context with device memory size: 155.537 MiB
[06/17/2024-14:57:28] [I] Using random values for input images
[06/17/2024-14:57:28] [I] Input binding for images with dimensions 1x500x1000x3 is created.
[06/17/2024-14:57:28] [I] Output binding for class_heatmaps with dimensions 1x5x125x250 is created.
[06/17/2024-14:57:28] [I] Starting inference
[06/17/2024-14:57:28] [F] [TRT] Validation failed: mBnScales != nullptr && mBnScales->mPtr != nullptr plugin/groupNormalizationPlugin/groupNormalizationPlugin.cpp:132
[06/17/2024-14:57:28] [E] [TRT] std::exception
[06/17/2024-14:57:28] [E] Error[2]: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion pluginUtils::isSuccess(status) failed. )
[06/17/2024-14:57:28] [E] Error occurred during inference