NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

GroupNormalization plugin failure of TensorRT 10.0.1.6 when running trtexec on GPU A4000 #3950

Open appearancefnp opened 2 weeks ago

appearancefnp commented 2 weeks ago

Description

Hey guys! I wanted to upgrade from TensorRT 8.6 to 10.0. I have an ONNX model that uses the GroupNormalization plugin. The engine builds and serializes successfully, but deserialization fails because the plugin tries to load cuDNN 8 instead of cuDNN 9.

Environment

Using docker: nvcr.io/nvidia/tensorrt:24.05-py3

TensorRT Version: 10.0.1

NVIDIA GPU: A4000

NVIDIA Driver Version: 550.67

CUDA Version: 12.4

CUDNN Version: 9.1 (per container documentation)

Operating System:

Python Version (if applicable): -

Tensorflow Version (if applicable): -

PyTorch Version (if applicable): -

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.05-py3

Relevant Files

Model link: https://drive.google.com/file/d/1vmGZpWJ_1sfz2ejbZoO3fFaR5udxOLTi/view?usp=sharing

Steps To Reproduce

  1. Run trtexec: trtexec --onnx=model.onnx
  2. trtexec builds the engine
    
    ...
    [06/17/2024-14:57:28] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 1984 MiB
    [06/17/2024-14:57:28] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3059 MiB
    [06/17/2024-14:57:28] [I] Engine built in 886.712 sec.
    [06/17/2024-14:57:28] [I] Created engine with size: 55.3649 MiB
    [06/17/2024-14:57:28] [I] [TRT] Loaded engine size: 55 MiB
    [06/17/2024-14:57:28] [I] Engine deserialized in 0.0301295 sec.
    [06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
    plugin/common/cudnnWrapper.cpp:90

    [06/17/2024-14:57:28] [E] [TRT] std::exception
    [06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
    plugin/common/cudnnWrapper.cpp:90
    [06/17/2024-14:57:28] [E] [TRT] std::exception
    [06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8.
    plugin/common/cudnnWrapper.cpp:90
    ...
    [06/17/2024-14:57:28] [E] [TRT] std::exception
    [06/17/2024-14:57:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +156, now: CPU 1, GPU 199 (MiB)
    [06/17/2024-14:57:28] [I] Setting persistentCacheLimit to 0 bytes.
    [06/17/2024-14:57:28] [I] Created execution context with device memory size: 155.537 MiB
    [06/17/2024-14:57:28] [I] Using random values for input images
    [06/17/2024-14:57:28] [I] Input binding for images with dimensions 1x500x1000x3 is created.
    [06/17/2024-14:57:28] [I] Output binding for class_heatmaps with dimensions 1x5x125x250 is created.
    [06/17/2024-14:57:28] [I] Starting inference
    [06/17/2024-14:57:28] [F] [TRT] Validation failed: mBnScales != nullptr && mBnScales->mPtr != nullptr
    plugin/groupNormalizationPlugin/groupNormalizationPlugin.cpp:132
    [06/17/2024-14:57:28] [E] [TRT] std::exception
    [06/17/2024-14:57:28] [E] Error[2]: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion pluginUtils::isSuccess(status) failed. )
    [06/17/2024-14:57:28] [E] Error occurred during inference



**Commands or scripts**:  
trtexec --onnx=model.onnx

**Have you tried [the latest release](https://developer.nvidia.com/tensorrt)?**: yes

lix19937 commented 1 week ago

Can you upload the full log from trtexec --onnx=model.onnx --verbose?
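One way to capture it (plain shell redirection, nothing trtexec-specific):

    # Write the verbose build/run log to a file while still printing to the console.
    trtexec --onnx=model.onnx --verbose 2>&1 | tee trtexec.log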

appearancefnp commented 1 week ago

@lix19937 trtexec.log

lix19937 commented 1 week ago

> [06/17/2024-14:57:28] [E] [TRT] std::exception [06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90

Make sure libcudnn.so loads successfully. Add its path to LD_LIBRARY_PATH.
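A minimal sketch of that check inside the container; the cuDNN directory below is a placeholder, not a path taken from this thread:

    # List the cuDNN libraries the dynamic linker can currently find.
    ldconfig -p | grep libcudnn

    # If libcudnn.so lives in a non-standard directory, prepend it.
    export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH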

appearancefnp commented 2 days ago

> [06/17/2024-14:57:28] [E] [TRT] std::exception [06/17/2024-14:57:28] [F] [TRT] Validation failed: Failed to load libcudnn.so.8. plugin/common/cudnnWrapper.cpp:90
>
> Make sure libcudnn.so loads successfully. Add its path to LD_LIBRARY_PATH.

The problem is that the NVIDIA container ships cuDNN 9.1.0, but the plugin is trying to load libcudnn.so.8. This is a version mismatch, not a missing cuDNN.
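For reference, the mismatch is easy to confirm inside the container; the directory is an assumption based on the usual Debian-style layout of NGC images:

    # Show which cuDNN runtime sonames are actually installed.
    ls /usr/lib/x86_64-linux-gnu/libcudnn.so*
    # Expect only .so.9 variants here; there is no libcudnn.so.8 in this image.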

lix19937 commented 2 days ago

You should make sure your environment has a cuDNN installed. And why does your nvinfer plugin load cuDNN 8?

appearancefnp commented 1 day ago

This is not my plugin - this is the plugin provided in this repo - https://github.com/NVIDIA/TensorRT/tree/release/10.1/plugin/groupNormalizationPlugin

And it loads cuDNN 8, not 9, because the wrong version macro is defined here: https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/common/cudnnWrapper.cpp#L26
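The pinned soname can also be seen from outside the source; the plugin library path below is an assumption about the container layout:

    # The wrapper loads cuDNN by soname at runtime, so the hardcoded name
    # shows up in the plugin binary's string table.
    strings /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.10 | grep 'libcudnn\.so'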

lix19937 commented 1 day ago

From https://github.com/NVIDIA/TensorRT/tree/release/10.0, for TensorRT 10.0.1.6 the recommended cuDNN versions are as follows:

TensorRT GA build

- TensorRT v10.0.1.6, available from direct download links listed below

System Packages

- CUDA: recommended versions cuda-12.2.0 + cuDNN-8.9, or cuda-11.8.0 + cuDNN-8.9
- GNU make >= v4.1
- cmake >= v3.13
- python >= v3.8, <= v3.10.x
- pip >= v19.0
- Essential utilities: git, pkg-config, wget

These map to https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/common/cudnnWrapper.cpp#L26-L42.

You can try creating a soft link: ln -s libcudnn.so.9 libcudnn.so.8.
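A sketch of that workaround; the directory is an assumption based on the usual deb install location, and because cuDNN 9 is not guaranteed ABI-compatible with cuDNN 8, this may only defer the failure rather than fix it:

    # Expose the installed cuDNN 9 under the soname the plugin asks for.
    cd /usr/lib/x86_64-linux-gnu
    ln -s libcudnn.so.9 libcudnn.so.8
    ldconfig  # refresh the linker cache so the new soname is found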

appearancefnp commented 2 hours ago

Why does the container include cudnn 9 then? https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html#rel-24-06

If TensorRT doesn't work in an NVIDIA container with cudnn 9, why does it ship with it?