NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

trtexec error: IPluginRegistry::getCreator: Error Code 4: Cannot find plugin: grid_sampler, version: 1 #4160

Open kelkarn opened 2 weeks ago

kelkarn commented 2 weeks ago

I see the following error when I run my trtexec command:

trtexec --onnx=/azureuser/end2end_ep19.onnx --saveEngine=/azureuser/end2end_ep19.plan \
--useCudaGraph \
--plugins=/opt/tritonserver/backends/onnxruntime/libmmdeploy_onnxruntime_ops.so \
--verbose

from within the container:

[09/25/2024-17:56:30] [I] [TRT] No checker registered for op: grid_sampler. Attempting to check as plugin.
[09/25/2024-17:56:30] [V] [TRT] Local registry did not find grid_sampler creator. Will try parent registry if enabled.
[09/25/2024-17:56:30] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: grid_sampler, version: 1, namespace:.)

This error is followed by a bunch of errors on the Unsqueeze node like so:

[09/25/2024-17:56:30] [E] [TRT] ModelImporter.cpp:949: While parsing node number 8539 [grid_sampler -> "onnx::Unsqueeze_10031"]:
[09/25/2024-17:56:30] [E] [TRT] ModelImporter.cpp:950: --- Begin node ---
input: "value_l_"
input: "sampling_grid_l_"
output: "onnx::Unsqueeze_10031"
name: "grid_sampler_8539"
op_type: "grid_sampler"
attribute {
  name: "align_corners"
  i: 0
  type: INT
}
attribute {
  name: "interpolation_mode"
  i: 0
  type: INT
}
attribute {
  name: "padding_mode"
  i: 0
  type: INT
}
domain: "mmdeploy"

[09/25/2024-17:56:30] [E] [TRT] ModelImporter.cpp:951: --- End node ---
[09/25/2024-17:56:30] [E] [TRT] ModelImporter.cpp:954: ERROR: onnxOpCheckers.cpp:781 In function checkFallbackPluginImporter:
[6] creator && "Plugin not found, are the plugin name, version, and namespace correct?"

The model here is a DINO model converted to ONNX using MMDeploy, which introduces a custom op. The custom op symbol lives in libmmdeploy_onnxruntime_ops.so, which links against libonnxruntime.so.1.15.1; I have copied both into the Docker container and added them to my LD_LIBRARY_PATH. I am using the nvcr.io/nvidia/tensorrt:24.08-py3 Docker image and the trtexec binary built with TensorRT 10.3.
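
As a diagnostic (a sketch assuming the container paths above), you can dlopen the library from Python and dump every creator that TensorRT's plugin registry can see; trtexec's --plugins lookup goes through this same IPluginRegistry:

import ctypes
import tensorrt as trt

# Load the ops library, then list every registered plugin creator.
# If grid_sampler is absent here, IPluginRegistry::getCreator will
# fail for trtexec in exactly the same way.
ctypes.CDLL("/opt/tritonserver/backends/onnxruntime/libmmdeploy_onnxruntime_ops.so")
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, "")   # registers TensorRT's built-in plugins
for creator in trt.get_plugin_registry().plugin_creator_list:
    print(creator.name, creator.plugin_version, creator.plugin_namespace)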

I found this other similar issue: https://github.com/onnx/onnx-tensorrt/issues/800

The conclusion there was that TensorRT did not support the Round operation yet. Is the same conclusion applicable here, i.e. that the grid_sampler operation is not yet supported in TensorRT? I also found an issue for this (Issue #2612) that was marked 'Closed', but my issue looks exactly the same.

kelkarn commented 2 weeks ago

I also noticed the Execution Provider requirements listed on the ONNX Runtime webpage.

Based on these, the TensorRTExecutionProvider and CUDAExecutionProvider in ONNX Runtime 1.15.1 require CUDA 11.8 and TensorRT 8.6, whereas the tensorrt:24.08-py3 Docker image I am using to build the TRT plan ships CUDA 12.6 and TensorRT 10.3. Because of this mismatch, I am also unable to load the ONNX model in a Python InferenceSession with these execution providers inside the container.
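
For reference, a minimal sketch of the load attempt (paths assumed from above; register_custom_ops_library is how ONNX Runtime picks up the MMDeploy custom ops):

import onnxruntime as ort

print(ort.__version__, ort.get_available_providers())

so = ort.SessionOptions()
# Register the MMDeploy custom-op library so ORT can resolve grid_sampler.
so.register_custom_ops_library(
    "/opt/tritonserver/backends/onnxruntime/libmmdeploy_onnxruntime_ops.so")
sess = ort.InferenceSession("/azureuser/end2end_ep19.onnx", so,
                            providers=["CUDAExecutionProvider"])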

Does this mean that converting the ONNX model to a TRT plan with TensorRT 10.3, when the custom op was built against ONNX Runtime 1.15.1, is simply not possible? Or is there a way to achieve this?

lix19937 commented 2 weeks ago

> This error is followed by a bunch of errors on the Unsqueeze node like so

No, it is due to grid_sampler.

Try:

export LD_LIBRARY_PATH=/opt/tritonserver/backends/onnxruntime:$LD_LIBRARY_PATH

then rerun. (LD_LIBRARY_PATH entries must be directories, so add the directory containing the .so rather than the .so file itself.)

kelkarn commented 2 weeks ago

@lix19937 - that did not work. I see the same error:

[09/26/2024-17:35:33] [V] [TRT] Static check for parsing node: grid_sampler_8539 [grid_sampler]
[09/26/2024-17:35:33] [I] [TRT] No checker registered for op: grid_sampler. Attempting to check as plugin.
[09/26/2024-17:35:33] [V] [TRT] Local registry did not find grid_sampler creator. Will try parent registry if enabled.
[09/26/2024-17:35:33] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: grid_sampler, version: 1, namespace:.)

lix19937 commented 2 weeks ago

Another workaround: you can use torch.nn.functional.grid_sample to replace the MMDeploy version. GridSample is a built-in layer since TensorRT 8.6, so you do not need to load a plugin. @kelkarn
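
A minimal export sketch of that replacement (the module and tensor shapes are hypothetical stand-ins, not the actual DINO code): with opset 16 or newer, F.grid_sample exports as the standard ONNX GridSample op, which TensorRT imports natively.

import torch
import torch.nn.functional as F

class GridSampleBlock(torch.nn.Module):   # hypothetical stand-in module
    def forward(self, value, grid):
        # Mirrors the mmdeploy grid_sampler attributes seen in the log:
        # bilinear interpolation, zeros padding, align_corners=False.
        return F.grid_sample(value, grid, mode="bilinear",
                             padding_mode="zeros", align_corners=False)

model = GridSampleBlock().eval()
value = torch.randn(1, 3, 32, 32)          # (N, C, H_in, W_in)
grid = torch.rand(1, 16, 16, 2) * 2 - 1    # (N, H_out, W_out, 2), in [-1, 1]
torch.onnx.export(model, (value, grid), "grid_sample.onnx",
                  opset_version=17,
                  input_names=["value", "grid"], output_names=["out"])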

lix19937 commented 2 weeks ago

And please make sure you are using opset version 17 to export the ONNX model (GridSample requires opset 16 or newer).
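
If the re-export succeeds and the model contains only the standard GridSample op, the plan should then build without the --plugins flag. A sketch, assuming the re-exported model replaces the original file:

trtexec --onnx=/azureuser/end2end_ep19.onnx --saveEngine=/azureuser/end2end_ep19.plan --useCudaGraph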