NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

createEngine failure of TensorRT 8.6 #3377

Open monsterlyg opened 11 months ago

monsterlyg commented 11 months ago

Description

When converting an ONNX DETR-series model with a Swin-Large backbone to a TensorRT engine, the build fails with an AssertionError. Command: trtexec --onnx=model.onnx --workspace=8192 --fp16 --plugins=libmmdeploy_tensorrt_ops.so --saveEngine=model.plan --device=5

Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/backbone/stages.0/blocks.1/attn/Constant_43_output_0 + (Unnamed Layer* 458) [Shuffle].../backbone/Reshape_3 + /backbone/Transpose_3]}

Environment

TensorRT Version: 8.6

NVIDIA GPU: T4

NVIDIA Driver Version: 460.32.03

CUDA Version: 11.6

Relevant Files

Model link: https://drive.google.com/file/d/1sU_OoV2FEThnxkJz9lMltueZEkR7iNdR/view?usp=drive_link

zerollzeng commented 11 months ago

Requested access. Please also provide the libmmdeploy_tensorrt_ops.so, and a full log with --verbose enabled. Thanks!
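For reference, a verbose run would be the original command with --verbose added and the log captured to a file (all other flags copied from the report above; paths are the reporter's):

```shell
trtexec --onnx=model.onnx --workspace=8192 --fp16 \
        --plugins=libmmdeploy_tensorrt_ops.so \
        --saveEngine=model.plan --device=5 \
        --verbose 2>&1 | tee build_verbose.log
```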

monsterlyg commented 11 months ago

libmmdeploy_tensorrt_ops.so @zerollzeng One more thing: the ONNX model can be converted to an engine in FP32, so this is not urgent.

zerollzeng commented 11 months ago

NVIDIA Driver Version: 460.32.03

I just noticed that your driver version is very old, which might be the reason. Could you please try upgrading to 525+? Thanks!

monsterlyg commented 11 months ago

Driver 460 satisfies the requirement for CUDA 11.x. It is not easy for me to upgrade the driver because it's a shared public server.


monsterlyg commented 10 months ago

Hello, I have tested on an A10 with the 515 driver, but the same problem occurs. @zerollzeng

zerollzeng commented 10 months ago

Hi @monsterlyg Sorry for the late response, quite busy these days.

Looks like it's a bug. Could you please build libmmdeploy_tensorrt_ops.so in our container nvcr.io/nvidia/tensorrt:23.06-py3 (23.05+ is fine; see https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/index.html) and upload it here? I don't have an environment with TensorRT 8.6 + CUDA 11.6, and it would take quite a lot of time to set up.
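One possible way to do that (a sketch only; the cmake options and output path are assumptions based on the mmdeploy build docs, not verified in this container):

```shell
# Start the suggested NGC container (23.06 ships TensorRT 8.6).
docker run --gpus all -it --rm -v "$PWD":/workspace \
    nvcr.io/nvidia/tensorrt:23.06-py3

# Inside the container -- build steps are a sketch from the mmdeploy docs:
git clone https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy && mkdir -p build && cd build
cmake .. -DMMDEPLOY_TARGET_BACKENDS="trt"   # TRT/cuDNN dirs may need -DTENSORRT_DIR/-DCUDNN_DIR
make -j"$(nproc)"
# The plugin lib should land under build/lib/ (path is an assumption).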

I'll create the internal bug to track this if you can provide the updated lib. Thanks!

monsterlyg commented 10 months ago

@zerollzeng I have found the source repo, but the driver version on my machine is lower than required, and I usually have no privilege to upgrade it. = =!

zerollzeng commented 10 months ago

The link is invalid, could you please provide the original github repo etc?

monsterlyg commented 10 months ago

here. https://github.com/open-mmlab/mmdeploy

zerollzeng commented 10 months ago

I'm struggling to build the plugin lib with our latest TRT 9.1 to see if it reproduces, but it's hard to set up the env :-(

zerollzeng commented 10 months ago

Is it possible to provide an ONNX subgraph that doesn't rely on the plugin lib? I can see the only plugin op needed is grid_sampler.

monsterlyg commented 10 months ago

Is it possible to provide an ONNX subgraph that doesn't rely on the plugin lib? I can see the only plugin op needed is grid_sampler.

@zerollzeng I attempted to export only the 'backbone+neck' subgraph to an ONNX file, but to my surprise, it was successfully converted into a TensorRT engine file. Upon attempting to convert the entire graph, the error seems to indicate a failure within the backbone operators. Exporting only the latter part proves to be a bit tricky.