NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Transformer-like model loses accuracy when converted from ONNX (opset 16) with FP16 in TensorRT-8.6, and TensorRT-8.6 cannot parse ONNX (opset 17) because of LayerNormalization/LayerNorm #3657

Open miraiaroha opened 9 months ago

miraiaroha commented 9 months ago

Description

I want to deploy my transformer-like detection model with TensorRT-8.6 (I can only choose TensorRT-8.6 because of its flash attention support).

(i) First, I generated engines from ONNX opset 16 and evaluated them, with the results below:

| model | onnx-op16-fp32 | onnx-op16-fp16 | trt-op16-fp32 | trt-op16-fp16 | trt-op16-fp16-int8 |
| --- | --- | --- | --- | --- | --- |
| mAP | 43.8 | 43.8 | 42.3 | 23.4 | 23.7 |

trt-op16-fp32 drops slightly, but trt-op16-fp16 almost does not work!
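For reference, a minimal sketch of how FP32/FP16 engines like these are typically built with the TensorRT 8.6 Python API; this is not the author's actual script, and the model path, workspace size, and helper name are placeholders (the INT8 build additionally needs a calibrator and is omitted):

```python
import tensorrt as trt

def build_engine(onnx_path, fp16=False):
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    # Explicit-batch network definition is required for ONNX models.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels

    return builder.build_serialized_network(network, config)

# e.g. build_engine("model-op16.onnx", fp16=True)  # hypothetical file name
```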

(ii) Second, I tried ONNX opset 17, as the TensorRT 8.6 release notes mention: "For networks containing normalization layers, particularly if deploying with mixed precision, target the latest ONNX opset that contains the corresponding function ops, for example: opset 17 for LayerNormalization or opset 18 for GroupNormalization. Numerical accuracy using function ops is superior to the corresponding implementation with primitive ops for normalization layers." But I found that TensorRT-8.6 cannot parse LayerNormalization; some evaluation results and the log are below (an example export call is sketched after the log):

| model | onnx-op17-fp32 | onnx-op17-fp16 |
| --- | --- | --- |
| mAP | 43.8 | 4.5 |

onnx-op17-fp16 almost does not work!

TensorRT log:

```
[02/07/2024-07:37:01] [I] [TRT] [MemUsageChange] Init CUDA: CPU +11, GPU +0, now: CPU 35, GPU 7970 (MiB)
[02/07/2024-07:37:01] [V] [TRT] Trying to load shared library libnvinfer_builder_resource.so.8.6.2
[02/07/2024-07:37:01] [V] [TRT] Loaded shared library libnvinfer_builder_resource.so.8.6.2
[02/07/2024-07:37:07] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1310, now: CPU 1225, GPU 9325 (MiB)
[02/07/2024-07:37:07] [V] [TRT] CUDA lazy loading is enabled.
[02/07/2024-07:37:08] [I] [TRT] ----------------------------------------------------------------
[02/07/2024-07:37:08] [I] [TRT] Input filename: ./dinov2det/dinov2-small-rtdetr-966-546-op17-ep351-sim.onnx
[02/07/2024-07:37:08] [I] [TRT] ONNX IR version: 0.0.8
[02/07/2024-07:37:08] [I] [TRT] Opset version: 17
[02/07/2024-07:37:08] [I] [TRT] Producer name: pytorch
[02/07/2024-07:37:08] [I] [TRT] Producer version: 2.0.0
[02/07/2024-07:37:08] [I] [TRT] Domain:
[02/07/2024-07:37:08] [I] [TRT] Model version: 0
[02/07/2024-07:37:08] [I] [TRT] Doc string:
................ ...............
[02/07/2024-07:39:32] [V] [TRT] /model/backbone/Add [Add] outputs: [/model/backbone/Add_output_0 -> (1, 2692, 384)[FLOAT]],
[02/07/2024-07:39:32] [V] [TRT] Parsing node: /model/backbone/blocks.0/norm1/LayerNormalization [LayerNormalization]
[02/07/2024-07:39:32] [V] [TRT] Searching for input: /model/backbone/Add_output_0
[02/07/2024-07:39:32] [V] [TRT] Searching for input: model.backbone.blocks.0.norm1.weight
[02/07/2024-07:39:32] [V] [TRT] Searching for input: model.backbone.blocks.0.norm1.bias
[02/07/2024-07:39:32] [V] [TRT] /model/backbone/blocks.0/norm1/LayerNormalization [LayerNormalization] inputs: [/model/backbone/Add_output_0 -> (1, 2692, 384)[FLOAT]], [model.backbone.blocks.0.norm1.weight -> (384)[FLOAT]], [model.backbone.blocks.0.norm1.bias -> (384)[FLOAT]],
[02/07/2024-07:39:32] [I] [TRT] No importer registered for op: LayerNormalization. Attempting to import as plugin.
[02/07/2024-07:39:32] [I] [TRT] Searching for plugin: LayerNormalization, plugin_version: 1, plugin_namespace:
[02/07/2024-07:39:32] [V] [TRT] Global registry did not find LayerNormalization creator. Will try parent registry if enabled.
[02/07/2024-07:39:32] [E] [TRT] 3: getPluginCreator could not find plugin: LayerNormalization version: 1
[02/07/2024-07:39:32] [E] [TRT] ModelImporter.cpp:757: While parsing node number 5 [LayerNormalization -> "/model/backbone/blocks.0/norm1/LayerNormalization_output_0"]:
[02/07/2024-07:39:32] [E] [TRT] ModelImporter.cpp:758: --- Begin node ---
[02/07/2024-07:39:32] [E] [TRT] ModelImporter.cpp:759: input: "/model/backbone/Add_output_0" input: "model.backbone.blocks.0.norm1.weight" input: "model.backbone.blocks.0.norm1.bias" output: "/model/backbone/blocks.0/norm1/LayerNormalization_output_0" name: "/model/backbone/blocks.0/norm1/LayerNormalization" op_type: "LayerNormalization" attribute { name: "axis" i: -1 type: INT } attribute { name: "epsilon" f: 1e-06 type: FLOAT } doc_string: "/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/functional.py(2515): layer_norm\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/normalization.py(190): forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1488): _slow_forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1501): _call_impl\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/../src/zoo/rtdetr/layers/block.py(91): attn_residual_func\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/../src/zoo/rtdetr/layers/block.py(112): forward\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/../src/zoo/rtdetr/layers/block.py(254): forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1488): _slow_forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1501): _call_impl\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/../src/zoo/rtdetr/vision_transformer.py(224): forward_features\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/../src/zoo/rtdetr/vision_transformer.py(315): forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1488): _slow_forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1501): _call_impl\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/../src/zoo/rtdetr/rtdetr.py(169): forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1488): _slow_forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1501): _call_impl\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/predict.py(126): forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1488): _slow_forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1501): _call_impl\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/jit/_trace.py(118): wrapper\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/jit/_trace.py(127): forward\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/nn/modules/module.py(1501): _call_impl\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/jit/_trace.py(1268): _get_trace_graph\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/onnx/utils.py(893): _trace_and_get_graph_from_model\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/onnx/utils.py(989): _create_jit_graph\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/onnx/utils.py(1113): _model_to_graph\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/onnx/utils.py(1548): _export\n/home/qxit02/.conda/envs/dinov2/lib/python3.9/site-packages/torch/onnx/utils.py(506): export\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/predict.py(195): main\n/home/qxit02/cyr/proj/10.transformer/wrr/rtdetr_pytorch/tools/predict.py(264): \n"
[02/07/2024-07:39:32] [E] [TRT] ModelImporter.cpp:760: --- End node ---
[02/07/2024-07:39:32] [E] [TRT] ModelImporter.cpp:762: ERROR: builtin_op_importers.cpp:5435 In function importFallbackPluginImporter: [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
parse onnx file fail ...
```
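For context, re-exporting with opset 17 so that LayerNorm is emitted as a single LayerNormalization function op (as the release note quoted above recommends) usually only requires the opset_version argument of torch.onnx.export. A minimal, self-contained sketch with a stand-in module; the module, shapes, and file name are placeholders, not the author's model or export script:

```python
import torch
import torch.nn as nn

# Stand-in module containing a LayerNorm, just to demonstrate the export call;
# replace with the real detection model.
class TinyBlock(nn.Module):
    def __init__(self, dim=384):
        super().__init__()
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return self.fc(self.norm(x))

model = TinyBlock().eval()
dummy = torch.randn(1, 2692, 384)  # placeholder token-sequence shape

torch.onnx.export(
    model,
    dummy,
    "tinyblock-op17.onnx",   # hypothetical output path
    opset_version=17,        # opset 17 emits a single LayerNormalization op
    input_names=["x"],
    output_names=["y"],
)
```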

Environment

Hardware: Orin NX 16G

TensorRT Version: 8.6

CUDA Version: 12.2

Docker: dustynv/l4t-pytorch:r36.2.0

Operating System: Ubuntu

miraiaroha commented 9 months ago

I found that my onnx/onnx-tensorrt version was wrong. With the right version I can convert engines from ONNX (opset 17) successfully, but the accuracy is still bad:

| model | trt-op17-fp32 | trt-op17-fp16 |
| --- | --- | --- |
| mAP | 42.3 | 23.7 |

zerollzeng commented 9 months ago

Could you please try TRT 9.3/9.2? Thanks!

eduardatmadenn commented 8 months ago

> I found that my onnx/onnx-tensorrt version was wrong. With the right version I can convert engines from ONNX (opset 17) successfully, but the accuracy is still bad:
>
> | model | trt-op17-fp32 | trt-op17-fp16 |
> | --- | --- | --- |
> | mAP | 42.3 | 23.7 |

Could you share more details on this? I am facing the same issue, using opset 17 and TensorRT 8.6.1. The model was converted using torch.onnx, torch version 2.1.0.

miraiaroha commented 8 months ago

> I found that my onnx/onnx-tensorrt version was wrong. With the right version I can convert engines from ONNX (opset 17) successfully, but the accuracy is still bad:
>
> | model | trt-op17-fp32 | trt-op17-fp16 |
> | --- | --- | --- |
> | mAP | 42.3 | 23.7 |

> Could you share more details on this? I am facing the same issue, using opset 17 and TensorRT 8.6.1. The model was converted using torch.onnx, torch version 2.1.0.

You should git clone the latest onnx and onnx-tensorrt and compile them in your environment. That way your TensorRT program can recognize LayerNormalization in opset 17.

And now I am using Polygraphy (--trt-npps) to dig into which layers lose precision.

miraiaroha commented 8 months ago

> Could you please try TRT 9.3/9.2? Thanks!

I have tried TRT 9.2 and the results are still bad, but TRT 8.5 (without flash attention) performs well, as below:

| model | trt-fp32 | trt8.5-fp16 | trt8.6-fp16 | trt9.2-fp16 |
| --- | --- | --- | --- | --- |
| mAP | 42.3 | 42.1 | 3.1 | 3.1 |

And I used Polygraphy to progressively increase the scope of FP16 precision, and found that the precision loss is in the self-attention. But why is TRT 8.5 OK?
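One way to "progressively increase the scope of FP16" is to build with the FP16 flag but pin the suspect layers (here, the decoder self-attention) to FP32 via TensorRT's per-layer precision API. A rough sketch under the assumption that the suspect layers can be matched by name; the name substrings and the helper function are placeholders, and this is not necessarily the Polygraphy-based procedure used above:

```python
import tensorrt as trt

def pin_self_attention_to_fp32(network, config):
    """Allow FP16 globally but force decoder self-attention layers to FP32.

    Intended to be called between parsing the ONNX model and building the
    engine, e.g. in a build script like the one sketched earlier.
    """
    config.set_flag(trt.BuilderFlag.FP16)
    # Make the builder honor the per-layer precisions set below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # "decoder" / "self_attn" are placeholder substrings; adjust them to
        # match the layer names in your own graph.
        if "decoder" in layer.name and "self_attn" in layer.name:
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)
```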

zerollzeng commented 8 months ago

Does it pass with polygraphy? e.g. `polygraphy run model.onnx --trt --onnxrt`

miraiaroha commented 8 months ago

> Does it pass with polygraphy? e.g. `polygraphy run model.onnx --trt --onnxrt`

The problem of LayerNormalization in opset 17 not being convertible to an engine has been solved, and I found that the accuracy loss is not related to the opset version. So I tested TRT 8.5 / TRT 8.6 / TRT 9.2 with the opset-16 model for comparison (using Polygraphy to convert the model), observing serious FP16 accuracy loss in TRT 8.6 / TRT 9.2; the bad layers are the self-attentions of the RT-DETR decoder in my model, as shown in the attached screenshot.
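For anyone wanting to reproduce this kind of FP16-vs-reference comparison from Python rather than the polygraphy CLI, Polygraphy's API can run the same ONNX file under ONNX Runtime and under a TensorRT FP16 engine and compare the outputs. A minimal sketch; the file name is a placeholder and per-layer output marking is omitted:

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

onnx_path = "model-op16.onnx"  # placeholder path

# Build a TensorRT FP16 engine and an ONNX Runtime session from the same model.
build_engine = EngineFromNetwork(NetworkFromOnnxPath(onnx_path),
                                 config=CreateConfig(fp16=True))
runners = [
    TrtRunner(build_engine),
    OnnxrtRunner(SessionFromOnnx(onnx_path)),
]

# Run both with identical inputs and compare the outputs.
run_results = Comparator.run(runners)
if bool(Comparator.compare_accuracy(run_results)):
    print("TRT FP16 matches ONNX Runtime within default tolerances")
else:
    print("Mismatch detected; narrow it down by marking per-layer outputs")
```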

eduardatmadenn commented 8 months ago

> Does it pass with polygraphy? e.g. `polygraphy run model.onnx --trt --onnxrt`

> The problem of LayerNormalization in opset 17 not being convertible to an engine has been solved, and I found that the accuracy loss is not related to the opset version. So I tested TRT 8.5 / TRT 8.6 / TRT 9.2 with the opset-16 model for comparison (using Polygraphy to convert the model), observing serious FP16 accuracy loss in TRT 8.6 / TRT 9.2; the bad layers are the self-attentions of the RT-DETR decoder in my model, as shown in the attached screenshot.

I know I asked before, sorry if I'm being pushy, but I cannot figure out what I am doing wrong. I generated a model with the latest patch of PyTorch, which uses onnx 1.15, with opset 17. I am using TensorRT 8.6.1 to serialize the ONNX model, but I get the same error. Do I have to build it again with special plugins, or do I need to use onnx 1.16? Any suggestions?

miraiaroha commented 8 months ago

> Does it pass with polygraphy? e.g. `polygraphy run model.onnx --trt --onnxrt`

> The problem of LayerNormalization in opset 17 not being convertible to an engine has been solved, and I found that the accuracy loss is not related to the opset version. So I tested TRT 8.5 / TRT 8.6 / TRT 9.2 with the opset-16 model for comparison (using Polygraphy to convert the model), observing serious FP16 accuracy loss in TRT 8.6 / TRT 9.2; the bad layers are the self-attentions of the RT-DETR decoder in my model, as shown in the attached screenshot.

> I know I asked before, sorry if I'm being pushy, but I cannot figure out what I am doing wrong. I generated a model with the latest patch of PyTorch, which uses onnx 1.15, with opset 17. I am using TensorRT 8.6.1 to serialize the ONNX model, but I get the same error. Do I have to build it again with special plugins, or do I need to use onnx 1.16? Any suggestions?

In TensorRT 8.6 you don't need to insert special plugins to support LayerNorm; it can be converted out of the box from opset 17.

The conversion flow is: onnx file -> onnx proto parsing -> onnx-to-tensorrt -> TensorRT engine file.

That is to say, TensorRT-8.6 can recognize [op_type: "LayerNormalization"], but if your middleware (the onnx parser / onnx-tensorrt, at the wrong versions) can't recognize it, you will get the op_type error.

My environment is TensorRT-8.6.2, onnx-1.16, onnxparser-8.6.1, and an ONNX model in opset 17. Maybe it is helpful for you.
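If it helps, a quick way to confirm that the ONNX parser actually linked into your TensorRT build recognizes LayerNormalization is to parse the opset-17 model and print the reported versions plus any parser errors. A small sketch; the file path is a placeholder:

```python
import onnx
import tensorrt as trt

print("tensorrt:", trt.__version__)
print("onnx:", onnx.__version__)

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model-op17.onnx", "rb") as f:  # placeholder path
    ok = parser.parse(f.read())

if ok:
    print("parse OK: this parser build understands LayerNormalization")
else:
    # An outdated onnx-tensorrt parser treats the node as an unsupported op
    # and then fails to find a "LayerNormalization" plugin, as in the log above.
    for i in range(parser.num_errors):
        print(parser.get_error(i))
```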

eduardatmadenn commented 8 months ago

> Does it pass with polygraphy? e.g. `polygraphy run model.onnx --trt --onnxrt`

> The problem of LayerNormalization in opset 17 not being convertible to an engine has been solved, and I found that the accuracy loss is not related to the opset version. So I tested TRT 8.5 / TRT 8.6 / TRT 9.2 with the opset-16 model for comparison (using Polygraphy to convert the model), observing serious FP16 accuracy loss in TRT 8.6 / TRT 9.2; the bad layers are the self-attentions of the RT-DETR decoder in my model, as shown in the attached screenshot.

> I know I asked before, sorry if I'm being pushy, but I cannot figure out what I am doing wrong. I generated a model with the latest patch of PyTorch, which uses onnx 1.15, with opset 17. I am using TensorRT 8.6.1 to serialize the ONNX model, but I get the same error. Do I have to build it again with special plugins, or do I need to use onnx 1.16? Any suggestions?

> In TensorRT 8.6 you don't need to insert special plugins to support LayerNorm; it can be converted out of the box from opset 17.
>
> The conversion flow is: onnx file -> onnx proto parsing -> onnx-to-tensorrt -> TensorRT engine file.
>
> That is to say, TensorRT-8.6 can recognize [op_type: "LayerNormalization"], but if your middleware (the onnx parser / onnx-tensorrt, at the wrong versions) can't recognize it, you will get the op_type error.
>
> My environment is TensorRT-8.6.2, onnx-1.16, onnxparser-8.6.1, and an ONNX model in opset 17. Maybe it is helpful for you.

Yes, that makes it very clear. Thank you for your patience.