NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Set layernorm to fp32 failure of TensorRT v8.6.11 when running trtexec on GPU A100 #3889

Closed jibf closed 3 months ago

jibf commented 4 months ago

Description

I tried to convert an ONNX model to a TRT engine on an A100 with layernorm specifically set to fp32. However, the whole transformer block was wrapped into a Myelin layer, inside which the final precision of the layernorm was still fp16.


detailed log: cvt.log

Environment

TensorRT Version: v8.6.11

NVIDIA GPU: NVIDIA A100-SXM4-40GB

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

```shell
trtexec --onnx=model/nvln.onnx --fp16 --noTF32 \
  --saveEngine=model/nvln.exec.fp16.trt \
  --layerPrecisions=LayerNormalization_*:fp32,Softmax_*:fp32,Conv_0:fp32 \
  --layerOutputTypes=LayerNormalization_*:fp32,Softmax_*:fp32,Conv_0:fp32 \
  --precisionConstraints=obey --timingCacheFile=x86.tc \
  --exportLayerInfo=nvln.fp16.json --exportProfile=nvln.fp16.profile.json \
  --profilingVerbosity=detailed --dumpProfile --verbose
```

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Wuqiman commented 4 months ago

When using our layernorm fp32 plugin, the model's accuracy is normal. With the fp16 Myelin implementation, accuracy drops by more than 20%.

zerollzeng commented 4 months ago

--layerPrecisions=LayerNormalization_*:fp32

Could you please try expanding the "*"? I'm not sure whether we support this kind of wildcard. You should be able to find the FP16 layernorm names in the verbose log; TRT should give you a warning that running layernorm under FP16 will affect accuracy.
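One way to expand the wildcard is to match the patterns against the concrete layer names taken from the verbose log (or the exported layer-info JSON) and emit explicit names. A minimal sketch; the helper and the layer names below are hypothetical, not from the model in this issue:

```python
from fnmatch import fnmatch

def expand_layer_precisions(layer_names, patterns, precision="fp32"):
    """Match wildcard patterns against concrete layer names and build
    an explicit --layerPrecisions value for trtexec."""
    matched = [n for n in layer_names for p in patterns if fnmatch(n, p)]
    matched = list(dict.fromkeys(matched))  # de-duplicate, keep order
    return ",".join(f"{n}:{precision}" for n in matched)

# Hypothetical layer names as they might appear in the verbose log:
names = ["LayerNormalization_12", "LayerNormalization_47", "Softmax_3", "MatMul_5"]
print("--layerPrecisions="
      + expand_layer_precisions(names, ["LayerNormalization_*", "Softmax_*"]))
```

The resulting explicit list can be pasted into the `--layerPrecisions` and `--layerOutputTypes` flags in place of the wildcard patterns.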

ttyio commented 3 months ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks all!

Smarter-version commented 2 months ago

I'm having the same problem. Have you solved it?

Smarter-version commented 2 months ago

I want to modify the precision of the Add_3244 node to fp32, but it's wrapped in Myelin. Commands or scripts:

```shell
trtexec \
  --onnx=$onnx_path \
  --saveEngine=$engine_path \
  --plugins=$plugins_path \
  --verbose --workspace=2048 \
  --exportProfile=${engine_path}.profile.json \
  --exportLayerInfo=${engine_path}.graph.json \
  --profilingVerbosity=detailed \
  --fp16 \
  --precisionConstraints=obey \
  --layerPrecisions=Add_3244:fp32 --layerOutputTypes=Add_3244:fp32
```

@zerollzeng

jibf commented 2 months ago

> I want to modify the precision of the Add_3244 node to fp32, but it's wrapped in Myelin.

@Smarter-version Perhaps you can try setting the specific layers as output layers. Since TRT's output layers must be fp32, this indirectly achieves the goal of setting these layers to fp32.

Smarter-version commented 2 months ago

> @Smarter-version Perhaps you can try setting the specific layers as output layers. Since TRT's output layers must be fp32, this indirectly achieves the goal of setting these layers to fp32.

Thank you for your reply! I tried using the outputs of some layers as outputs of the ONNX model and then converting it to an engine, but it fails with the error "No value info found for tensor".