NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Implications of converting ONNX FP16 -> TensorRT engine #3057

Closed fujikosu closed 1 year ago

fujikosu commented 1 year ago

I'm using an ONNX model (yolox_s.onnx) from https://github.com/Megvii-BaseDetection/YOLOX. I wanted to reduce the model size for transfer to an edge device (NVIDIA Xavier NX), so I converted this FP32 ONNX model to an FP16 ONNX model with onnxconverter-common (https://github.com/microsoft/onnxconverter-common), as introduced in this ONNX Runtime doc (https://onnxruntime.ai/docs/performance/model-optimizations/float16.html), like below, and saved it as yolox_s_fp16.onnx:

import onnx
from onnxconverter_common import float16
model_fp16 = float16.convert_float_to_float16(onnx.load("yolox_s.onnx"))
onnx.save(model_fp16, "yolox_s_fp16.onnx")

That reduced the model size to nearly half.

-rw-r--r-- 1 vscode vscode 35M Jun 12 10:51 yolox_s.onnx
-rw-r--r-- 1 vscode vscode 18M Jun  6 09:10 yolox_s_fp16.onnx
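
A quick way to sanity-check the converted model against the FP32 original is to run both through ONNX Runtime and compare outputs. This is a minimal sketch, assuming a 1x3x640x640 input for YOLOX-s and that the converted model now reports float16 inputs:

import numpy as np
import onnxruntime as ort

def run(path, x):
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # the FP16-converted model reports tensor(float16) inputs, so cast accordingly
    dtype = np.float16 if "float16" in inp.type else np.float32
    return sess.run(None, {inp.name: x.astype(dtype)})[0]

x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # assumed YOLOX-s input shape
out_fp32 = run("yolox_s.onnx", x)
out_fp16 = run("yolox_s_fp16.onnx", x)
print("max abs diff:", np.abs(out_fp32.astype(np.float32) - out_fp16.astype(np.float32)).max())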

I have two questions.

  1. Can we use this FP16 ONNX model as input for TensorRT engine conversion? The conversion itself succeeds both for FP16 ONNX -> TensorRT without --fp16 and for FP16 ONNX -> TensorRT with --fp16, but is this officially supported by TensorRT? Or is there a significant accuracy drop or some other reason you would discourage users from doing this?
  2. If this FP16 ONNX model is supported as input, converting it to a TensorRT engine without --fp16 produces an engine file of the same size as using the FP32 ONNX input. I was expecting FP16 ONNX -> TensorRT without --fp16 to have the same size as FP16 ONNX -> TensorRT with --fp16, since the input was already FP16. Is this expected?
# fp32 onnx -> tensorRT without --fp16
-rw-r--r-- 1 root   root   47M Jun  6 09:07 yolox_s.trt

# fp16 onnx -> tensorRT without --fp16
-rw-r--r-- 1 root   root   47M Jun  6 09:13 yolox_s_fp16in_fp32out.trt
# fp16 onnx -> tensorRT with --fp16
-rw-r--r-- 1 root   root   26M Jun  6 09:31 yolox_s_fp16in_fp16out.trt
zerollzeng commented 1 year ago
# fp32 onnx -> tensorRT without --fp16
-rw-r--r-- 1 root   root   47M Jun  6 09:07 yolox_s.trt

# fp16 onnx -> tensorRT without --fp16
-rw-r--r-- 1 root   root   47M Jun  6 09:13 yolox_s_fp16in_fp32out.trt
# fp16 onnx -> tensorRT with --fp16
-rw-r--r-- 1 root   root   26M Jun  6 09:31 yolox_s_fp16in_fp16out.trt

It's expected. The TensorRT engine size depends on the precision you set at build time, not on the original ONNX precision (even if the ONNX model is already FP16, without --fp16 set TensorRT will still build an FP32 engine).
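
For illustration, here is a minimal sketch of what those two builds correspond to in the TensorRT Python API, assuming the engines above were produced with trtexec on TensorRT 8.x (trtexec's --fp16 maps to the FP16 builder flag; file names are taken from the listing):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path, fp16=False):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    if fp16:
        # this is what trtexec's --fp16 switches on; without it the engine stays FP32
        config.set_flag(trt.BuilderFlag.FP16)
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

# FP16 ONNX in, no --fp16 -> FP32 engine (same size as from the FP32 ONNX)
build_engine("yolox_s_fp16.onnx", "yolox_s_fp16in_fp32out.trt")
# FP16 ONNX in, --fp16    -> FP16 engine (roughly half the size)
build_engine("yolox_s_fp16.onnx", "yolox_s_fp16in_fp16out.trt", fp16=True)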

fujikosu commented 1 year ago

Thank you for your answer to question 2.

What about question 1?

Can we use this FP16 ONNX model as input for TensorRT engine conversion? The conversion itself succeeds both for FP16 ONNX -> TensorRT without --fp16 and for FP16 ONNX -> TensorRT with --fp16, but is this officially supported by TensorRT? Or is there a significant accuracy drop or some other reason you would discourage users from doing this?

zerollzeng commented 1 year ago

Can we use this FP16 ONNX model as input for TensorRT engine conversion?

Yes, but usually just feeding an FP32 ONNX model and specifying --fp16 is also fine, unless the weights in your model exceed the FP16 range, in which case it may cause an accuracy drop.
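
If you are unsure which case applies, a minimal sketch like the following scans the FP32 ONNX initializers for values outside the finite FP16 range (65504) before choosing a route; this only checks weights, not activations:

import numpy as np
import onnx
from onnx import numpy_helper

FP16_MAX = 65504.0  # largest finite float16 value

model = onnx.load("yolox_s.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.size and w.dtype == np.float32 and np.abs(w).max() > FP16_MAX:
        print(f"{init.name}: max |w| = {np.abs(w).max():.3e} exceeds FP16 range")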

fujikosu commented 1 year ago

Thank you for the clarification!