```
# fp32 onnx -> tensorRT without --fp16
-rw-r--r-- 1 root root 47M Jun  6 09:07 yolox_s.trt
# fp16 onnx -> tensorRT without --fp16
-rw-r--r-- 1 root root 47M Jun  6 09:13 yolox_s_fp16in_fp32out.trt
# fp16 onnx -> tensorRT with --fp16
-rw-r--r-- 1 root root 26M Jun  6 09:31 yolox_s_fp16in_fp16out.trt
```
That's expected. The TRT engine size depends on the precision you set at build time, not on the original ONNX precision (even if the ONNX is already fp16, TensorRT will still build an fp32 engine unless --fp16 is set).
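For illustration, here is a minimal sketch (not from this thread) of how that build-time precision is selected with the TensorRT Python API; `trt.BuilderFlag.FP16` is the programmatic equivalent of passing `--fp16` to trtexec, and the file names are just the ones used above:

```python
import tensorrt as trt

# Minimal sketch: build a TensorRT engine from an ONNX file.
# The engine precision is chosen here, at build time, regardless
# of whether the ONNX weights are fp32 or fp16.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("yolox_s.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # equivalent to trtexec --fp16

engine_bytes = builder.build_serialized_network(network, config)
with open("yolox_s_fp16.trt", "wb") as f:
    f.write(engine_bytes)
```

Without the `set_flag` line, the builder defaults to fp32, which matches the 47M engine sizes in the listing above.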
Thank you for your answer to question 2.
What about question 1?
Can we use this fp16 onnx model as input to TensorRT engine conversion? The conversion itself succeeds both for fp16 ONNX -> TensorRT without --fp16 and for fp16 ONNX -> TensorRT with --fp16, but is this officially supported by TensorRT? Or is there a significant accuracy drop or some other reason you would discourage users from doing so?
> Can we use this fp16 onnx model as input to TensorRT engine conversion?
Yes, but usually just feeding an FP32 ONNX and specifying --fp16 is also fine, unless the weights in your model exceed the FP16 range, in which case it may cause an accuracy drop.
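As a quick sanity check for that caveat, one can inspect the initializers of the fp32 model and compare them against the largest finite fp16 value (65504). This is a minimal sketch, not part of the thread; the file name is just the one used above:

```python
import numpy as np
import onnx
from onnx import numpy_helper

# Minimal sketch: flag any fp32 weights that would overflow fp16.
# np.finfo(np.float16).max is 65504, the largest finite fp16 value.
FP16_MAX = np.finfo(np.float16).max

model = onnx.load("yolox_s.onnx")
for init in model.graph.initializer:
    weights = numpy_helper.to_array(init)
    if weights.dtype == np.float32 and np.abs(weights).max() > FP16_MAX:
        print(f"{init.name}: max |w| = {np.abs(weights).max():.1f} exceeds fp16 range")
```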
Thank you for the clarification!
I'm using an ONNX model (yolox_s.onnx) from https://github.com/Megvii-BaseDetection/YOLOX. I wanted to reduce the model size to transfer it to an edge device (NVIDIA Xavier NX), so I converted this fp32 ONNX model to an fp16 ONNX model with this library (https://github.com/microsoft/onnxconverter-common), as introduced in this ONNX Runtime doc (https://onnxruntime.ai/docs/performance/model-optimizations/float16.html), like below, and saved it as yolox_s_fp16.onnx.
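The original snippet was not preserved in this thread; a minimal sketch following the linked onnxconverter-common documentation would look like this:

```python
import onnx
from onnxconverter_common import float16

# Load the fp32 model and convert all float tensors to fp16,
# roughly halving the size of the serialized weights.
model = onnx.load("yolox_s.onnx")
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "yolox_s_fp16.onnx")
```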
That reduced the model size to nearly half.
I have two questions.