Closed mattherdelma closed 1 year ago
@mattherdelma @lyb36524 He performed FP16 quantization of the ONNX model with the following command:
./trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n_fp16.trt --fp16 --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640
The other commands are the same as in the link:
The figures below show the time overhead; the test device is an NVIDIA RTX 4090 with BATCH_SIZE = 8:
FP32:
FP16:
We will publish more quantization tutorials in the future.
How can YOLOv8's TensorRT INT8 quantization be implemented?
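Not from this thread, but as a sketch: trtexec can also build an INT8 engine when given the `--int8` flag, analogous to the FP16 command above. Note that without a calibration cache, trtexec assigns placeholder dynamic ranges, so accuracy will be poor; for real deployments you would supply a cache produced by an `IInt8EntropyCalibrator2` implementation (the `yolov8n_calib.cache` filename here is a hypothetical example):

```shell
# Hypothetical INT8 build, mirroring the FP16 command from this thread.
# --int8 enables INT8 kernels; --fp16 is kept as a fallback for layers
# that have no INT8 implementation. --calib points at a calibration
# cache generated beforehand with a TensorRT calibrator (assumed name).
./trtexec --onnx=yolov8n.onnx \
          --saveEngine=yolov8n_int8.trt \
          --int8 --fp16 --buildOnly \
          --calib=yolov8n_calib.cache \
          --minShapes=images:1x3x640x640 \
          --optShapes=images:4x3x640x640 \
          --maxShapes=images:8x3x640x640
```

Running without `--calib` still builds, but it is only useful for measuring speed, not accuracy.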