Closed Gaozhongpai closed 1 year ago
It looks like you passed the `--fp16` and `--int8` flags to TRT at the same time. So which mode was it benchmarking? Can you please clarify which variant of YoloNAS you exported?
Ok, it seems you indeed can pass the `--fp16` and `--int8` flags simultaneously. Let us check internally what is happening; I will get back once we have more info on this.
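As I understand it, trtexec's `--fp16` and `--int8` flags only *allow* those precisions; the builder then picks the fastest kernel per layer. So a quick way to see which mode the combined build actually benefited from is to build and time each configuration separately (a hedged sketch, reusing the ONNX file from this thread):

```shell
# Sketch: time each precision configuration separately to see which one
# the combined --fp16 --int8 build was actually benefiting from.
trtexec --avgRuns=100 --onnx=yolonas-hand-quan_16x3x640x640_qat.onnx                # FP32 baseline
trtexec --fp16 --avgRuns=100 --onnx=yolonas-hand-quan_16x3x640x640_qat.onnx         # FP16 allowed
trtexec --int8 --avgRuns=100 --onnx=yolonas-hand-quan_16x3x640x640_qat.onnx         # INT8 allowed (needed for Q/DQ models)
trtexec --fp16 --int8 --avgRuns=100 --onnx=yolonas-hand-quan_16x3x640x640_qat.onnx  # both allowed; builder chooses per layer
```

Comparing the four reported throughputs should make clear whether the INT8 (QAT) path is the one being measured.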
Thank you for the speedy response. I am using `yolo_nas_l`. I followed the instructions here: https://docs.deci.ai/super-gradients/documentation/source/BenchmarkingYoloNAS.html#step-1-export-yolonas-to-onnx, which pass the `--fp16` and `--int8` flags at the same time.
In comparison with Yolov7-qat, using the same conversion command, `trtexec --fp16 --int8 --avgRuns=100 --onnx=yolov7_qat_640.onnx`:
So based on the command `trtexec --fp16 --int8 --avgRuns=100 --onnx=yolonas-hand-quan_16x3x640x640_qat.onnx`, it looks like you exported and benchmarked the model with batch size 16. The throughput of 11.6 qps is therefore for batches of 16 elements, so the effective throughput is 11.6 × 16 = 185.6 FPS.
Attaching the docs just in case: https://docs.deci.ai/super-gradients/documentation/source/BenchmarkingYoloNAS.html
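To make the batch-size arithmetic explicit (a minimal sketch, assuming as above that trtexec's qps counts whole batches rather than single images):

```python
def effective_fps(qps: float, batch_size: int) -> float:
    """Images per second, assuming trtexec's qps counts whole batches."""
    return qps * batch_size

# The batch-16 QAT engine from this thread: 11.6 batches/s * 16 images/batch.
print(effective_fps(11.6, 16))  # 185.6
```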
Thank you very much.
On this page: https://docs.deci.ai/super-gradients/documentation/source/BenchmarkingYoloNAS.html#step-1-export-yolonas-to-onnx, `batch_size = 32` is used. How can the result be 242.751 qps?
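Applying the same qps-to-images/s reading to both numbers shows how far apart they are (a sketch; it assumes the docs' 242.751 qps was also measured per batch, which is exactly the point needing clarification):

```python
def images_per_second(qps: float, batch_size: int) -> float:
    # Assumption: trtexec qps counts batches, so multiply by batch size.
    return qps * batch_size

mine = images_per_second(11.6, 16)     # this thread's batch-16 result
docs = images_per_second(242.751, 32)  # the docs' batch-32 result, read the same way
print(mine, docs)  # 185.6 7768.032
```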
Not ready to comment on this at the moment; I will check with the team members who did the benchmarks and get back to you once I have more information.
💡 Your Question
I followed the Colab notebook https://colab.research.google.com/drive/1yHrHkUR1X2u2FjjvNMfUbSXTkUul6o1P?usp=sharing for Quantization-Aware Finetuning of YoloNAS on a custom dataset. After that, I converted the QAT ONNX model to TensorRT with `trtexec --fp16 --int8 --avgRuns=100 --onnx=yolonas-hand-quan_16x3x640x640_qat.onnx`. The result is slower than that reported here:
Versions
No response