Here are more details in case they help.
The command I used to generate the tiny QAT model:
python scripts/qat.py quantize yolov7-tiny.pt --qat=qat.pt --ptq=ptq.pt --ignore-policy="model\.77\.m\.(.*)|model\.0\.(.*)" --supervision-stride=1 --eval-ptq --eval-origin
The command I used for the benchmark:
/usr/src/tensorrt/bin/trtexec --onnx=tiny-qat.onnx --int8 --fp16 --workspace=4096 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640
The performance summary for the tiny model before qat:
The performance summary for the tiny model after qat:
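In case it helps with diagnosing where the time goes, trtexec can also report per-layer timings in a separate profiling pass; a sketch of such a run is below (the ONNX file name simply mirrors the benchmark command above, and the profiling flags are standard trtexec options, not something taken from this thread):

/usr/src/tensorrt/bin/trtexec --onnx=tiny-qat.onnx --int8 --fp16 --workspace=4096 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640 --dumpProfile --separateProfileRun

The per-layer profile shows which layers actually run in INT8 and which fall back to FP16/FP32, which is usually the first thing to check when a QAT engine comes out slower than expected.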
It seems I was comparing against trtexec's default INT8 mode. After comparing against the FP16 run instead, the numbers made much more sense.
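For reference, the FP16-only baseline that comparison implies would look roughly like this (same trtexec options as above; the original-model file name yolov7-tiny.onnx is an assumption):

/usr/src/tensorrt/bin/trtexec --onnx=yolov7-tiny.onnx --fp16 --workspace=4096 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640

Running the non-QAT ONNX with --int8 and no calibration cache makes trtexec fall back to placeholder scales, so that run is not a meaningful reference point for the explicit-QAT engine; the FP16 run is the fairer baseline.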
Hi, after quantizing the yolov7-tiny model with the recommended settings, I am getting lower throughput in the benchmark with the resulting model (qat-tiny.pt) than with the same benchmark on the model before quantization (yolov7-tiny.pt). I double-checked that no other tasks were running in the background, so I am fairly sure the difference is caused by the quantized weights. I was wondering what went wrong during quantization. Thanks!