Here are more details in case they help.
The command I used to generate the tiny QAT model:
python scripts/qat.py quantize yolov7-tiny.pt --qat=qat.pt --ptq=ptq.pt --ignore-policy="model\.77\.m\.(.*)|model\.0\.(.*)" --supervision-stride=1 --eval-ptq --eval-origin
The command I used for the benchmark:
/usr/src/tensorrt/bin/trtexec --onnx=tiny-qat.onnx --int8 --fp16 --workspace=4096 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640
The performance summary for the tiny model before qat:
The performance summary for the tiny model after qat:
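In case it helps with diagnosing where the time goes, trtexec can also report per-layer timings in a separate profiling pass; a sketch of such a run is below (the ONNX file name simply mirrors the benchmark command above, and the profiling flags are standard trtexec options, not something taken from this thread):

/usr/src/tensorrt/bin/trtexec --onnx=tiny-qat.onnx --int8 --fp16 --workspace=4096 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640 --dumpProfile --separateProfileRun

The per-layer profile shows which layers actually run in INT8 and which fall back to FP16/FP32, which is usually the first thing to check when a QAT engine comes out slower than expected.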
It seems I was comparing against trtexec's default INT8 mode. After comparing against the FP16 run instead, the numbers made much more sense.
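For reference, the FP16-only baseline that comparison implies would look roughly like this (same trtexec options as above; the original-model file name yolov7-tiny.onnx is an assumption):

/usr/src/tensorrt/bin/trtexec --onnx=yolov7-tiny.onnx --fp16 --workspace=4096 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640

Running the non-QAT ONNX with --int8 and no calibration cache makes trtexec fall back to placeholder scales, so that run is not a meaningful reference point for the explicit-QAT engine; the FP16 run is the fairer baseline.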
Hi, after quantizing the yolov7-tiny model with the recommended settings, I am getting lower throughput in the benchmark with the resulting model (qat-tiny.pt) than with the same benchmark on the model before quantization (yolov7-tiny.pt). I double-checked that no other tasks were running in the background, so I am fairly sure the difference is caused by the quantized weights. I was wondering what went wrong during quantization. Thanks!