NVIDIA-AI-IOT / yolo_deepstream

YOLO model QAT and deployment with DeepStream & TensorRT
Apache License 2.0

About converting a YOLOv7 QAT model to a TensorRT engine (failed with dynamic-batch setting) #46

Open YunghuiHsu opened 1 year ago

YunghuiHsu commented 1 year ago

When I follow yolo_deepstream/tree/main/tensorrt_yolov7 and use a "yolov7QAT" engine for a batched detection task, the following error occurs:

./build/detect --engine=yolov7QAT.engine --img=./imgs/horses.jpg,./imgs/zidane.jpg

Error Message

input 2 images, paths: ./imgs/horses.jpg, ./imgs/zidane.jpg, 
--------------------------------------------------------
Yolov7 initialized from: /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/yolov7/yolov7QAT.engine
input : images , shape : [ 1,3,640,640,]
output : outputs , shape : [ 1,25200,85,]
--------------------------------------------------------
preprocess start
error cv_img.size() in preProcess
 error: mImgPushed = 1 numImg = 1 mMaxBatchSize= 1, mImgPushed + numImg > mMaxBatchSize 
inference start
postprocessing start
detectec image written to: ./imgs/horses.jpgdetect0.jpg
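
The `mImgPushed + numImg > mMaxBatchSize` line suggests the sample rejects any image beyond the engine's batch capacity: an engine built with a static [1,3,640,640] input has a capacity of 1, so the second image is dropped. A minimal sketch of that kind of guard (hypothetical names mirroring the log; the actual tensorrt_yolov7 code may differ):

```cpp
// Hypothetical sketch of the guard implied by the log above (names
// mirror the log; the actual tensorrt_yolov7 code may differ).
// An engine built with a static [1,3,640,640] input reports a max
// batch of 1, so pushing the second image is rejected.
#include <cstddef>
#include <iostream>

struct BatchGuard {
    std::size_t mImgPushed = 0;
    std::size_t mMaxBatchSize = 1;  // from the engine's static input shape

    bool push(std::size_t numImg) {
        if (mImgPushed + numImg > mMaxBatchSize) {
            std::cerr << "error: mImgPushed = " << mImgPushed
                      << " numImg = " << numImg
                      << " mMaxBatchSize= " << mMaxBatchSize
                      << ", mImgPushed + numImg > mMaxBatchSize\n";
            return false;
        }
        mImgPushed += numImg;
        return true;
    }
};

int main() {
    BatchGuard g;
    g.push(1);  // horses.jpg: accepted, fills the batch
    g.push(1);  // zidane.jpg: rejected, reproducing the error above
}
```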


Environment

    CUDA: 11.4.315
    cuDNN: 8.6.0.166
    TensorRT: 8.5.2
    Python: 3.8.10
    PyTorch: 1.12.0a0+2c916ef.nv22.3

Hardware

    Model: Jetson AGX
    Module: NVIDIA Jetson AGX Xavier (32 GB RAM)
    L4T: 35.2.1
    JetPack: 5.1
YunghuiHsu commented 1 year ago

In https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/tensorrt_yolov7#prepare-tensorrt-engines, I suggest explicitly specifying the dynamic batch shapes, which solved the problem!

Replace

# int8 QAT model, the onnx model with Q&DQ nodes
/usr/src/tensorrt/bin/trtexec --onnx=yolov7qat.onnx --saveEngine=yolov7QAT.engine --fp16 --int8

with

# int8 QAT model, the onnx model with Q&DQ nodes and dynamic-batch
/usr/src/tensorrt/bin/trtexec --onnx=yolov7qat.onnx \
        --minShapes=images:1x3x640x640 \
        --optShapes=images:12x3x640x640 \
        --maxShapes=images:16x3x640x640 \
        --saveEngine=yolov7QAT.engine --fp16 --int8
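
For reference, here is a sketch of what those trtexec flags do through the TensorRT C++ API (written against TensorRT 8.x; the input name "images" and the min/opt/max shapes mirror the command above, and error handling is omitted for brevity):

```cpp
// Sketch (TensorRT 8.x API): the C++ equivalent of the trtexec command
// above. The input name "images" and the min/opt/max shapes mirror the
// flags; error handling is omitted for brevity.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>
#include <memory>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

int main() {
    using namespace nvinfer1;
    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(gLogger));
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, gLogger));
    parser->parseFromFile("yolov7qat.onnx",
                          static_cast<int>(ILogger::Severity::kWARNING));

    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(BuilderFlag::kINT8);  // --int8: the Q&DQ nodes supply the scales
    config->setFlag(BuilderFlag::kFP16);  // --fp16: FP16 for non-quantized layers

    // One optimization profile == --minShapes/--optShapes/--maxShapes.
    IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("images", OptProfileSelector::kMIN, Dims4{1, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kOPT, Dims4{12, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kMAX, Dims4{16, 3, 640, 640});
    config->addOptimizationProfile(profile);

    auto serialized = std::unique_ptr<IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    std::ofstream out("yolov7QAT.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
}
```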

However, when testing performance with /usr/src/tensorrt/bin/trtexec --loadEngine=yourmodel.engine, the engine built with explicit dynamic batch shapes looks much slower. (Caveat: on a dynamic-shape engine, trtexec run without --shapes appears to benchmark at the profile's opt shape, batch 12 here, so the two results below are probably not measured at the same batch size; passing --shapes=images:1x3x640x640 would make the comparison fair.)

yolov7QAT.engine

=== Performance summary ===
[I] Throughput: 57.8406 qps
[I] Latency: mean = 17.8946 ms

yolov7QAT.engine with dynamic batch (max=16)

=== Performance summary ===
[I] Throughput: 23.8396 qps
[I] Latency: mean = 42.046 ms
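
Relatedly, when deploying a dynamic-batch engine, the input shape has to be set on the execution context before each inference. A minimal sketch of running the dynamic engine at batch 1 (TensorRT 8.x API; buffer sizes follow the [N,3,640,640] input and [N,25200,85] output shapes printed by the detect tool above, and error handling is omitted):

```cpp
// Sketch (TensorRT 8.x API): running a dynamic-batch engine at an
// explicit batch size. Buffer sizes follow the shapes printed by the
// detect tool above ([N,3,640,640] in, [N,25200,85] out).
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity, const char*) noexcept override {}
} gLogger;

int main() {
    using namespace nvinfer1;
    std::ifstream f("yolov7QAT.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());
    auto runtime = std::unique_ptr<IRuntime>(createInferRuntime(gLogger));
    auto engine = std::unique_ptr<ICudaEngine>(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    auto context = std::unique_ptr<IExecutionContext>(engine->createExecutionContext());

    // With dynamic shapes the batch dimension is -1 until it is set on
    // the context; enqueueV2 fails without this call.
    const int batch = 1;
    context->setBindingDimensions(0, Dims4{batch, 3, 640, 640});

    void* buffers[2];
    cudaMalloc(&buffers[0], batch * 3 * 640 * 640 * sizeof(float));  // images
    cudaMalloc(&buffers[1], batch * 25200 * 85 * sizeof(float));     // outputs
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV2(buffers, stream, nullptr);
    cudaStreamSynchronize(stream);
}
```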