NVIDIA-AI-IOT / yolo_deepstream

YOLO model QAT and deployment with DeepStream & TensorRT
Apache License 2.0

Yolov7 performance bad - 10 FPS only on Orin AGX 64 GB #61

Open Ben93kie opened 1 month ago

Ben93kie commented 1 month ago

I followed the Yolov7 tutorial here.

I exported the ONNX model from the official .pt file and adjusted the paths in the config files. The engine built successfully, but I'm only getting 10 FPS (compared to the promised >100).

Here is the output after conversion:

WARNING: [TRT]: If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
WARNING: [TRT]: Check verbose logs for the list of affected weights.
WARNING: [TRT]: - 82 weights are affected by this issue: Detected subnormal FP16 values.
WARNING: [TRT]: - 2 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
0:40:28.444976296 26592 0xaaaad0efc090 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() [UID = 1]: serialize cuda engine to file: /home/nvidia/Documents/yolo_deepstream/deepstream_yolo/yolov7.onnx_b16_gpu0_fp16.engine successfully
INFO: [FullDims Engine Info]: layers num: 2
0 INPUT kFLOAT images 3x640x640 min: 1x3x640x640 opt: 16x3x640x640 Max: 16x3x640x640
1 OUTPUT kFLOAT output 25200x85 min: 0 opt: 0 Max: 0

...

PERF: 9.78 (9.76) 9.62 (9.60) 9.63 (9.61) 9.78 (9.76) 9.64 (9.62) 9.62 (9.60) 9.63 (9.61) 9.63 (9.61) 9.78 (9.76) 9.63 (9.61) 9.78 (9.76) 10.20 (10.17) 9.78 (9.76) 9.63 (9.61) 9.78 (9.76) 9.78 (9.76)
PERF: 10.11 (9.89) 10.11 (9.81) 10.11 (9.81) 10.11 (9.89) 10.11 (9.82) 10.11 (9.81) 10.11 (9.81) 10.11 (9.81) 10.11 (9.89) 10.11 (9.81) 10.11 (9.89) 10.11 (10.08) 10.11 (9.89) 10.11 (9.81) 10.11 (9.89) 10.11 (9.89)
**PERF: 10.11 (10.00) 10.11 (9.95

Have I exported the ONNX incorrectly, or might I have missed something?

Ben93kie commented 1 month ago

I had num-sources=16 even though I only wanted 1 source.
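For anyone reading the PERF output above: each row has 16 value pairs, which matches num-sources=16. Assuming each pair is one stream's current and average FPS (as in the deepstream-app PERF header), the ~10 FPS figure is per stream, and the total is the sum across streams. A minimal Python sketch of that arithmetic, using the first PERF row from the log (the function name `per_stream_fps` is only illustrative):

```python
import re

# First PERF row copied from the log in this thread.
PERF_ROW = (
    "PERF: 9.78 (9.76) 9.62 (9.60) 9.63 (9.61) 9.78 (9.76) "
    "9.64 (9.62) 9.62 (9.60) 9.63 (9.61) 9.63 (9.61) "
    "9.78 (9.76) 9.63 (9.61) 9.78 (9.76) 10.20 (10.17) "
    "9.78 (9.76) 9.63 (9.61) 9.78 (9.76) 9.78 (9.76)"
)


def per_stream_fps(row):
    """Return the first value of each 'x (y)' pair in a PERF row.

    Assumption: the first value is the stream's current FPS and the
    value in parentheses is its running average.
    """
    pairs = re.findall(r"(\d+\.\d+) \((\d+\.\d+)\)", row)
    return [float(current) for current, _average in pairs]


fps = per_stream_fps(PERF_ROW)
total = sum(fps)
print(f"{len(fps)} streams, {total:.2f} FPS summed over all streams")
```

Under that reading, the 16 streams add up to roughly 155 FPS in total, so the engine itself was not running at 10 FPS.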

Ben93kie commented 1 month ago

One more question: I'm now getting ~60 FPS on an Orin AGX 64 GB in MAXN mode with the following config:

config_infer_primary_yolov7.txt:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=yolov7_dy.onnx_b16_gpu0_fp16.engine
#model-engine-file=yolov7_1280.onnx_b1_gpu0_fp16.engine
#model-engine-file=/home/nvidia/Documents/yolo_deepstream/deepstream_yolo/yolov7.onnx_b16_gpu0_fp16.engine
#onnx-file=yolov7_dy.onnx
#onnx-file=yolov7_1280.onnx
labelfile-path=labels.txt
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
## 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
## Bilinear Interpolation
scaling-filter=1
#parse-bbox-func-name=NvDsInferParseCustomYoloV7
parse-bbox-func-name=NvDsInferParseCustomYoloV7_cuda
#disable-output-host-copy=0
disable-output-host-copy=1
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#scaling-compute-hw=0
## start from DS6.2
crop-objects-to-roi-boundary=1

[class-attrs-all]
#nms-iou-threshold=0.3
#threshold=0.7
nms-iou-threshold=0.65
pre-cluster-threshold=0.25
topk=300

and the application config:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=1
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
output-file=yolov4.mp4

[osd]
enable=1
gpu-id=0
border-width=1
text-size=12
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
labelfile-path=labels.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
#config-file=config_infer_primary_yoloV4.txt
config-file=config_infer_primary_yoloV7.txt

[tracker]
enable=0
# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
# ll-config-file required to set different tracker types
# ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
# ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml
# ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_DeepSORT.yml
gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

[tests]
file-loop=0

Have I missed a config setting? The advertised figure is 120 FPS.
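One way to review the two files above is to pull the throughput-related values into one place and compare them. The Python sketch below is a hypothetical helper, not part of DeepStream. It assumes the `_b16_` part of the engine filename encodes the batch size the engine was built for (the build log earlier in this thread shows `opt: 16x3x640x640` for a `b16` engine), and it uses only a trimmed copy of the settings posted here. Whether any of these values explains the gap to the advertised figure is not established in this thread.

```python
import configparser
import re

# Trimmed copies of the two configs posted above (only the keys compared here).
APP_CFG = """
[source0]
num-sources=1
[sink0]
type=2
sync=0
[osd]
enable=1
[streammux]
batch-size=1
[primary-gie]
batch-size=1
"""

INFER_CFG = """
[property]
model-engine-file=yolov7_dy.onnx_b16_gpu0_fp16.engine
"""


def load(text):
    """Parse INI-style config text; '#' lines are treated as comments."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return parser


def engine_batch(engine_name):
    """Return the batch size embedded in a name like '..._b16_gpu0_fp16.engine'.

    Assumption: the '_b<N>_gpu<M>_' part of the filename is the build batch size.
    """
    match = re.search(r"_b(\d+)_gpu\d+_", engine_name)
    return int(match.group(1)) if match else None


def summarize(app_text, infer_text):
    """Collect the settings worth comparing side by side."""
    app, infer = load(app_text), load(infer_text)
    return {
        "num_sources": app.getint("source0", "num-sources"),
        "mux_batch": app.getint("streammux", "batch-size"),
        "gie_batch": app.getint("primary-gie", "batch-size"),
        "engine_batch": engine_batch(infer.get("property", "model-engine-file")),
        # Per the comment in [sink0]: 1=FakeSink 2=EglSink 3=File
        "renders_to_screen": app.getint("sink0", "type") == 2,
        "osd_enabled": app.getboolean("osd", "enable"),
    }


summary = summarize(APP_CFG, INFER_CFG)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the posted values, the summary shows a single source with batch-size=1 running against an engine whose filename says b16, plus on-screen rendering (sink type 2) with OSD enabled. These are candidates to vary one at a time while watching the PERF output, not confirmed causes.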