WongKinYiu / yolov7

Implementation of the paper "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"
GNU General Public License v3.0

I'm puzzled by the inference speed of yolov7 vs. yolov5s6 #298

Open zhengzhigang1979 opened 2 years ago

zhengzhigang1979 commented 2 years ago

Re: #238


I'm also puzzled. The attached images compare the inference speed of yolov7 with yolov5s6, and of yolov7-tiny with yolov5n6. Inference takes 0.152 s for yolov7 versus 0.011 s for yolov5s6, and 0.039 s for yolov7-tiny versus 0.007 s for yolov5n6.

Please help me explain the reason for this result. Thanks.

WongKinYiu commented 2 years ago

YOLOv5 runs a warmup inference first, then starts inference on the images. YOLOv7 starts image inference directly.
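The effect of warmup on a timing comparison can be illustrated with a minimal, framework-agnostic sketch. `benchmark` is a hypothetical helper (not from either repository); `infer` stands in for any model call, and the warmup and run counts are arbitrary defaults chosen for illustration.

```python
import statistics
import time
from typing import Callable


def benchmark(infer: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Median latency of `infer()` in seconds, ignoring the first `warmup` calls.

    Early calls often pay one-off costs (e.g. CUDA context creation, cuDNN
    autotuning, memory allocation), so they are executed but not timed.
    """
    if warmup < 0 or runs < 1:
        raise ValueError("need warmup >= 0 and runs >= 1")
    for _ in range(warmup):
        infer()  # untimed warmup calls
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Usage with a placeholder workload standing in for a model forward pass.
latency = benchmark(lambda: sum(range(100_000)), warmup=3, runs=20)
print(f"median latency: {latency * 1e3:.3f} ms")
```

The median is used instead of the mean so that one slow outlier run does not dominate. With a GPU model, `infer` must also wait for the GPU to finish (for example by calling `torch.cuda.synchronize()`), otherwise the measured times are too optimistic.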

zhengzhigang1979 commented 2 years ago

Thank you for the prompt reply, but I found that yolov7 also runs one warmup inference before image inference.

WongKinYiu commented 2 years ago

Oh, you are using detect.py. How did you calculate the inference time and NMS time? As far as I know, detect.py does not print that information, so I assume you are running our demo file. By the way, on a GPU without tensor cores, you have to set half to False. https://github.com/WongKinYiu/yolov7/blob/main/detect.py#L31
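Reporting inference time and NMS time separately, as discussed here, can be done with a small per-stage timer. `StageTimer` below is a hypothetical sketch, not part of YOLOv7 or YOLOv5; it assumes you pass a synchronization callable when timing GPU work, and the loop bodies are placeholders for the real model call and NMS step.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Optional


class StageTimer:
    """Accumulate wall-clock time per named stage, e.g. "inference" and "nms".

    `sync` is called before each clock read. For CUDA work it should block until
    queued kernels finish (for example torch.cuda.synchronize); otherwise the
    timer only measures how long it took to launch the work.
    """

    def __init__(self, sync: Optional[Callable[[], None]] = None) -> None:
        self.sync = sync if sync is not None else (lambda: None)
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            self.sync()
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self, name: str) -> float:
        """Average time per call of `name`, in milliseconds."""
        count = self.counts.get(name, 0)
        if count == 0:
            raise KeyError(f"no timings recorded for stage {name!r}")
        return 1e3 * self.totals[name] / count


# Usage with placeholder work; on a GPU you would pass sync=torch.cuda.synchronize
# and put the real forward pass and NMS call inside the two stages.
timer = StageTimer()
for _ in range(10):
    with timer.stage("inference"):
        preds = [x * 2 for x in range(10_000)]   # stand-in for model(img)
    with timer.stage("nms"):
        kept = [p for p in preds if p % 3 == 0]  # stand-in for the NMS step
print(f"{timer.mean_ms('inference'):.3f} ms inference, {timer.mean_ms('nms'):.3f} ms NMS")
```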

zhengzhigang1979 commented 2 years ago

Thank you sincerely. I added code to measure the inference and NMS time. With half set to False, inference takes 0.037 s; with half set to True, it takes 0.152 s. That is still slower than yolov5s6, whose inference time is 0.011 s.

WongKinYiu commented 2 years ago

Yes, that is because the reported inference time of yolov5s6 is for 1280 input resolution, while your test resolution is 640.
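Why input resolution matters so much here is simple arithmetic: the convolutional layers of a detector do work proportional to the spatial size of their feature maps, so per-image compute grows roughly with the pixel count. The helper below is a hypothetical illustration; real latency will not scale by exactly this factor because of fixed costs such as NMS, data transfer, and kernel launches.

```python
def pixel_ratio(size_a: int, size_b: int) -> float:
    """How many times more pixels a size_a x size_a input has than a size_b x size_b one."""
    return (size_a * size_a) / (size_b * size_b)


# 1280x1280 has 4x the pixels of 640x640; for the same fully convolutional
# network, the convolution FLOPs per image grow by roughly that factor.
print(pixel_ratio(1280, 640))  # 4.0
```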

zhengzhigang1979 commented 2 years ago

Thank you! But I tested yolov5s6 at 1280 input resolution and the inference time is 0.027 s, so yolov5s6 at 1280 is still faster than yolov7 at 640.

NicholasZollo commented 2 years ago

Have you tried the test.py speed test described in this post? https://github.com/WongKinYiu/yolov7/issues/238#issuecomment-1191533800

Do you get results similar to your detect.py speed test, where yolov7 is slower than yolov5s?

WongKinYiu commented 2 years ago

I am not sure about inference on GPUs without tensor cores. I have only tested speed on V100 and 2080 Ti.

AlexeyAB commented 2 years ago

When tested in an identical environment on an NVIDIA T4 GPU:

YOLOv7 at 640x640 (51.2% AP, 12.7 ms) is 1.5x faster and 6.3% AP more accurate than YOLOv5s6 at 1280x1280 (44.9% AP, 18.7 ms)

https://colab.research.google.com/gist/AlexeyAB/56912451a33981d977ff9ea61025ae40/yolov7trtlinaom.ipynb#scrollTo=-tMYe8f27US9

!python test.py --data data/coco.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
...
Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1
...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.512
!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5s6.pt --name yolov5s6_1280_val
...
Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
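To compare the two logs programmatically, the sketch below extracts the per-image inference and NMS times from the two `Speed:` line formats quoted above. `parse_speed` is a hypothetical helper, and its regular expressions are written against these exact sample lines only; other versions of test.py or val.py may print differently. Note also that val.py lists pre-processing separately while test.py does not, so the two "inference" figures are not necessarily measured over identical code paths.

```python
import re
from typing import Dict

# YOLOv7 test.py summary, e.g.
# "Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1"
_V7 = re.compile(r"Speed:\s*([\d.]+)/([\d.]+)/([\d.]+) ms inference/NMS/total")
# YOLOv5 val.py summary, e.g.
# "Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image at shape (...)"
_V5_INF = re.compile(r"([\d.]+)ms inference")
_V5_NMS = re.compile(r"([\d.]+)ms NMS")


def parse_speed(line: str) -> Dict[str, float]:
    """Return per-image inference and NMS times (ms) from either summary format."""
    m = _V7.search(line)
    if m:
        return {"inference": float(m.group(1)), "nms": float(m.group(2))}
    inf, nms = _V5_INF.search(line), _V5_NMS.search(line)
    if inf and nms:
        return {"inference": float(inf.group(1)), "nms": float(nms.group(1))}
    raise ValueError(f"unrecognized speed line: {line!r}")


v7 = parse_speed("Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1")
v5 = parse_speed("Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)")
print(round(v5["inference"] / v7["inference"], 2))  # 1.48
```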

YOLOv7 at 640x640 (51.2% AP, 12.6 ms) has almost the same accuracy as YOLOv5m6 at 1280x1280 (51.3% AP, 49.1 ms) but is about 4x faster

https://colab.research.google.com/gist/AlexeyAB/857c4859a7a27abca8775245884d1ecf/yolov7trtlinaom.ipynb

!python test.py --data data/coco.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
...
Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.512
!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5m6.pt --name yolov5m6_1280_val
...
Speed: 0.6ms pre-process, 49.1ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.513

Moreover, YOLOv7-w6 at 1280x1280 (54.6% AP, 29 ms) has comparable accuracy to YOLOv5x6 at 1280x1280 (55.0% AP, 190 ms) but is 6.6x faster

https://colab.research.google.com/gist/AlexeyAB/6f08816fa611def881327de0f5711ae5/yolov7trtlinaom.ipynb

!python test.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7-w6.pt --name yolov7_w6_1280_val
...
Speed: 28.5/1.4/29.9 ms inference/NMS/total per 1280x1280 image at batch-size 1
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.546
!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5x6.pt --name yolov5x6_1280_val
...
Speed: 0.6ms pre-process, 190.1ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550

YOLOv7-e6 at 1280x1280 (55.9% AP, 47 ms) is 0.9% AP more accurate and 4x faster than YOLOv5x6 at 1280x1280 (55.0% AP, 188 ms)

https://colab.research.google.com/gist/AlexeyAB/4065112d0d1a252eb433fef38c061f66/yolov7trtlinaom.ipynb#scrollTo=JE3o2PqQy2n4

!python test.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7-e6.pt --name yolov7_e6_1280_val
...
Speed: 46.6/1.5/48.1 ms inference/NMS/total per 1280x1280 image at batch-size 1
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.559
!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5x6.pt --name yolov5x6_1280_val
...
Speed: 0.7ms pre-process, 188.1ms inference, 1.8ms NMS per image at shape (1, 3, 1280, 1280)
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550
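The headline ratios in this comment follow directly from the reported latencies and AP values. The short script below recomputes them from the numbers quoted in the thread (no new measurements); `COMPARISONS` and `summarize` are illustrative names only. Keep in mind that the first two pairs compare YOLOv7 at 640 with YOLOv5 P6 models at 1280.

```python
from typing import List, Tuple

Entry = Tuple[str, float, float]  # (model and input size, AP in %, latency in ms)

# Values as reported in the comparisons above (T4 GPU, batch size 1).
COMPARISONS: List[Tuple[Entry, Entry]] = [
    (("YOLOv7 640", 51.2, 12.7), ("YOLOv5s6 1280", 44.9, 18.7)),
    (("YOLOv7 640", 51.2, 12.6), ("YOLOv5m6 1280", 51.3, 49.1)),
    (("YOLOv7-w6 1280", 54.6, 29.0), ("YOLOv5x6 1280", 55.0, 190.0)),
    (("YOLOv7-e6 1280", 55.9, 47.0), ("YOLOv5x6 1280", 55.0, 188.0)),
]


def summarize(a: Entry, b: Entry) -> Tuple[float, float]:
    """Return (speedup of a over b, AP of a minus AP of b in points)."""
    (_, ap_a, ms_a), (_, ap_b, ms_b) = a, b
    return ms_b / ms_a, ap_a - ap_b


for a, b in COMPARISONS:
    speedup, d_ap = summarize(a, b)
    print(f"{a[0]} vs {b[0]}: {speedup:.1f}x faster, {d_ap:+.1f} AP")
```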