zhengzhigang1979 commented 2 years ago


I'm also puzzled. The attached image compares the inference speed of yolov7 with yolov5s6, and of yolov7-tiny with yolov5n6. Inference takes 0.152s for yolov7 versus 0.011s for yolov5s6, and 0.039s for yolov7-tiny versus 0.007s for yolov5n6.

Please help me understand the reason for this result. Thanks.

WongKinYiu commented 2 years ago

YOLOv5 runs a warmup inference first and then starts inference on the images. YOLOv7 starts inference on the images directly.
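To illustrate why warmup affects a measurement like this, here is a minimal timing sketch. It is not code from either repository: `benchmark`, `n_warmup`, `n_runs` and `sync` are hypothetical names chosen for this example. On a GPU, one would typically pass `torch.cuda.synchronize` as `sync`, because CUDA kernels run asynchronously and the timer may otherwise stop before the work has finished.

```python
import time


def benchmark(fn, n_warmup=1, n_runs=10, sync=None):
    """Return the mean wall-clock seconds per call of fn, excluding warmup calls.

    fn   : zero-argument callable, e.g. lambda: model(img)  (hypothetical usage)
    sync : optional zero-argument callable that blocks until pending device
           work is done, e.g. torch.cuda.synchronize on a CUDA device
    """
    # Warmup calls absorb one-time costs (lazy initialization, kernel selection)
    # so that they do not inflate the measured average.
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / n_runs
```

Timing a single call without warmup and without synchronization can over- or under-report the real per-image cost, so both models should be measured with the same procedure.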

zhengzhigang1979 commented 2 years ago

Thank you for your prompt reply, but I found that yolov7 also runs one warmup inference before image inference.

WongKinYiu commented 2 years ago

Oh, you are using detect.py. How do you calculate the inference time and NMS time? As far as I know, detect.py does not show that information, so I assumed you were running our demo file. By the way, for a GPU that has no tensor cores, you have to set half to False. https://github.com/WongKinYiu/yolov7/blob/main/detect.py#L31
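As a rough sketch of how the half setting could be chosen automatically, the helper below enables FP16 only for CUDA devices whose compute capability is at least 7.0. This is an assumption for illustration, not the logic in detect.py: `should_use_half` is a hypothetical helper, and the 7.0 threshold is only a heuristic, since some cards (for example the GTX 16 series) report capability 7.5 but have no tensor cores. With PyTorch, the capability tuple could be obtained from `torch.cuda.get_device_capability(0)`.

```python
def should_use_half(device_type, capability):
    """Heuristic: use FP16 only on CUDA devices with compute capability >= 7.0.

    device_type : "cuda" or "cpu"
    capability  : (major, minor) tuple, or None if unknown
    """
    if device_type != "cuda" or capability is None:
        return False
    major, _minor = capability
    # Assumption: tensor cores were introduced with compute capability 7.0.
    # Some 7.5 cards still lack them, so verify by timing both settings.
    return major >= 7
```

Because the heuristic can be wrong for specific cards, measuring inference time with half set to both True and False remains the reliable check.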

zhengzhigang1979 commented 2 years ago

Thank you sincerely. I added code to measure the inference and NMS time. With half set to False, inference takes 0.037s; with half set to True, it takes 0.152s. That is still slower than yolov5s6, whose inference time is 0.011s.

WongKinYiu commented 2 years ago

Yes, it is because the reported inference time of yolov5s6 is for 1280 input resolution, and your testing resolution is 640.

zhengzhigang1979 commented 2 years ago

Thank you!!! But I tested yolov5s6 with 1280 input resolution and the inference time is 0.027s. So yolov5s6 at 1280 input resolution is still faster than yolov7 at 640 input resolution.

NicholasZollo commented 2 years ago

If you run the test.py speed test as in this post: https://github.com/WongKinYiu/yolov7/issues/238#issuecomment-1191533800

Do you get similar results to your detect.py speed test, where yolov7 is slower than yolov5s?

WongKinYiu commented 2 years ago

I am not sure about inference on GPUs without tensor cores. I have only tested speed on V100 and 2080ti.

AlexeyAB commented 2 years ago

When tested in an identical environment on an NVIDIA T4 GPU:

YOLOv7 (51.2% AP, 12.6ms) is 1.5x faster and +6.3% AP more accurate than YOLOv5s6 (44.9% AP, 18.7ms)

https://colab.research.google.com/gist/AlexeyAB/56912451a33981d977ff9ea61025ae40/yolov7trtlinaom.ipynb#scrollTo=-tMYe8f27US9

!python test.py --data data/coco.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
...
Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1
...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.512

!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5s6.pt --name yolov5s6_1280_val
...
Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
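The speedup figure can be reproduced from the two "Speed:" lines above. The following is a small sketch; `inference_ms` is a hypothetical helper written for this example, and its regular expressions assume only the two log formats shown in this thread.

```python
import re


def inference_ms(speed_line):
    """Extract the per-image inference time in milliseconds from a Speed line."""
    # Format printed by test.py in this thread:
    # "Speed: 12.6/0.9/13.5 ms inference/NMS/total per ..."
    m = re.search(r"Speed:\s*([\d.]+)/[\d.]+/[\d.]+\s*ms inference/NMS/total", speed_line)
    if m:
        return float(m.group(1))
    # Format printed by val.py in this thread:
    # "Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image ..."
    m = re.search(r"([\d.]+)\s*ms inference", speed_line)
    if m:
        return float(m.group(1))
    raise ValueError("unrecognized Speed line: " + speed_line)


yolov7_line = "Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1"
yolov5s6_line = "Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)"

# Ratio of inference times: how many times faster YOLOv7 is than YOLOv5s6 here.
speedup = inference_ms(yolov5s6_line) / inference_ms(yolov7_line)
print(f"speedup: {speedup:.2f}x")
```

The ratio is 18.7 / 12.6, roughly 1.48, which rounds to the 1.5x quoted above. Note that it compares inference time only, not pre-processing or NMS.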

YOLOv7 (51.2% AP, 12.6ms) has almost the same accuracy but is about 4x faster than YOLOv5m6 (51.3% AP, 49.1ms)

https://colab.research.google.com/gist/AlexeyAB/857c4859a7a27abca8775245884d1ecf/yolov7trtlinaom.ipynb

!python test.py --data data/coco.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
...
Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.512

!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5m6.pt --name yolov5m6_1280_val
...
Speed: 0.6ms pre-process, 49.1ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.513

Moreover, YOLOv7-w6 1280x1280 (54.6% AP, 29ms) has comparable accuracy but is 6.6x faster than YOLOv5x6 1280x1280 (55.0% AP, 190ms)

https://colab.research.google.com/gist/AlexeyAB/6f08816fa611def881327de0f5711ae5/yolov7trtlinaom.ipynb

!python test.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7-w6.pt --name yolov7_w6_1280_val
...
Speed: 28.5/1.4/29.9 ms inference/NMS/total per 1280x1280 image at batch-size 1
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.546

!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5x6.pt --name yolov5x6_1280_val
...
Speed: 0.6ms pre-process, 190.1ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550

YOLOv7-e6 1280x1280 (55.9% AP, 47ms) is +0.9% AP more accurate and 4x faster than YOLOv5x6 1280x1280 (55.0% AP, 188ms)

https://colab.research.google.com/gist/AlexeyAB/4065112d0d1a252eb433fef38c061f66/yolov7trtlinaom.ipynb#scrollTo=JE3o2PqQy2n4

!python test.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7-e6.pt --name yolov7_e6_1280_val
...
Speed: 46.6/1.5/48.1 ms inference/NMS/total per 1280x1280 image at batch-size 1
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.559

!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5x6.pt --name yolov5x6_1280_val
...
Speed: 0.7ms pre-process, 188.1ms inference, 1.8ms NMS per image at shape (1, 3, 1280, 1280)
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550

WongKinYiu / yolov7

I'm puzzled the inference speed between yolov7 and yolov5s6 #298
