NicholasZollo opened this issue 2 years ago
I guess you are running batch-32 inference. For batch-32 inference, the paper reports an average inference time of 2.8 ms for YOLOv7 and 1.7 ms for YOLOv5m.
I tried running batch-size-1 inference; it increased the inference time for both models, but YOLOv7 still did not run faster than YOLOv5m. Is my method of speed testing correct, i.e. running test.py (YOLOv7) and val.py (YOLOv5) with the --task speed flag?
What inference times do you get on yolov7-tiny, yolov7, yolov5n, yolov5s, yolov5m, and yolov5l?
I am experiencing the same thing. I used the settings below:
python test.py --data data/test_yolo.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device cpu --weights yolov7.pt --name yolov7_640_val
and the result:
Speed: 439.6/1.3/440.9 ms inference/NMS/total per 640x640 image at batch-size 1
While for YOLOv5, running the command below on the same images gives the following results:
python \yolov5\detect.py --source inference/images --device cpu
detect: weights=..\FFD\FFD_pipeline\yolov5\yolov5s.pt, source=inference/images, data=..\FFD\FFD_pipeline\yolov5\data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=cpu, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=..\FFD\FFD_pipeline\yolov5\runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 2022-7-5 Python-3.8.13 torch-1.11.0+cpu CPU
Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
image 1/2 D:\Code\yolov7\inference\images\horses.jpg: 448x640 5 horses, Done. (0.107s)
image 2/2 D:\Code\yolov7\inference\images\horses1.jpg: 448x640 5 horses, Done. (0.089s)
Speed: 0.0ms pre-process, 98.0ms inference, 1.5ms NMS per image at shape (1, 3, 640, 640)
Results saved to exp9
Any idea?
CPU inference time is usually proportional to FLOPs.
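As a rough sanity check of that proportionality, one can compare the models' FLOPs against the CPU timings reported above. The GFLOPs figures below are the commonly cited 640x640 values and may differ slightly between releases, so treat them as approximate:

```python
# Rough check: does the CPU timing gap track the FLOPs gap?
# GFLOPs are the commonly cited 640x640 figures (approximate).
yolov5s_gflops = 16.5    # YOLOv5s @ 640
yolov7_gflops = 104.7    # YOLOv7 @ 640

yolov5s_cpu_ms = 98.0    # measured above
yolov7_cpu_ms = 439.6    # measured above

flops_ratio = yolov7_gflops / yolov5s_gflops   # ~6.3x more compute
time_ratio = yolov7_cpu_ms / yolov5s_cpu_ms    # ~4.5x slower observed

print(f"FLOPs ratio: {flops_ratio:.1f}x, observed CPU time ratio: {time_ratio:.1f}x")
```

The two ratios are in the same ballpark, which is consistent with CPU inference being compute-bound rather than benefiting from any architecture-specific speedups.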
Using the pretrained weights (except yolov7-tiny) on the COCO 2017 dataset with a Tesla T4 GPU:
yolov5: python val.py --data data/coco.yaml --weights [model] --batch-size 1 --imgsz 640 --task speed --device 0
yolov5n: mAP@.5 0.535, mAP@.5:.95 0.359; 0.2ms pre-process, 4.5ms inference, 0.7ms NMS
yolov5s: mAP@.5 0.616, mAP@.5:.95 0.439; 0.2ms pre-process, 4.7ms inference, 0.7ms NMS
yolov5m: mAP@.5 0.672, mAP@.5:.95 0.509; 0.2ms pre-process, 6.8ms inference, 0.7ms NMS
yolov5l: mAP@.5 0.701, mAP@.5:.95 0.546; 0.2ms pre-process, 10.6ms inference, 0.7ms NMS
yolov7: python test.py --data data/coco.yaml --weights [model] --batch-size 1 --img-size 640 --task speed --device 0
yolov7-tiny: mAP@.5 0.349, mAP@.5:.95 0.236; 5.0/0.7/5.6 ms inference/NMS/total (trained for only 36 epochs)
yolov7: mAP@.5 0.616, mAP@.5:.95 0.46; 11.9/0.7/12.6 ms inference/NMS/total
I did notice that the displayed mAP values are not consistent with the pycocotools-evaluated mAP (which is the one that matches the claimed values in the paper), so that part may not be important. However, the speed is coming out worse than claimed. There is some variation in the inference times, but it is minor.
I cannot reproduce your results, but we have tested YOLOv7-tiny on both PyTorch and Darknet, and they showed consistent results.
Maybe you could run the experiment on Darknet to check whether your PyTorch performance for YOLOv7 is normal:
darknet.exe detector demo cfg/coco.data cfg/yolov7-tiny.cfg yolov7-tiny.weights test.mp4 -benchmark
Also, your posted results are really strange: the T4 GPU is slower than the V100, yet your T4 inference time is about 30% faster than the official u5 V100 inference time. Your T4 performance is also more than twice as fast as the official u5 reported benchmark.
Other people have also helped us benchmark on TensorRT; YOLOv7-tiny runs about twice as fast as YOLOv5s.
I have used my laptop (GPU: GTX 1650) to run yolov7 and yolov5-l. At first, yolov7 (150ms/image) seemed slower than yolov5-l (70ms/image).
But then I found this issue. When I set half=False, yolov7 becomes faster (60~70ms/image), which is close to yolov5-l.
In my opinion, some NVIDIA GPUs do not support half-precision inference well, and using half inference on them can be harmful. On such devices you need to set half=False to get faster inference.
Besides, in terms of parameters and model size, yolov7 is smaller than yolov5-l, so yolov7 is more efficient.
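To put a number on the effect described in this comment, here is a small sketch using the approximate GTX 1650 latencies reported above (the 65 ms figure is the midpoint of the stated 60~70 ms range):

```python
# Approximate per-image latencies from this comment (GTX 1650, ms/image).
yolov7_half_ms = 150.0    # with half-precision inference (test.py default)
yolov7_float_ms = 65.0    # with half=False, midpoint of the reported 60~70 ms
yolov5l_ms = 70.0         # yolov5-l, for comparison

# On this GPU, forcing FP32 makes yolov7 roughly 2.3x faster than its own
# FP16 run, the opposite of what FP16 gives on GPUs with fast tensor cores.
fp16_penalty = yolov7_half_ms / yolov7_float_ms
print(f"FP16 penalty on this GPU: ~{fp16_penalty:.1f}x")
```

In other words, on hardware without good FP16 support the default half-precision path can more than double the latency, which alone explains the ordering flip seen here.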
I am also confused. The attached image compares inference speed between yolov7 and yolov5s6, and between yolov7-tiny and yolov5n6. The inference speed of yolov7 is 0.152s while yolov5s6 is 0.011s; yolov7-tiny is 0.039s while yolov5n6 is 0.007s.
Please help me explain these results. Thanks.
I am also confused by this. I do not believe yolov7 is faster than yolov5.
When tested in an identical environment on an NVIDIA T4 GPU:
YOLOv7 (51.2% AP, 12.7ms) is 1.5x faster and +6.3% AP more accurate than YOLOv5s6 (44.9% AP, 18.7ms)
!python test.py --data data/coco.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
...
Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.512
!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5s6.pt --name yolov5s6_1280_val
...
Speed: 0.7ms pre-process, 18.7ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.449
YOLOv7 (51.2% AP, 12.6ms) has almost the same accuracy but is 4x faster than YOLOv5m6 (51.3% AP, 49.1ms)
!python test.py --data data/coco.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
...
Speed: 12.6/0.9/13.5 ms inference/NMS/total per 640x640 image at batch-size 1
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.512
!python val.py --data data/coco.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5m6.pt --name yolov5m6_1280_val
...
Speed: 0.6ms pre-process, 49.1ms inference, 1.7ms NMS per image at shape (1, 3, 1280, 1280)
...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.513
Moreover, YOLOv7-w6 1280x1280 (54.6% AP, 29ms) has comparable accuracy but is 6.6x faster than YOLOv5x6 1280x1280 (55.0% AP, 192ms)
I tested YOLOv7 on an NVIDIA GeForce GTX 1080 Ti and an NVIDIA GeForce RTX 3070. On the 3070, YOLOv7 inference is approximately 50% slower than on the 1080 Ti. Consider this in speed tests.
We ran the inference in OpenCV using the ONNX converted models for a single image of size 640x640. All YOLOv7 versions seem to be slower than YOLOv4 and YOLOv5l. Any idea why this is the case?
It is strange that you get 56 FPS (18ms) for yolov7.pt
on a Titan RTX (130 TFLOPs-TC), while a T4 GPU (65 TFLOPs-TC) reaches a higher 79 FPS (12.6ms), even though the Titan RTX is twice as powerful a GPU: https://colab.research.google.com/gist/AlexeyAB/857c4859a7a27abca8775245884d1ecf/yolov7trtlinaom.ipynb
YOLOv7 (51.2% AP, 12.6ms) has almost the same accuracy but is 4x faster than YOLOv5m6 (51.3% AP, 49.1ms)
There seems to be something wrong with the ONNX converter or the ONNX inference code.
Have you integrated NMS into the YOLOv7 ONNX model as shown in our readme file, and did you evaluate YOLOv5 without NMS?
What batch size, float precision, tensor cores, export code, inference code, number of test images, warmup, NMS, etc. did you use?
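The warmup point in particular matters for every benchmark in this thread. A minimal timing harness along those lines might look like the sketch below, where `run_inference` is a placeholder for whatever forward pass is being timed:

```python
import time

def benchmark(run_inference, n_warmup=10, n_iters=100):
    """Time a callable with warmup, returning mean latency in ms.

    On GPU you would also need to synchronize the device before reading
    the clock (e.g. torch.cuda.synchronize() in PyTorch); otherwise you
    measure kernel-launch time, not inference time.
    """
    for _ in range(n_warmup):          # warmup: JIT, cuDNN autotune, caches
        run_inference()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_inference()
    elapsed = time.perf_counter() - start
    return elapsed / n_iters * 1000.0  # ms per inference

# Placeholder workload standing in for a model forward pass.
mean_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"mean latency: {mean_ms:.2f} ms")
```

Timing a single cold inference, as some reports above appear to do, includes one-time initialization cost and can easily differ by an order of magnitude from the steady-state latency.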
@AlexeyAB, regarding https://github.com/WongKinYiu/yolov7/issues/400#issue-1325396557: we tried OpenCV inference but got the error mentioned in that issue. Also, when inferencing with ONNX Runtime, we got low FPS.
batch size = 1,
float precision = 16,
tensor cores = 576,
export code = https://github.com/WongKinYiu/yolov7/blob/main/export.py,
inference code = the OpenCV function readNetFromONNX(); we measure the elapsed time for a single inference, repeat that for a set of ~500 images, and take the average.
Because of the error mentioned in that issue, we omitted the --grid flag from the export command given in the readme.
I compared the speed and mAP of yolov7 and yolov5s6 on coco128 using an RTX 2060 (the same situation as the T4: both have tensor cores).
for yolov7:
python test.py --data data/coco128.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt
for yolov5s6:
python val.py --data data/coco128.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5s6.pt
The conclusion is that the mAP of yolov7 is better: yolov7 at 640 input can exceed the mAP of yolov5s6 at 1280 input. That is why the paper only compares 640 inference time against 1280 inference time and does not compare at the same resolution. This may be one reason the inference speed of yolov7 came out slower than yolov5 in the comparisons above, since those used the same resolution.
More importantly, yolov7 uses half-precision inference by default, while yolov5 does not. So in the experimental results above, yolov7 seems faster than yolov5s6, but that is just an illusion created by the precision difference.
for yolov5s6 half:
python val.py --data data/coco128.yaml --img 1280 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov5s6.pt --half
So yolov5 still has excellent speed performance at 1280 input, but it is undeniable that the mAP of yolov7 at 640 is also excellent.
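The resolution caveat above can be made concrete with a quick sketch: convolution cost grows roughly with the number of input pixels, so a 640-vs-1280 comparison is not an equal-compute comparison.

```python
# Convolution FLOPs scale roughly linearly with input pixel count, so a
# network evaluated at 1280x1280 does about 4x the conv work it would at 640x640.
pixels_640 = 640 * 640
pixels_1280 = 1280 * 1280
compute_scale = pixels_1280 / pixels_640
print(f"1280 input implies ~{compute_scale:.0f}x the compute of 640 input")  # ~4x
```

This is why the paper's 640-vs-1280 pairing can show yolov7 winning on speed while same-resolution tests show the opposite ordering.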
As a supplement, here is yolov7's inference speed under FP32.
Modify the half_precision parameter of the test function in test.py to False and run:
python test.py --data data/coco128.yaml --img 640 --batch 1 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt
Regarding the earlier report that YOLOv7 runs about 50% slower on an RTX 3070 than on a GTX 1080 Ti: did you ever figure out a fix?
I tried to compare the inference speed of yolov7 and yolov5m trained on a custom dataset, running on a Tesla T4 16GB GPU. The paper claims that yolov7 should be significantly faster here; however, in my testing the inference time of yolov7 was twice that of yolov5m. The inference time I'm getting seems to be proportional only to the FLOPs of the model. For the test I used the --task speed flag on test.py for yolov7 and val.py for yolov5. I made sure they were running on the GPU, not the CPU, but this was still the case.