Closed by danielbarry 3 years ago
Read more about YOLOv5:
Comparison YOLOv3 vs YOLOv4 vs YOLOv5: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640
CSPDarknet53s-YOSPP gets 19.5% faster model inference speed and 1.3% higher AP than YOLOv5l.
YOLOv4 achieves 133 - 384 FPS with batch=4 using OpenCV and at least 2x more with batch=32:
Data from:
@josephofiowa I've updated my comment to reflect you're not the author - sorry. I am just trying to get to the bottom of these dubious claims.
I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.
It's the last project by pjreddie, but not the last word on YOLO or Darknet.
Tables 8-10: https://arxiv.org/pdf/2004.10934.pdf
(Real-time detectors with FPS 30 or higher are highlighted here. We compare the results with batch=1 without using tensorRT.)
Therefore, we only show results with batch = 1 and without using TensorRT on comparison graphs.
@glenn-jocher did a lot for the development and improvements of Yolo and showed a lot of ideas, he created at least 2 very good repositories on Pytorch. Thus, he gave Yolo a long life outside of Darknet. All this hype around the Yolov5 was not raised by him.
Some notes on comparison: https://github.com/ultralytics/yolov5
Latency shouldn't be measured with batch=32; it must be measured with batch=1, because the larger the batch, the higher the latency. Latency is the time for a complete data-processing cycle: it cannot be less than the time to process the whole batch, which can take up to a second depending on batch size.
If batch=32 was used for both YOLOv5 and EfficientDet (I don't know), then the comparison is fine, but only between YOLOv5 and EfficientDet, and only for FPS (not latency); it cannot be compared with any other results measured at batch=1.
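The arithmetic behind this point can be sketched in a few lines (the 224 ms batch time below is a made-up illustrative number, not a measurement from either repo):

```python
def per_image_time_ms(batch_time_ms: float, batch_size: int) -> float:
    """Throughput-style figure: total batch time divided by batch size."""
    return batch_time_ms / batch_size

def latency_ms(batch_time_ms: float) -> float:
    """A single request's latency can never be less than the time
    to process the whole batch it is part of."""
    return batch_time_ms

# Hypothetical numbers: a batch of 32 images takes 224 ms end to end.
batch_time, batch = 224.0, 32
print(per_image_time_ms(batch_time, batch))  # 7.0 ms/img (looks fast)
print(latency_ms(batch_time))                # 224.0 ms (what one request actually waits)
```

Dividing the batch time by 32 produces an attractive per-image number, but that is a throughput figure, not a latency.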
Size of weights: yolov5x.pt - 366 MB, yolov5s.pt - 27 MB
Invalid comparison results in the roboflow.ai blog: https://blog.roboflow.ai/yolov5-is-here/
Actually, if both networks YOLOv4s and ultralytics-YOLOv5l are trained and tested in the same framework with the same batch size on a common dataset (Microsoft COCO): https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640
weights size: YOLOv4s 245 MB vs YOLOv5l 192 MB vs YOLOv5x 366 MB
test-dev accuracy on MS COCO: YOLOv4s-608 45% AP vs YOLOv5l-736 44.2% AP (YOLOv4 is more accurate)
speed with batch=16: YOLOv4s-608 10.3 ms vs YOLOv5l-736 13.5 ms (YOLOv4 is faster)
roboflow.ai shared a latency-accuracy chart for ultralytics-YOLOv5 where times were measured with batch=32 and then divided by 32. Latency must be measured with batch=1, because the larger the batch, the higher the latency: the latency of one sample cannot be less than the latency of the whole batch. So the real latency of YOLOv5 can be up to ~1 second at high batch sizes of 32-64.
They stated 140 FPS for YOLOv5 (s/m/l/x? at what batch size?), while YOLOv4 achieves ~400 FPS with just batch=4 using OpenCV-dnn or TensorRT on an RTX 2080 Ti GPU (table above).
Second, YOLOv5 is fast – blazingly fast. In a YOLOv5 Colab notebook, running a Tesla P100, we saw inference times up to 0.007 seconds per image, meaning 140 frames per second (FPS)! By contrast, YOLOv4 achieved 50 FPS after having been converted to the same Ultralytics PyTorch library.
With batch=16 on the same framework https://github.com/ultralytics/yolov5 on a common dataset (Microsoft COCO), while YOLOv5x is much slower than YOLOv4 and YOLOv5s is much less accurate than YOLOv4: CSPDarknet53s-PASPP-Mish (~YOLOv4) at 608x608 is +0.8% AP more accurate and 1.3x (+30%) faster than YOLOv5l at 736x736.
Full, true comparison: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640
CSPDarknet53s-PASPP-Mish: (~YOLOv4)
cd53s-paspp-mish 45.0% AP @ 608x608 Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16
YOLOv5l:
yolov5l 44.2% AP @ 736x736 Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16
They compared a small YOLOv5 with low accuracy of 26-36% AP on Microsoft COCO against the big YOLOv4 (245 MB) with very high accuracy of 41-43% AP on Microsoft COCO.
Fourth, YOLOv5 is small. Specifically, a weights file for YOLOv5 is 27 megabytes. Our weights file for YOLOv4 (with Darknet architecture) is 244 megabytes. YOLOv5 is nearly 90 percent smaller than YOLOv4. This means YOLOv5 can be deployed to embedded devices much more easily.
Third, YOLOv5 is accurate. In our tests on the blood cell count and detection (BCCD) dataset, we achieved roughly 0.895 mean average precision (mAP) after training for just 100 epochs. Admittedly, we saw comparable performance from EfficientDet and YOLOv4, but it is rare to see such across-the-board performance improvements without any loss in accuracy.
@AlexeyAB Thank you for breaking that down. I think my suspicion of the comparisons was warranted.
I just noticed that their iOS app page calls their network YOLOv4: https://apps.apple.com/app/id1452689527
YOLOv4 is an updated version of YOLOv3-SPP, trained on the COCO dataset in PyTorch and transferred to an Apple CoreML model via ONNX.
Someone said that they were apparently very surprised when you released YOLOv4 as they were planning to also release YOLOv4. I think this really puts emphasis on the need for people to communicate their intentions.
Yeah. I see it's from Ultralytics LLC, who has now become the creator of YOLOv5. I agree with your opinion. IMO Ultralytics intended to succeed YOLO by implementing a PyTorch version with several contributions. Anyway, it is encouraging news for the PyTorch community, even if it doesn't have a significant advantage over YOLOv4 by @AlexeyAB.
I think there is a strong case for either project to adjust their name to reflect the works are not built upon one another and are not a fair comparison.
As YOLO started in the Darknet framework, this repository was somewhat endorsed by pjreddie, @AlexeyAB was first to the punch with YOLOv4, Ultralytics already had their own "flavour" of YOLOv3 for TF - it would make sense to rename YOLOv5. Even something small like "uYOLOv5", or "YOuLOv5" could be significant in distinguishing the works.
Otherwise who publishes YOLOv6, and is YOLOv6 the improvement from YOLOv4 or YOLOv5? I think this is incredibly confusing and serves nobody.
It's Joseph, author of that Roboflow blog post announcing Glenn Jocher's YOLOv5 implementation.
Our goal is to make models more accessible for anyone to use on their own datasets. Our evaluation on a sample task (BCCD) is meant to highlight tradeoffs and expose differences if one were to clone each repo and use them with little customization. Our post is not intended to be a replacement nor representative of a formal benchmark on COCO.
Sincere thanks to the community on your feedback and continued evaluation. We have published a comprehensive updated post on Glenn Jocher's decision to name the model YOLOv5 as well as exactly how to reproduce the results we reported.
@AlexeyAB called out very important notes above that we included in this followup post and updated in the original post. Cloning the YOLOv5 repository defaults to YOLOv5s, and the Darknet implementation defaults to "big YOLOv4." In our sample task, both these models appear to max out at 0.91 mAP. YOLOv5s is 27 MB; big YOLOv5l is 192 MB; big YOLOv4 is 245 MB. For inference speed, Glenn's YOLOv5 implementation defaults to batch inference and divides the batch time by the number of images in the batch, resulting in the reported 140 FPS figure. YOLOv4 defaults to a batch size of 1. This is an unfair comparison. In the detailed update, we set both batch sizes to 1, where we see YOLOv4 achieves 30 FPS and YOLOv5 achieves 10 FPS.
Ultimately, we encourage trying each on one's own problem, and consider the tradeoffs based on your domain considerations (like ease of setup, complexity of task, model size, inference speed reqs). We published guides in the post to make that deliberately easy. And we will continue to listen on where the community lands on what exact name is best for Glenn Jocher's YOLOv5 implementation.
YOLOv3-spp vs YOLOv4(leaky) vs YOLOv5 - with the same batch=32, each point - another test-network-resolution: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/35#issuecomment-643257711
@josephofiowa Thank you for your blog post Responding to the Controversy about YOLOv5: YOLOv4 Versus YOLOv5. But I am a little confused about what you wrote. First, in the last sentence of the section "Comparing YOLOv4 and YOLOv5s Model Storage Size" you wrote:
The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.
Then what about YOLOv5x?
Second, in the fourth sentence of the section "Comparing YOLOV4 and YOLOv5s Inference Time" you wrote like this:
On single images (batch size of 1), YOLOv4 inferences in 33 ms (30 FPS) and YOLOv5s inferences in 20ms (10 FPS).
It should be 100 ms, or 50 FPS, for YOLOv5s, I might say.
Thank you for your post.
@rcg12387 Why do you think they should know arithmetic? )
@rcg12387 Thanks for the model sizes question. We've updated the post to show all sizes:
Updated to include the model size of all YOLOv5 models. v5x: 367 MB, v5l: 192 MB, v5m: 84 MB, v5s: 27 MB. YOLOv5s is the model compared in this article. YOLOv4-custom refers to the model we have been testing throughout this post.
Thanks for your callout of the arithmetic error. It's corrected as is the accompanying graph:
On single images (batch size of 1), YOLOv4 inferences in 33 ms (30 FPS) and YOLOv5s inferences in 20ms (50 FPS). (Update June 14 12:46 PM CDT - In response to rcg12387's GitHub comment, we have corrected an error where we previously calculated YOLOv5 inference to be 10 FPS. We regret this error.)
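The corrected conversion is just the reciprocal; a quick sanity check using the figures quoted above:

```python
def fps_from_ms(ms_per_image: float) -> float:
    """Frames per second from a per-image time in milliseconds."""
    return 1000.0 / ms_per_image

print(fps_from_ms(20.0))             # 50.0 FPS, not 10
print(round(fps_from_ms(33.0), 1))   # 30.3 FPS
```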
Note: Glenn Jocher provided inference time updates and pushed an update to his repo so that times are reported as end-to-end latencies. We have included his comments in the post and pasted them below:
The times ... are not for batched inference, they are for batch-size = 1 inference. This is the reason they are printed to the screen one at a time, because they are run in a for loop, with each image passed to the model by itself (tensor size 1x3x416x416). I know this because like many other things, we simply have not had time to modify detect.py properly for batched inference of images from a folder. One disclaimer is that the above times are for inference only, not NMS. NMS will typically add 1-2ms per image to the times. So I would say 8-9ms is the proper batch-size 1 end-to-end latency in your experiment, while 7 ms is the proper batch-size 1 inference-only latency. In response to this I've pushed a commit to improve detect.py time reporting. Times are now reported as full end-to-end latencies: FP32 pytorch inference + postprocessing + NMS. I tested out the new times on a 416x416 test image, and I see 8 ms now at batch-size 1 for full end-to-end latency of YOLOv5s.
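Glenn's point about end-to-end reporting can be sketched as a tiny timing harness. The model and NMS functions below are hypothetical stand-ins, not the real YOLOv5 code:

```python
import time

def end_to_end_latency_ms(model_fn, nms_fn, image):
    """Time one batch-size-1 pass: inference plus post-processing/NMS,
    measured together, which is what a user actually experiences."""
    t0 = time.perf_counter()
    detections = nms_fn(model_fn(image))
    return (time.perf_counter() - t0) * 1000.0, detections

# Hypothetical stand-ins for a real model and NMS step:
fake_model = lambda img: [img]
fake_nms = lambda preds: preds
ms, out = end_to_end_latency_ms(fake_model, fake_nms, "image")
print(f"end-to-end: {ms:.3f} ms")
```

Timing inference alone and reporting that as latency would miss the 1-2 ms per image that the NMS step contributes.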
@AlexeyAB We have included your COCO benchmark performance in the post as well. Thank you for providing this.
@josephofiowa But you still don't know the difference between inference time and FPS )
The latest comparison: https://github.com/ultralytics/yolov5/issues/6#issuecomment-643823425
@josephofiowa Thank you for your reply. I have read your updated post. However, the sentence still remains in the updated post:
The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.
In order to avoid any confusion you should correct this sentence like this: The largest YOLOv5 is YOLOv5x, and its weights are 367 MB. Thanks.
Yes, done. Thanks.
@AlexeyAB Thanks. Following performance updates on ultralytics/yolov5#6.
As it is clear Glenn is going to continue to create performance updates (even in the time since the post went live and now) and eventually publish a paper, we will reference that thread in the post for where to find the most up-to-date performance discussion on the COCO benchmark.
Just to throw a spanner in the works: https://github.com/joe-siyuan-qiao/DetectoRS and https://arxiv.org/pdf/2006.02334.pdf. They claim 73.5 AP50. (I know it has nothing to do with yolo and naming continuity)
@pfeatherstone DetectoRS is 15x-60x slower than Yolo: https://arxiv.org/pdf/2006.02334.pdf
So this is offtopic.
@pfeatherstone Please don't make a hasty conclusion. A merit of YOLO versions is their lightness and speed. Practitioners don't welcome non-realistic latency even though a model has a high precision. It's useless.
@AlexeyAB I agree it's off topic. But this thread was comparing latency, FPS and accuracy. I thought i might include other non-yolo based models. Maybe that is more suited to a forum.
@josephofiowa Hello,
Why did you change the input size of YOLOv5s from the default 736x736 to 416x416 in your testing, and compare it with YOLOv4, which uses an input size of 608x608?
@WongKinYiu All images were resized to 416x416 in preprocessing in training and testing for both tests. The version of the BCCD dataset used is consistent.
@AlexeyAB @WongKinYiu @josephofiowa thank you all for your updates. I'm trying to address a few shortcomings simultaneously here. I've started training a few panet-based modifications, so hopefully I'll have those results back in about a week, though I can't guarantee they'll be improved much since this is the first time I've tried this. In the meantime the simplest update I can do is to match test settings to the original efficientdet metrics shown in my readme plot, which are --batch-size 8 and FP16 inference.
As part of this process I've upgraded the entire v5 system from FP32 to FP16 for model storage and inference (test.py and detect.py) when the conditions permit (essentially when a CUDA device is available for inference). This should help produce a better apples-to-apples comparison, and luckily pytorch makes this easy by using the .half() operator.
Since all models are stored in FP16 now, one benefit is that all model sizes have shrunk by half in terms of filesizes. I've also added a second independent hosting system for the weights, so the auto-download functionality should be doubly redundant now, and availability should be enhanced hopefully in China, which seems to not have access to the default Google Drive folder.
The model sizes now span 14 MB for s to 183 MB for x, and GPUs with tensor cores, like the T4 and V100, should see inference times (and memory requirements) roughly halved from before. Other GPUs will not see any speed improvement, but will enjoy the same reduced memory requirements. This is the new default, so no special settings are required to see these benefits.
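The halving of file sizes follows directly from the storage format: FP32 uses 4 bytes per parameter, FP16 uses 2. A rough sketch of the arithmetic (the 7.3M parameter count for YOLOv5s is an approximation, and real checkpoint files carry extra metadata, so actual sizes differ slightly):

```python
def weights_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weights-file size from parameter count and precision."""
    return num_params * bytes_per_param / 1e6

params_v5s = 7_300_000  # rough YOLOv5s parameter count (assumption)
print(weights_size_mb(params_v5s, 4))  # ~29 MB in FP32
print(weights_size_mb(params_v5s, 2))  # ~15 MB in FP16, i.e. half
```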
@WongKinYiu and @AlexeyAB can you guys please generate the same curve at batch-size 8 with FP16 inference in order to overlay everything on one graph? Thank you!
There are others in the same speed-accuracy neighborhood, like FCOS perhaps. Many people zoom in on the one mAP number at the exclusion of all else unfortunately. From a business perspective, if you offer me one model that is 10% better than another, but costs 10x more (in time or money), the choice is going to be obvious I believe.
@josephofiowa
It is interesting that 416x416 and 608x608 get the same FPS for YOLOv4 in your testing. In your Colab it is exactly the same image and exactly the same time in ms, but with 608x608 input resolution. ...
update: It should be ~50 FPS for 416x416 input resolution.
@WongKinYiu I think sometimes end-to-end speeds may be dominated by other factors than convolution times, especially for smaller batch sizes.
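A back-of-the-envelope estimate of the expected gap, assuming as a first approximation that inference time scales with pixel count (which, per the comment above, does not always hold at small batch sizes):

```python
def expected_speedup(res_from: int, res_to: int) -> float:
    """Rough speedup from lowering a square input resolution,
    assuming cost proportional to H*W."""
    return (res_from / res_to) ** 2

print(round(expected_speedup(608, 416), 2))  # ~2.14x faster at 416x416
```

So identical timings at 416 and 608 are a strong hint that the 416 run never actually happened at that resolution.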
@glenn-jocher
I posted this result https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644483769 because @josephofiowa says the result posted on his blog is from his Colab. However, there is no 416x416 testing in the Colab, only 608x608 testing, yet he says all images were resized to 416x416 in all testing https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644465808.
Yeah, I just read through and can concur: I couldn't find a 416x416 setup; it seems to be 608x608 only.
@josephofiowa
YOLOv5:
!python train.py --img 416 --batch 16 --epochs 200 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --nosave --cache
YOLOv4:
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
So YOLOv5 was trained on a 416x416 input size and YOLOv4 was trained on a 608x608 input size?
@danielbarry
Yes, from the COLab we can get following information.
@WongKinYiu Looks correct, except the v5 default --img-size is the same 640 for everything (train, test, detect).
@WongKinYiu @danielbarry You are correct that the config was not modified from 608x608, yet the inference time was comparable to @WongKinYiu's finding. Perhaps Glenn's comment per small batch size is correct. The config has been updated and Colab is now re-running. (EDIT: This is completed and the post is updated.)
It is also worth noting regarding inference speeds and Glenn's FP16 update: Colab currently does not provide GPU resources that leverage Tensor Cores. It provides a P100, not V100. The mentioned inference speed increase will not be present in Colab.
Please note the Colabs do not intend to be an official benchmark, but rather an "off-the-shelf" performance that one might find cloning these repos. These should not influence the COCO benchmark metrics.
@glenn-jocher thanks, updated https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644493225.
@josephofiowa yes, you are correct, colab P100's will not benefit from the fp16 change, but it also doesn't hurt them. Every once in a while a T4 will show up in colab that does benefit though :)
Ok I've finished the corrected benchmarks. Models are all exactly the same, but inference is fp16 now, and testing has been tuned a bit to improve speed at the slight expense of some mAP lost, which I thought was a worthwhile compromise.
Most importantly, I believe this is a true apples to apples comparison now, with all models run at --batch 8 and fp16.
I'll probably want to adjust the plot bounds in the future, but plotting with the same exact bounds as my existing plot I get this:
EDIT: removed 'latency' from x axis and modified units from ms to ms/img per feedback. EDIT2: perhaps a more proper label would be 'GPU Time (ms/img)'?
@glenn-jocher
You have to modify the label of the x-axis to 1/FPS or GPU_Latency/Batch_Size (ms).
update: Oh, I see your update; (ms/img) is also OK.
update: Hmm... I think Time is better than Speed, but I am not sure which one is exactly right; maybe just follow efficientdet and use 1/batch8_throughput?
What was the batch size?
@glenn-jocher
Do you use fast NMS mode? I get higher AP but lower FPS than what is reported in your figure.
Looking at that graph, it looks like yolov3-spp is still a serious contender for the belt
Also, were they all trained using the same optimizers, schedulers, and hyperparameters? @glenn-jocher achieved higher AP with yolov3-spp by retraining with his repo. So it goes well beyond the model architecture.
@pfeatherstone
Bag of Freebies (BoF) (Mosaic, CIoU, CBN, ...) can be applied to any model regardless of repository - and improve accuracy: https://arxiv.org/pdf/2004.10934.pdf
@josephofiowa
You're a persistent master of unfair comparisons and forgery of data )
Just another thought: was yolov3-spp trained using the augmentation tools? That should maybe be something else to consider when making fair comparisons. It's maybe a bit unfair comparing the performance of different architectures when some have been trained with 'better data'. Maybe COCO is sufficiently large and diverse that augmentation doesn't really help, but it's just another thought. Maybe the fairest thing would be to use a single repo like mmdetection where all models are trained using exactly the same data preparation settings and hyperparameters.
Oh, and another observation: yolov5 results are a bit worse if you don't use letterbox resizing. I haven't done a full evaluation on the COCO dataset, just an observation based on a few images. So that's an additional thing to take into account as part of 'data preparation' when comparing models.

Now maybe 'data preparation', training hyperparameters, and all the bag of freebies, as @AlexeyAB puts it, are 'part of' the model, so you don't care how it was trained or how it prepares the data when making a fair comparison; all you need is the same input size and the same software/hardware environment. BUT how do you know if a model has reached its full potential when comparing it against other models? How do you know if it has been optimally trained? yolov3-spp is a good example: do I use the model trained by darknet or by ultralytics? The latter has better AP. So do I treat them as different models, or take the latter as the official stats?

You might argue that to make a fair comparison, all models need to be trained in the exact same way using the exact same hyperparameters. But one optimizer with one set of hyperparameters might suit one model very well and not another. I find this whole model-comparison debate very tricky to digest, as there are too many variables that can affect a model's performance. All of them could be re-evaluated in a slightly different environment, and it is likely you would get very different graphs.
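For reference, letterbox resizing scales the image to fit the target size while preserving aspect ratio, then pads up to a stride multiple; plain resizing instead stretches the image and distorts objects. A simplified sketch of the shape arithmetic (YOLOv5's actual implementation also centers the padding and fills it with gray; the example image size is illustrative):

```python
def letterbox_shape(h: int, w: int, new_size: int = 640, stride: int = 32):
    """Compute the letterboxed shape: scale the longest side to new_size,
    keep the aspect ratio, then pad each dimension up to a stride multiple."""
    r = new_size / max(h, w)
    nh, nw = round(h * r), round(w * r)
    pad_h = (stride - nh % stride) % stride
    pad_w = (stride - nw % stride) % stride
    return (nh + pad_h, nw + pad_w), (nh, nw)

padded, unpadded = letterbox_shape(427, 640)
print(padded, unpadded)  # (448, 640) (427, 640): aspect ratio kept, 21 px of padding
```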
@pfeatherstone
If your paper proposes a new plugin module or architecture based on a baseline method, it is better to keep all other settings identical for comparison. There are two commonly used strategies: 1) follow the same setting as your baseline, e.g. CSPNet; 2) create a new setting and run both the baseline and your method on it, e.g. ASFF. If your paper proposes architectures, loss functions, data augmentation, training methods... you have to design complete ablation studies.
Hey there,
This repo is claiming to be YOLOv5: https://github.com/ultralytics/yolov5
They released a blog here: https://blog.roboflow.ai/yolov5-is-here/ and it's being discussed on HN here: https://news.ycombinator.com/item?id=23478151
In all honesty this looks like some bullshit company stole the name, but it would be good to get some proper word on this @AlexeyAB