AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Repo Claims To Be YOLOv5 #5920

Closed danielbarry closed 3 years ago

danielbarry commented 4 years ago

Hey there,

This repo is claiming to be YOLOv5: https://github.com/ultralytics/yolov5

They released a blog here: https://blog.roboflow.ai/yolov5-is-here/

It's being discussed on HN here: https://news.ycombinator.com/item?id=23478151

In all honesty, this looks like some bullshit company stole the name, but it would be good to get a proper word on this @AlexeyAB.

AlexeyAB commented 4 years ago

Read more about YOLOv5:


Comparison YOLOv3 vs YOLOv4 vs YOLOv5: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640

CSPDarknet53s-YOSPP gets 19.5% faster model inference speed and 1.3% higher AP than YOLOv5l.

YOLOv4 achieves 133 - 384 FPS with batch=4 using OpenCV and at least 2x more with batch=32: OpenCV_Vs_TensorRT


[Benchmark charts: AP vs FPS comparison of YOLOv3, YOLOv4, and YOLOv5; data from the linked comparison above.]

danielbarry commented 4 years ago

@josephofiowa I've updated my comment to reflect that you're not the author - sorry. I am just trying to get to the bottom of these dubious claims.

fat-tire commented 4 years ago

I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.

danielbarry commented 4 years ago

I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.

It's the last project by pjreddie, but not the last word on YOLO or Darknet.

AlexeyAB commented 4 years ago

I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.



Tables 8-10: https://arxiv.org/pdf/2004.10934.pdf

(Real-time detectors with FPS 30 or higher are highlighted here. We compare the results with batch=1, without using TensorRT.)

[Chart: YOLOv4 speed comparison across different GPUs.]


https://medium.com/@alexeyab84/yolov4-the-most-accurate-real-time-neural-network-on-ms-coco-dataset-73adfd3602fe?source=friends_link&sk=6039748846bbcf1d960c3061542591d7

Therefore, we only show results with batch = 1 and without using TensorRT on comparison graphs.

AlexeyAB commented 4 years ago

@glenn-jocher did a lot for the development and improvement of YOLO and contributed many ideas; he created at least two very good PyTorch repositories. In doing so, he gave YOLO a long life outside of Darknet. All the hype around YOLOv5 was not raised by him.

AlexeyAB commented 4 years ago

Some notes on comparison: https://github.com/ultralytics/yolov5

AlexeyAB commented 4 years ago

Invalid comparison results in the roboflow.ai blog: https://blog.roboflow.ai/yolov5-is-here/

Actually, if both networks, YOLOv4s and ultralytics-YOLOv5l, are trained and tested on the same framework, with the same batch size, on a common dataset (Microsoft COCO), the picture changes: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640


Second, YOLOv5 is fast – blazingly fast. In a YOLOv5 Colab notebook, running a Tesla P100, we saw inference times up to 0.007 seconds per image, meaning 140 frames per second (FPS)! By contrast, YOLOv4 achieved 50 FPS after having been converted to the same Ultralytics PyTorch library.

  1. Actually, YOLOv4 is faster and more accurate than YOLOv5l when tested with equal settings (batch=16) on the same framework (https://github.com/ultralytics/yolov5) on a common dataset, Microsoft COCO (while YOLOv5x is much slower than YOLOv4, and YOLOv5s is much less accurate than YOLOv4).

CSPDarknet53s-PASPP-Mish at 608x608 (~YOLOv4) is +0.8 AP more accurate and 1.3x (+30%) faster than YOLOv5l at 736x736.

Full comparison: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640 (a quick arithmetic check of the speed figures appears at the end of this comment)

CSPDarknet53s-PASPP-Mish: (~YOLOv4)

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16

YOLOv5l:

yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16

  2. They compared the model size of the small ultralytics-YOLOv5 version, YOLOv5s (27 MB), which has very low accuracy (26-36% AP on Microsoft COCO), with the big YOLOv4 (245 MB), which has very high accuracy (41-43% AP on Microsoft COCO).

Fourth, YOLOv5 is small. Specifically, a weights file for YOLOv5 is 27 megabytes. Our weights file for YOLOv4 (with Darknet architecture) is 244 megabytes. YOLOv5 is nearly 90 percent smaller than YOLOv4. This means YOLOv5 can be deployed to embedded devices much more easily.


  3. They compared the speed of a very small and much less accurate version of ultralytics-YOLOv5 with the very accurate and big YOLOv4. They did not provide the most critical details of the comparison: which YOLOv5 version was used (s, l, x, ...), what training and testing resolutions were used, and what test batch size was used for YOLOv4 vs ultralytics-YOLOv5. They did not test on the generally accepted Microsoft COCO dataset with exactly the same settings, and they did not test on the Microsoft COCO CodaLab evaluation server, which would reduce the likelihood of manipulation.

Third, YOLOv5 is accurate. In our tests on the blood cell count and detection (BCCD) dataset, we achieved roughly 0.895 mean average precision (mAP) after training for just 100 epochs. Admittedly, we saw comparable performance from EfficientDet and YOLOv4, but it is rare to see such across-the-board performance improvements without any loss in accuracy.
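To make the speed gap in point 1 concrete, the per-image times quoted there can be converted to throughput with simple arithmetic. A minimal sketch using only the numbers above (illustrative, not a new measurement):

```python
# Arithmetic check of the speed figures quoted in point 1
# (total per-image time = inference + NMS, measured at batch size 16).
def images_per_second(total_ms_per_image: float) -> float:
    return 1000.0 / total_ms_per_image

cd53s_paspp_mish_fps = images_per_second(10.3)  # ~97 img/s at 608x608
yolov5l_fps = images_per_second(13.5)           # ~74 img/s at 736x736

print(f"speed ratio: {cd53s_paspp_mish_fps / yolov5l_fps:.2f}x")  # ~1.31x, i.e. ~30% faster
```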

danielbarry commented 4 years ago

@AlexeyAB Thank you for breaking that down. I think my suspicion of the comparisons was warranted.

I just noticed that their iOS app page calls their network YOLOv4: https://apps.apple.com/app/id1452689527

YOLOv4 is an updated version of YOLOv3-SPP, trained on the COCO dataset in PyTorch and transferred to an Apple CoreML model via ONNX.

Someone said that they were apparently very surprised when you released YOLOv4 as they were planning to also release YOLOv4. I think this really puts emphasis on the need for people to communicate their intentions.

rcg12387 commented 4 years ago

I just noticed that their iOS app page calls their network YOLOv4: https://apps.apple.com/app/id1452689527

YOLOv4 is an updated version of YOLOv3-SPP, trained on the COCO dataset in PyTorch and transferred to an Apple CoreML model via ONNX.

Someone said that they were apparently very surprised when you released YOLOv4 as they were planning to also release YOLOv4. I think this really puts emphasis on the need for people to communicate their intentions.

Yeah. I see it's from Ultralytics LLC, which has now become the creator of YOLOv5. I agree with your opinion. IMO, Ultralytics intended to succeed YOLO by implementing a PyTorch version with several contributions. Anyway, it is encouraging news for the PyTorch community, even if it is not significantly superior to @AlexeyAB's YOLOv4.

danielbarry commented 4 years ago

I think there is a strong case for either project to adjust its name to reflect that the works are not built upon one another and are not a fair comparison.

As YOLO started in the Darknet framework, this repository was somewhat endorsed by pjreddie, @AlexeyAB was first to the punch with YOLOv4, and Ultralytics already had their own "flavour" of YOLOv3 for TF, it would make sense to rename YOLOv5. Even something small like "uYOLOv5" or "YOuLOv5" could be significant in distinguishing the works.

Otherwise, who publishes YOLOv6, and is YOLOv6 the improvement on YOLOv4 or on YOLOv5? I think this is incredibly confusing and serves nobody.

josephofiowa commented 4 years ago

It's Joseph, author of that Roboflow blog post announcing Glenn Jocher's YOLOv5 implementation.

Our goal is to make models more accessible for anyone to use on their own datasets. Our evaluation on a sample task (BCCD) is meant to highlight tradeoffs and expose differences if one were to clone each repo and use them with little customization. Our post is not intended to be a replacement nor representative of a formal benchmark on COCO.

Sincere thanks to the community on your feedback and continued evaluation. We have published a comprehensive updated post on Glenn Jocher's decision to name the model YOLOv5 as well as exactly how to reproduce the results we reported.

@AlexeyAB called out very important notes above that we included in this followup post and updated in the original post. Cloning the YOLOv5 repository defaults to YOLOv5s, and the Darknet implementation defaults to "big YOLOv4." In our sample task, both of these models appear to max out at 0.91 mAP. YOLOv5s is 27 MB; big YOLOv5l is 192 MB; big YOLOv4 is 245 MB. For inference speed, Glenn's YOLOv5 implementation defaults to batch inference and divides the batch time by the number of images in the batch, resulting in the reported 140 FPS figure. YOLOv4 defaults to a batch size of 1. This is an unfair comparison. In the detailed update, we set both batch sizes to 1, where YOLOv4 achieves 30 FPS and YOLOv5 achieves 10 FPS.
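To illustrate the distinction being drawn here, a minimal sketch of batched throughput versus batch=1 latency; the batch size and timings below are made-up placeholders, not measurements from either repository:

```python
# Illustrative placeholders only, not real YOLOv4/YOLOv5 timings.
batch_size = 32
batch_time_ms = 230.0            # time to push one whole batch through the model
single_image_latency_ms = 33.0   # time to process one image end to end (batch=1)

# "Per-image" time from batched inference amortizes GPU parallelism over the batch.
amortized_ms = batch_time_ms / batch_size        # ~7.2 ms/img
throughput_fps = 1000.0 / amortized_ms           # ~139 "FPS"

# Batch=1 latency answers a different question: how long until a single frame is ready.
latency_fps = 1000.0 / single_image_latency_ms   # ~30 FPS

print(f"batched throughput: {throughput_fps:.0f} FPS, "
      f"batch=1 latency: {single_image_latency_ms:.0f} ms ({latency_fps:.0f} FPS)")
```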

Ultimately, we encourage trying each on one's own problem, and consider the tradeoffs based on your domain considerations (like ease of setup, complexity of task, model size, inference speed reqs). We published guides in the post to make that deliberately easy. And we will continue to listen on where the community lands on what exact name is best for Glenn Jocher's YOLOv5 implementation.

AlexeyAB commented 4 years ago

YOLOv3-spp vs YOLOv4 (leaky) vs YOLOv5 - with the same batch=32; each point is a different test network resolution: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/35#issuecomment-643257711


rcg12387 commented 4 years ago

@josephofiowa Thank you for your blog post "Responding to the Controversy about YOLOv5: YOLOv4 Versus YOLOv5". But I am a little confused about what you wrote. First, in the last sentence of the section "Comparing YOLOv4 and YOLOv5s Model Storage Size" you wrote:

The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.

Then what about YOLOv5x?

Second, in the fourth sentence of the section "Comparing YOLOv4 and YOLOv5s Inference Time" you wrote:

On single images (batch size of 1), YOLOv4 inferences in 33 ms (30 FPS) and YOLOv5s inferences in 20ms (10 FPS).

It should be either 100 ms (for 10 FPS) or 50 FPS (for 20 ms) for YOLOv5s, I would say.

Thank you for your post.

AlexeyAB commented 4 years ago

@rcg12387 Why do you think they should know arithmetic? )

josephofiowa commented 4 years ago

@rcg12387 Thanks for the model sizes question. We've updated the post to show all sizes:

Updated to include the model size of all YOLOv5 models: v5x: 367 MB, v5l: 192 MB, v5m: 84 MB, v5s: 27 MB. YOLOv5s is the model compared in this article. YOLOv4-custom refers to the model we have been testing throughout this post.

Thanks for your callout of the arithmetic error. It's corrected as is the accompanying graph:

On single images (batch size of 1), YOLOv4 inferences in 33 ms (30 FPS) and YOLOv5s inferences in 20ms (50 FPS). (Update June 14 12:46 PM CDT - In response to rcg12387's GitHub comment, we have corrected an error where we previously calculated YOLOv5 inference to be 10 FPS. We regret this error.)

Note: Glenn Jocher provided inference time updates and pushed an update to his repo so that times are reported as end-to-end latencies. We have included his comments in the post and pasted them below:

The times ... are not for batched inference, they are for batch-size = 1 inference. This is the reason they are printed to the screen one at a time, because they are run in a for loop, with each image passed to the model by itself (tensor size 1x3x416x416). I know this because like many other things, we simply have not had time to modify detect.py properly for batched inference of images from a folder. One disclaimer is that the above times are for inference only, not NMS. NMS will typically add 1-2 ms per image to the times. So I would say 8-9 ms is the proper batch-size 1 end-to-end latency in your experiment, while 7 ms is the proper batch-size 1 inference-only latency. In response to this I've pushed a commit to improve detect.py time reporting. Times are now reported as full end-to-end latencies: FP32 pytorch inference + postprocessing + NMS. I tested out the new times on a 416x416 test image, and I see 8 ms now at batch-size 1 for full end-to-end latency of YOLOv5s.
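For context, a batch=1 end-to-end latency measurement along the lines Glenn describes looks roughly like this in PyTorch. This is a hedged sketch, not the actual detect.py code; `model` and `nms_fn` stand in for whatever the repository provides, and `torch.cuda.synchronize` is needed because GPU kernels execute asynchronously:

```python
import time
import torch

def end_to_end_latency_ms(model, nms_fn, img, device="cuda", warmup=10, iters=100):
    """Average batch=1 latency in ms, covering inference plus postprocessing/NMS.
    `model` and `nms_fn` are placeholders for the framework's own objects."""
    model = model.to(device).eval()
    x = img.unsqueeze(0).to(device)        # batch size 1: shape 1x3xHxW
    with torch.no_grad():
        for _ in range(warmup):            # warm up kernels / cuDNN autotuning
            nms_fn(model(x))
        torch.cuda.synchronize()           # flush queued GPU work before timing
        t0 = time.time()
        for _ in range(iters):
            nms_fn(model(x))               # inference + NMS, one image at a time
        torch.cuda.synchronize()
    return (time.time() - t0) * 1000.0 / iters
```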

@AlexeyAB We have included your COCO benchmark performance in the post as well. Thank you for providing this.

AlexeyAB commented 4 years ago

@josephofiowa But you still don't know the difference between inference time and FPS )

AlexeyAB commented 4 years ago

The latest comparison: https://github.com/ultralytics/yolov5/issues/6#issuecomment-643823425


rcg12387 commented 4 years ago

@josephofiowa Thank you for your reply. I have read your updated post. However, the sentence still remains in the updated post:

The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.

In order to avoid any confusion you should correct this sentence like this: The largest YOLOv5 is YOLOv5x, and its weights are 367 MB. Thanks.

josephofiowa commented 4 years ago

@josephofiowa Thank you for your reply. I have read your updated post. However, the sentence still remains in the updated post:

The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.

In order to avoid any confusion you should correct this sentence like this: The largest YOLOv5 is YOLOv5x, and its weights are 367 MB. Thanks.

Yes, done. Thanks.

josephofiowa commented 4 years ago

@AlexeyAB Thanks. Following performance updates on ultralytics/yolov5#6.

As it is clear Glenn is going to continue making performance updates (even in the time since the post went live) and will eventually publish a paper, we will reference that thread in the post as the place to find the most up-to-date performance discussion on the COCO benchmark.

pfeatherstone commented 4 years ago

Just to throw a spanner in the works: https://github.com/joe-siyuan-qiao/DetectoRS and https://arxiv.org/pdf/2006.02334.pdf. They claim 73.5 AP50. (I know it has nothing to do with yolo and naming continuity)

AlexeyAB commented 4 years ago

@pfeatherstone DetectoRS is 15x-60x slower than YOLO: https://arxiv.org/pdf/2006.02334.pdf

So this is offtopic.

rcg12387 commented 4 years ago

@pfeatherstone Please don't jump to a hasty conclusion. The merit of the YOLO versions is their lightness and speed. Practitioners don't welcome unrealistic latency, even when a model has high precision; it's useless in practice.

pfeatherstone commented 4 years ago

@AlexeyAB I agree it's off topic. But this thread was comparing latency, FPS, and accuracy, so I thought I might include other non-YOLO-based models. Maybe that is better suited to a forum.

WongKinYiu commented 4 years ago

@josephofiowa Hello,

Why did you change the input size of YOLOv5s from the default 736x736 to 416x416 in your testing, and compare it with YOLOv4, which uses an input size of 608x608?

josephofiowa commented 4 years ago

@WongKinYiu All images were resized to 416x416 in preprocessing in training and testing for both tests. The version of the BCCD dataset used is consistent.

glenn-jocher commented 4 years ago

@AlexeyAB @WongKinYiu @josephofiowa thank you all for your updates. I'm trying to address a few shortcomings simultaneously here. I've started training a few panet-based modifications, so hopefully I'll have those results back in about a week, though I can't guarantee they'll be improved much since this is the first time I've tried this. In the meantime the simplest update I can do is to match test settings to the original efficientdet metrics shown in my readme plot, which are --batch-size 8 and FP16 inference.

As part of this process I've upgraded the entire v5 system from FP32 to FP16 for model storage and inference (test.py and detect.py) when the conditions permit (essentially when a CUDA device is available for inference). This should help produce a better apples-to-apples comparison, and luckily pytorch makes this easy by using the .half() operator.
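For reference, the FP16 switch described here boils down to PyTorch's `.half()` pattern. A minimal sketch assuming a CUDA device is available, not the actual repository code:

```python
import torch

def fp16_inference(model, img, device="cuda"):
    """Minimal FP16 inference sketch: weights and inputs must both be half precision."""
    model = model.to(device).eval().half()   # convert weights/buffers to FP16
    x = img.to(device).half()                # inputs must match the model dtype
    with torch.no_grad():
        return model(x)

# Saving the halved weights also roughly halves the checkpoint size on disk:
# torch.save(model.half().state_dict(), "weights_fp16.pt")
```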

Since all models are now stored in FP16, one benefit is that all model file sizes have shrunk by half. I've also added a second independent hosting system for the weights, so the auto-download functionality should be doubly redundant now, and availability should hopefully be improved in China, which seems to not have access to the default Google Drive folder.

The model sizes now span 14 MB for s to 183 MB for x, and GPUs with tensor cores, like the T4 and V100, should see inference times (and memory requirements) roughly halved from before. Other GPUs will not see any speed improvement, but will enjoy the same reduced memory requirements. This is the new default, so no special settings are required to see these benefits.

@WongKinYiu and @AlexeyAB can you guys please generate the same curve at batch-size 8 with FP16 inference in order to overlay everything on one graph? Thank you!

glenn-jocher commented 4 years ago

@AlexeyAB I agree it's off topic. But this thread was comparing latency, FPS and accuracy. I thought i might include other non-yolo based models. Maybe that is more suited to a forum.

There are others in the same speed-accuracy neighborhood, like FCOS perhaps. Many people zoom in on the one mAP number to the exclusion of all else, unfortunately. From a business perspective, if you offer me one model that is 10% better than another but costs 10x more (in time or money), I believe the choice is going to be obvious.

WongKinYiu commented 4 years ago

@josephofiowa

It is interesting that 416x416 and 608x608 get the same FPS on YOLOv4 in your testing. In your Colab it is exactly the same image and exactly the same ms, with 608x608 input resolution.

update: It should be ~50 FPS for 416x416 input resolution.

glenn-jocher commented 4 years ago

@WongKinYiu I think sometimes end-to-end speeds may be dominated by other factors than convolution times, especially for smaller batch sizes.

WongKinYiu commented 4 years ago

@glenn-jocher

I posted this result (https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644483769) because @josephofiowa says the result posted on his blog is from his Colab. However, there is no 416x416 test in the Colab, only a 608x608 test, while he says all images were resized to 416x416 in all testing: https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644465808.

danielbarry commented 4 years ago

Yeah, I just read through it and can concur: I couldn't find a 416x416 setup; it seems to be 608x608 only.

danielbarry commented 4 years ago

@josephofiowa

YOLOv5:

!python train.py --img 416 --batch 16 --epochs 200 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --nosave --cache

YOLOv4:

0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF

So YOLOv5 was trained on a 416x416 input size and YOLOv4 was trained on a 608x608 input size?

WongKinYiu commented 4 years ago

@danielbarry

Yes, from the Colab we can get the following information.

glenn-jocher commented 4 years ago

@WongKinYiu looks correct, except v5 default --img-size is the same 640 for everything (train, test, detect).

josephofiowa commented 4 years ago

@WongKinYiu @danielbarry You are correct that the config was not modified from 608x608, yet the inference time was comparable to @WongKinYiu's finding. Perhaps Glenn's comment about small batch sizes is correct. The config has been updated and the Colab is now re-running. (EDIT: This is completed and the post is updated.)

It is also worth noting, regarding inference speeds and Glenn's FP16 update, that Colab currently does not provide GPU resources that leverage Tensor Cores. It provides a P100, not a V100. The mentioned inference speed increase will not be present in Colab.

Please note the Colabs do not intend to be an official benchmark, but rather an "off-the-shelf" performance that one might find cloning these repos. These should not influence the COCO benchmark metrics.

WongKinYiu commented 4 years ago

@glenn-jocher thanks, updated https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644493225.

glenn-jocher commented 4 years ago

@josephofiowa yes, you are correct, Colab P100s will not benefit from the fp16 change, but it also doesn't hurt them. Every once in a while a T4 that does benefit shows up in Colab though :)

glenn-jocher commented 4 years ago

Ok, I've finished the corrected benchmarks. The models are all exactly the same, but inference is fp16 now, and testing has been tuned a bit to improve speed at the slight expense of some mAP, which I thought was a worthwhile compromise.

Most importantly, I believe this is a true apples to apples comparison now, with all models run at --batch 8 and fp16.

I'll probably want to adjust the plot bounds in the future, but plotting with the same exact bounds as my existing plot I get this:

[Updated study: mAP vs latency plot.] EDIT: removed 'latency' from the x axis and modified units from ms to ms/img per feedback. EDIT2: perhaps a more proper label would be 'GPU Time (ms/img)'?

WongKinYiu commented 4 years ago

@glenn-jocher

You have to modify the x-axis label to 1/FPS or GPU_Latency/Batch_Size (ms).

update: Oh, I see your update, (ms/imgs) is also OK.

update: hmm... I think Time is better than Speed, but I am not sure which one is exactly right; maybe just follow EfficientDet and use 1/batch8_throughput?

pfeatherstone commented 4 years ago

What was the batch size?

WongKinYiu commented 4 years ago

@glenn-jocher

Do you use fast NMS mode? I get higher AP but lower FPS than what is reported in your figure.

pfeatherstone commented 4 years ago

Looking at that graph, it looks like yolov3-spp is still a serious contender for the belt.

pfeatherstone commented 4 years ago

Also, were they all trained using the same optimisers, schedulers, and hyperparameters? @glenn-jocher achieved higher AP with yolov3-spp by retraining it with his repo, so it goes well beyond the model architecture.

AlexeyAB commented 4 years ago

@pfeatherstone

Bag of Freebies (BoF) techniques (Mosaic, CIoU, CBN, ...) can be applied to any model regardless of repository and improve accuracy: https://arxiv.org/pdf/2004.10934.pdf
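As one concrete example of such a freebie, here is a hedged re-implementation sketch of the CIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format, following the published formulation rather than any particular repository's code:

```python
import math
import torch

def ciou_loss(box1, box2, eps=1e-7):
    """CIoU loss for boxes in (x1, y1, x2, y2) format, shapes (N, 4); returns (N,)."""
    # Intersection and union -> IoU
    xi1 = torch.max(box1[:, 0], box2[:, 0]); yi1 = torch.max(box1[:, 1], box2[:, 1])
    xi2 = torch.min(box1[:, 2], box2[:, 2]); yi2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    w1, h1 = box1[:, 2] - box1[:, 0], box1[:, 3] - box1[:, 1]
    w2, h2 = box2[:, 2] - box2[:, 0], box2[:, 3] - box2[:, 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Squared center distance over the squared diagonal of the smallest enclosing box
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((box1[:, 0] + box1[:, 2] - box2[:, 0] - box2[:, 2]) ** 2
            + (box1[:, 1] + box1[:, 3] - box2[:, 1] - box2[:, 3]) ** 2) / 4

    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)
```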

AlexeyAB commented 4 years ago

@josephofiowa

You're a persistent master of unfair comparisons and forgery of data )

pfeatherstone commented 4 years ago

Just another thought: was yolov3-spp trained using the augmentation tools? That should maybe be something else to consider when making fair comparisons. It's maybe a bit unfair to compare the performance of different architectures when some have been trained with 'better data'. Maybe COCO is sufficiently large and diverse that augmentation doesn't really help, but it's just another thought. Maybe the fairest thing would be to use a single repo like mmdetection, with all models trained using exactly the same data preparation settings and hyperparameters.

pfeatherstone commented 4 years ago

Oh, and another observation: yolov5 results are a bit worse if you don't use letterbox resizing. I haven't done a full evaluation on the COCO dataset, just an observation based on a few images. So that's an additional thing to take into account as part of 'data preparation' when comparing models. Now maybe 'data preparation', training hyperparameters, and all the bag of freebies, as @AlexeyAB puts it, are 'part of' the model, so you don't care how it was trained or how it prepares the data when making a fair comparison; all you need is the same input size and the same software/hardware environment. BUT how do you know whether a model has reached its full potential when comparing it against other models? How do you know whether it has been optimally trained? yolov3-spp is a good example: do I use the model trained by darknet or the one trained by ultralytics? The latter has better AP. So do I treat them as different models, or take the latter as the official stats? You might argue that to make a fair comparison, all models need to be trained in exactly the same way using exactly the same hyperparameters, but one optimizer with one set of hyperparameters might suit one model very well and not another. I find this whole model-comparison debate very tricky to digest, as there are too many variables that can affect a model's performance. All of them could be re-evaluated in a slightly different environment, and it is likely you would get very different graphs.
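For clarity on the letterbox point, a hedged sketch of the usual letterbox preprocessing (aspect-ratio-preserving resize plus padding to the network input size); the 640 target size and the gray pad value 114 are common conventions assumed here, not taken from either repository's exact code:

```python
import cv2

def letterbox(img, new_size=640, pad_value=114):
    """Resize keeping aspect ratio, then pad to a square new_size x new_size canvas."""
    h, w = img.shape[:2]
    scale = min(new_size / h, new_size / w)              # factor that fits both sides
    nh, nw = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)

    # Distribute the padding evenly on both sides of each axis.
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    bottom, right = new_size - nh - top, new_size - nw - left
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(pad_value,) * 3)

# A plain cv2.resize to 640x640 would instead stretch the image, changing object
# aspect ratios relative to what the network saw during training.
```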

WongKinYiu commented 4 years ago

@pfeatherstone

If your paper proposed a new plugin module or architecture based on a baseline method, better to use totally same other setting for comparison. There are two usually used strategies: 1) following same setting as your baseline, e.g. CSPNet; 2) create new setting and run both of baseline and your method on this setting, e.g. ASFF. image If your paper proposed architectures, loss function, data augmentation, training method... You have to design complete ablation studies.