AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Repo Claims To Be YOLOv5 #5920

Closed danielbarry closed 3 years ago

danielbarry commented 4 years ago

Hey there,

This repo is claiming to be YOLOv5: https://github.com/ultralytics/yolov5

They released a blog here: https://blog.roboflow.ai/yolov5-is-here/

It's being discussed on HN here: https://news.ycombinator.com/item?id=23478151

In all honesty this looks like a case of some bullshit company stealing the name, but it would be good to get some proper word on this, @AlexeyAB

glenn-jocher commented 4 years ago

@pfeatherstone very good question! It sounds a lot like the nature vs nurture debate in humans, i.e. what proportion of your actions are determined by your genetics and what proportion are dictated by your upbringing, education, and experiences.

WongKinYiu commented 4 years ago

@glenn-jocher

Hello, the controversy around ultralytics/yolov5 is not about this: https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-646734609.

@glenn-jocher has done a lot for the development and improvement of YOLO and has contributed many ideas...

This shows Ultralytics has made a huge contribution to the YOLO community (https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-642268465), but some other things in the recent update undermine it.

  1. In ultralytics/yolov5, a time-out is used in NMS. link

        if (time.time() - t) > time_limit:
            break  # time limit exceeded

    It is used to solve the training issue in https://github.com/ultralytics/yolov3/issues/1251. I think it has to be driven by an is_training flag; otherwise, if the time limit is reached during inference, users will get unexpected results (see the sketch after this list).

  2. Wrong comparison. This figure does not show GPU latency; it shows the average GPU inference time of batch-32 yolov5 models and batch-8 efficientdet models. The GPU latency of the yolov5 models is 0.1 s~0.6 s when the batch size is 32. The issue with this comparison was raised more than a week ago, but it still has not been fixed. (image)

  3. Confusing table. We can get the AP and speed-testing details from the table's description, but we cannot tell how the FLOPs were measured. (image)
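
Back to point 1: below is a minimal sketch of the kind of gating I mean. The wrapper name, the training argument, and the placeholder loop body are illustrative, not the actual ultralytics code; only the time-limit pattern itself comes from the repository.

    import time

    def nms_with_optional_timeout(predictions, training=False, time_limit=10.0):
        # Illustrative wrapper: enforce the per-batch NMS time limit only while
        # validating during training, so standalone inference is never truncated.
        outputs = []
        t = time.time()
        for pred in predictions:
            # ... per-image confidence filtering and IoU-based NMS would go here ...
            outputs.append(pred)
            if training and (time.time() - t) > time_limit:
                break  # time limit exceeded (checked only during training-time validation)
        return outputs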

Almost all of the other controversy was raised by @josephofiowa's blog. Here I list only two of those points.

  1. Inconsistent predicted results. It gives different predicted results even when you use the same model, the same weights, the same input image, and the same testing command for each inference.

In @josephofiowa's Colab and blog: (images)

Can you imagine a self-driving car that sometimes sees the pedestrian in front of it and sometimes does not? I think no one would buy that kind of car.

2.

Second, YOLOv5 is fast – blazingly fast. In a YOLOv5 Colab notebook, running a Tesla P100, we saw inference times up to 0.007 seconds per image, meaning 140 frames per second (FPS)! By contrast, YOLOv4 achieved 50 FPS after having been converted to the same Ultralytics PyTorch library.

Even now, ultralytics/yolov5 does not yet support running YOLOv4, so how did @josephofiowa test the speed of yolov5s and YOLOv4 using the same Ultralytics PyTorch library ten days ago?

danielbarry commented 4 years ago

It can reduce the inference time of each input image by about 0.3 ms to 0.8 ms, so it can make your chart look better. However, it gives different predicted results even when you use the same model, same weights, same input image, and same testing command for each inference.

Wow, good spot. This is quite troubling. Would be nice to see this re-tested with a massive timeout value.

We can get the AP and speed-testing details from the table's description, but we cannot tell how the FLOPs were measured. Also I guess it is GFLOPs, since a FLOP count must be an integer.

I think the B after the number is "billion", so the unit is BFLOPS.

Even now, ultralytics/yolov5 does not yet support running YOLOv4, so how did @josephofiowa test the speed of yolov5s and YOLOv4 using the same Ultralytics PyTorch library ten days ago?

They didn't; it was tested in two different frameworks. I think this was really just meant as a "fair as can be" comparison in the absence of a shared framework, but of course it concerns me.

Even the weight file size comparison doesn't really make sense - it could simply come down to how the two frameworks serialize their weights.

danielbarry commented 4 years ago

What I find particularly confusing is the bar YOLOv4 is held up to when it comes to @josephofiowa 's comparisons with YOLOv5 in their update blog.

So it's mostly a comparison of YOLOv4 vs YOLOv5s, unless it's object detection accuracy, where magically the comparison becomes YOLOv5l? It seems like the YOLOv5 variant that best suits each test is picked. Why not just test all YOLOv5 models - why pick and choose which to compare with?

Potential way forward: as YOLOv3 is the model common to both frameworks, to me it makes more sense to compare YOLOv4 and YOLOv5 against their respective YOLOv3 versions until a proper cross-framework port is complete. That way you can mostly shake out framework-specific differences.

WongKinYiu commented 4 years ago

@danielbarry

I think the B after the number is "billion", so the unit is BFLOPS.

Thanks, yes, it is billion; I have corrected the description in my comment.

What I find particularly confusing is the bar YOLOv4 is held up to when it comes to @josephofiowa 's comparisons with YOLOv5 in their update blog...

I think it is better to focus on how to make YOLO better, no matter which yolovX it is. There are too many mysteries in @josephofiowa's blogs... I have no time to track down all of them.

rcg12387 commented 4 years ago

@WongKinYiu It would be good if you corrected your comment. You wrote:

  1. Wrong comparison. This figure does not show GPU latency; it shows the average GPU inference time of batch-32 yolov5 models and batch-8 efficientdet models. The GPU latency of yolov5 models is 0.1 s~0.6 s when the batch size is 32, which is also the reason why @josephofiowa once got 10 fps results for yolov5s #5920 (comment).

The 10 fps figure was an error by @josephofiowa. He has updated it to 50 fps.

WongKinYiu commented 4 years ago

@rcg12387 Thanks,

edit: I think I cannot say "may" about what others are thinking, so I will delete this sentence.

glenn-jocher commented 4 years ago

@WongKinYiu I just today updated the v5 readme table with FP16 speeds for all current models. New models are being trained with panet heads, I'm waiting for the last of these to finish before updating the table again and the chart this weekend (yolov5x takes a bit of time to train).

In any case, the values shown in the chart right now are slower than the actual batch-8 FP16 speeds that I will update to, so the chart should only look better in the future.

The timeout you cite is perfectly normal. Its purpose is to prevent testing times from becoming burdensome during training, for example as in this issue: https://github.com/ultralytics/yolov3/issues/1251

I instituted this code in yolov3 to address this: https://github.com/ultralytics/yolov3/blob/master/utils/utils.py#L489

And it's carried over in v5: https://github.com/ultralytics/yolov5/blob/master/utils/utils.py#L545

The time limit is designed to interrupt execution of NMS operations if they exceed 10.0 full seconds per batch, saving users from extremely long testing times during training as in the issue above. It does not affect any of the results we are discussing, because for all of the models I have, NMS runs in about 0.001-0.002 seconds per image, and the batch size used during testing is 32. So at roughly 0.03-0.06 seconds of elapsed time per batch, the 10.0-second limit will never be approached here.

pfeatherstone commented 4 years ago

Just another thought: it might be worth doing comparisons using the same inference engine, like onnxruntime. For GPU inference that might not make a difference, because most repos use cudnn or tensorrt, but for CPU inference it makes a huge difference. For example, the CPU gemm implementation in darknet isn't the fastest. In any case, using the same inference engine regardless of the target device makes it a little more of a fair game. You might have to do NMS as a post-processing CPU step, though, but that seems fine to me.

pfeatherstone commented 4 years ago

At the end of the day, all models output a tensor of shape [B,D,F], where B is the batch size, D is the total number of candidate detections, and F is the number of features, equal to 85 for COCO. The features are exactly the same for all models and the post-processing NMS step is the same for everyone. So you can use the exact same ONNXRUNTIME code to infer every model. That seems like fair play.
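
For illustration, here is a rough sketch of that shared path using onnxruntime (the model file name, the input-name lookup, and the dummy input shape are placeholders; each model would first have to be exported to ONNX):

    import numpy as np
    import onnxruntime as ort

    # Any YOLO variant exported to ONNX can be driven by the same code.
    session = ort.InferenceSession("yolo_model.onnx")
    input_name = session.get_inputs()[0].name

    # Dummy batch: B=1, 3x640x640, normalised to [0, 1].
    image = np.random.rand(1, 3, 640, 640).astype(np.float32)

    # Output is a [B, D, F] tensor of candidate detections (F = 85 for COCO).
    detections = session.run(None, {input_name: image})[0]
    print(detections.shape)

    # The shared NMS post-processing would then run on these detections on the CPU.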

pfeatherstone commented 4 years ago

I'm sure there are already quite a few pytorch ports of yolov4 on github so the ONNX port wouldn't be a lot of work.

WongKinYiu commented 4 years ago

@glenn-jocher Hello,

I just today updated the v5 readme table with FP16 speeds for all current models. New models are being trained with panet heads, I'm waiting for the last of these to finish before updating the table again and the chart this weekend (yolov5x takes a bit of time to train).

Thanks for the information, waiting for your new results.

In any case, the values shown in the chart right now are slower than the actual batch-8 FP16 speeds that I will update to, so the chart should only look better in the future.

Yes, I know. I have also drawn a new figure in https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644627655.

The time limit is designed to interrupt execution of NMS operations if they exceed 10.0 full seconds per batch, saving users from extremely long testing times during training as in the issue above. It does not affect any of the results we are discussing, because for all of the models I have, NMS runs in about 0.001-0.002 seconds per image, and the batch size used during testing is 32. So at roughly 0.03-0.06 seconds of elapsed time per batch, the 10.0-second limit will never be approached here.

Thanks for the reply. If it is there to solve a training problem, I think it has to be driven by an is_training flag. I will update my comment to split time_limit and inconsistent predicted results into two separate problems. Do you have any idea why the same images and the same testing command generate different predicted results? The most serious difference is about 60% (15 RBCs, 1 WBC -> 9 RBCs, 1 WBC) in josephofiowa's testing.

glenn-jocher commented 4 years ago

Thanks for the reply. If it is there to solve a training problem, I think it has to be driven by an is_training flag. I will update my comment to split time_limit and inconsistent predicted results into two separate problems. Do you have any idea why the same images and the same testing command generate different predicted results? The most serious difference is about 60% (15 RBCs, 1 WBC -> 9 RBCs, 1 WBC) in josephofiowa's testing.

Yes, that's an interesting question. Inference is deterministic; I'm not aware of any randomness in the process that should cause different results for an image in a --source directory versus passing it directly as --source file. If I run a quick test in Colab I see the same results either way, and the same speeds too, since these are all batch-size 1 operations. It's likely that different models were used to obtain the different results.

Here you can see almost 50 FPS with a K80, Colab's slowest GPU.

(screenshot)
WongKinYiu commented 4 years ago

@glenn-jocher

Thanks, I will move it to the controversy raised by josephofiowa for now. But one thing is for sure: inference with different batch sizes gives different AP on COCO. It would be better to check why that happens.

And for small-model training, for example yolov5s, I suggest using a lower resolution. ThunderNet shows that small models cannot afford high-resolution training. Also, efficientdet scales depth, width, and input size, while ultralytics/yolov5 only scales depth and width. Here is an example of cspnet (trained with ultralytics/yolov3): it gets 26.5 AP at 238 FPS on a 1080 Ti with batch size 1, trained and tested at 416x416 resolution, which is much faster and more accurate than yolov5s trained at 640x640 and tested at 288x288. (image)

glenn-jocher commented 4 years ago

@WongKinYiu yes there may be very slight variations in ultralytics mAP when using different batch sizes. This is normal though, and is caused by variations in padding used when constructing letterboxed batches. For example with batch 32 the first 16 images in the batch are padded like this, and the results are as shown here:

!python test.py --weights yolov5s.pt --data ./data/coco.yaml --img 640 --batch 32

Speed: 5.4/3.0/8.5 ms inference/NMS/total per 640x640 image at batch-size 32
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.459
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.296
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.557
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.358
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.618
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.700

test_batch0_pred

But if I use batch 1 then the image is all by itself, so its padding is not guided by the rest of the images in the batch. In this case it will be padded more minimally. In my test most metrics are almost exactly the same, though it's possible a few may vary minimally between the two scenarios.

!python test.py --weights yolov5s.pt --data ./data/coco.yaml --img 640 --batch 1

Speed: 8.6/2.5/11.1 ms inference/NMS/total per 640x640 image at batch-size 1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.459
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.296
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.557
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.619
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699

test_batch0_pred (1)
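
To make the padding effect concrete, here is a rough sketch (not the actual ultralytics letterbox code) of why the same image can end up with a different padded shape depending on the batch it sits in:

    import math

    def letterbox_shape(h, w, img_size=640, stride=32, batch_aspect=None):
        # Illustrative rectangular-letterbox calculation: the padded shape is
        # driven by an aspect ratio, and with rectangular batching that ratio
        # comes from the whole batch rather than from the single image.
        aspect = batch_aspect if batch_aspect is not None else h / w
        if aspect < 1:   # wide: keep full width, pad height to a stride multiple
            return (math.ceil(img_size * aspect / stride) * stride, img_size)
        else:            # tall: keep full height, pad width to a stride multiple
            return (img_size, math.ceil(img_size / aspect / stride) * stride)

    print(letterbox_shape(720, 1280))                     # alone: (384, 640)
    print(letterbox_shape(720, 1280, batch_aspect=0.75))  # in a taller batch: (480, 640)

The difference between those two padded shapes is what produces the tiny metric variations above.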

glenn-jocher commented 4 years ago

@WongKinYiu also yes, efficientdet does scale image size, while I do all training at 640. Their D0 is trained at 512, with the rest of the sizes at 512 + D * 128. By D7 they are training at 1536 pixels (!).

This is fantastic for rich people with free access to millions of dollars in hardware to train with, but in the real world for the rest of us we get this: https://github.com/google/automl/issues/85#issuecomment-623709815

WongKinYiu commented 4 years ago

@glenn-jocher Thanks,

If the results change when using different batch sizes, it means the grouping and order of the input data affect the results. I think there are two possible solutions: 1) always use batch-1 to get the results, or 2) always pad to the full square shape, e.g. 512x512, 640x640...

This is fantastic for rich people with free access to millions of dollars in hardware to train with, but in the real world for the rest of us we get this...

Yes, that is also the reason why I only suggest lower resolution for small-model training. Most people cannot afford high-resolution training. Another reason is that a big model can learn low resolution well, but a small model cannot learn high resolution well.

Divkix commented 4 years ago

Is that YOLOv5 real? AFAIK people have been talking about v4 on YouTube and most forums.

CSTEZCAN commented 4 years ago

Hello there,

Well, yes I've seen the -fake- news.

As someone who has run hundreds of tests on YOLOv4, I can confirm that YOLOv5 is in no way related to Alexey's beautiful work. (my tests: https://www.youtube.com/user/Canonest/videos )

Some people are just trying to ride the hype created by the hard work of the original authors.

Ignore YOLOv5 (unless Alexey publishes one in the future) and focus on YOLOv4!

pfeatherstone commented 4 years ago

Well, I wouldn't say the yolov5 work should be ignored - particularly yolov5s. That's where the focus should be, in my opinion, as it is a good candidate for replacing yolov3-tiny due to its inference speed and improved accuracy. If your interests lie in CPU-friendly models then yolov5s is one of the best ones out there.

pfeatherstone commented 4 years ago

Maybe @glenn-jocher should have branded yolov5 differently to avoid controversy. At the end of the day, pick the one that best suits your needs, i.e. your performance requirements and your custom dataset.

AlexeyAB commented 4 years ago

@pfeatherstone There is YOLOv4-tiny released: 40.2% AP50, 371 FPS (GTX 1080 Ti): https://github.com/AlexeyAB/darknet/issues/6067

pfeatherstone commented 4 years ago

Thanks for the update. It feels like there is competition in the YOLO market...

Kreijstal commented 4 years ago

I just found out about the controversy; I had believed that YOLOv5 was an upgraded version of YOLOv4.

HardLaugh commented 4 years ago

Python/PyTorch is popular; it's a trend to use PyTorch to train darknet models. Different training systems always confuse me, for example efficientdet in TensorFlow vs. PyTorch, and darknet's backward gradient in the yolo layer.

AlexeyAB commented 4 years ago

YOLOv4 training and inference on different frameworks / libraries:

Pytorch-implementations:

TensorFlow: https://github.com/hunglc007/tensorflow-yolov4-tflite

OpenCV (YOLOv4 built-in OpenCV): https://github.com/opencv/opencv

TensorRT: https://github.com/ceccocats/tkDNN

Tencent/NCNN: https://github.com/Tencent/ncnn

TVM https://tvm.ai/about

OpenDataCam: https://github.com/opendatacam/opendatacam#-hardware-pre-requisite

BMW-InnovationLab - Training with YOLOv4 has never been so easy (monitor it in many different ways like TensorBoard or a custom REST API and GUI):

pfeatherstone commented 4 years ago

@AlexeyAB why use darknet to train models rather than pytorch? Your time must be split between research and maintaining/updating darknet. Not trying to be funny or make a point, just trying to understand the reasoning. Wouldn't you be more productive if you could just focus on models rather than fixing bugs or creating new layers in darknet?

pfeatherstone commented 4 years ago

By the way, using darknet is also a great solution as a minimal inference framework on CPU as it can have very minimal dependencies. So I can see reasons from a personal point of view.

pfeatherstone commented 4 years ago

This has arrived https://arxiv.org/pdf/2007.12099v2.pdf. Another flavour of yolo...

AlexeyAB commented 4 years ago

@pfeatherstone https://github.com/AlexeyAB/darknet/issues/6350

pfeatherstone commented 4 years ago

@AlexeyAB thanks. Soz for the duplication

LEEGILJUN commented 3 years ago

So, which model is better between YOLOv4 and YOLOv5 as a recommendation for the general user? Young students argue about which model is better but can't draw conclusions.

kossolax commented 3 years ago

Performance is similar, as they use the same backbones. The only difference is that one uses darknet and the other PyTorch.


LEEGILJUN commented 3 years ago


Is it similarly the case for Scaled-YOLOv4 as well?

kossolax commented 3 years ago

The one matching your resolution is optimal. You can take a bigger one for better accuracy, but it will be slower.


dewball345 commented 3 years ago

Frankly, the only reason why I used yolov5 over yolov4 is that its documentation (I'm talking about the PYTORCH IMPLEMENTATION; Darknet doesn't really fit my needs) is way easier to understand. The detailed steps for exporting to ONNX, TensorFlow, etc. are all there, as well as CLEAR directions on how to use your own data. I tried using the PyTorch implementation of yolov4 and struggled to actually start training, due to OpenCV and some random value errors that popped up. The directions are super vague and don't give much detail on which directory to place your data in, where to keep labels, images, etc. Yolov5 at least has a nice tutorial blog teaching how to use your own data, as well as a template Colab (from Roboflow) that actually works. I was able to export to ONNX and TensorFlow in like 5-10 minutes, while I couldn't even train the PyTorch implementation, because it was so hard figuring out where to put what.

So I think that's really the only inherent advantage that the v5 has that makes it worth using over v4. I totally agree that the naming for it is really misleading, but I just wanted to suggest that more documentation be added to this repo and the PyTorch one.

dewball345 commented 3 years ago


Also, I would like to mention that while the performance of yolov4 is much better than yolov5's, I think the ease of development (or the actual ability to develop at all) is a reasonable trade-off.

I think my comment may be a bit off-topic, but I just wanted to mention the difference from the point of view of a developer who just wants something that works decently.

danielbarry commented 3 years ago

I'm going to close out this issue - it's mostly been 'resolved' in terms of understanding exactly what happened, and there is not much benefit in continuing to pile onto the issue.

Feel free to open a new ticket if new issues arise.

GiorgioSgl commented 3 years ago

I tested YOLOv4-tiny and YOLOv5s on a Raspberry Pi 4: I got 0.4 FPS from the first and 1.5 FPS from the second. I will come out with a Medium post soon. :fire:

AlexeyAB commented 3 years ago

@GiorgioSgl Use OpenCV-dnn or OpenVINO or NCNN to test yolov4-tiny.cfg on Raspberry Pi4: https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html

YOLOv4-tiny on Raspberry Pi4:

https://www.reddit.com/r/MachineLearning/comments/hu7lyt/p_yolov4tiny_speed_1770_fps_tensorrtbatch4/ (image)
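
If it helps, here is a minimal OpenCV-dnn sketch (the cfg/weights/image paths are placeholders for the official yolov4-tiny files and your own test image):

    import cv2

    # Load the Darknet cfg/weights directly with OpenCV's dnn module.
    net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    frame = cv2.imread("test.jpg")
    classes, scores, boxes = model.detect(frame, confThreshold=0.4, nmsThreshold=0.4)
    for class_id, score, box in zip(classes, scores, boxes):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("out.jpg", frame)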

AlexeyAB commented 3 years ago

@GiorgioSgl Read more about YOLOv5:

Comparison of YOLOv5 vs Scaled-YOLOv4 / YOLOR: https://github.com/AlexeyAB/darknet/issues/7717 (comparison charts)

GiorgioSgl commented 3 years ago

I have tried the TFLite framework and I get better FPS results, but the mAP@.5 decreases to ~27. Next week I will give your suggested frameworks a try.

Utsabab commented 1 year ago

Detection on thermal infrared images. Several versions of YOLOv4 and YOLOv5 are compared; details in Table 2.

https://www.sciencedirect.com/science/article/pii/S1569843222001145