AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Effect of BFLOPS #7130

Closed. Chitti21 closed this issue 3 years ago.

Chitti21 commented 3 years ago

Hi...

Can someone clarify the effect of BFLOPS on the detection speed of the networks?

Yolo v3 has 65.9 BFLOPS, while Yolo v4 has 128.5 BFLOPS.

Yet Yolo v4 still achieves a higher FPS than Yolo v3.

What makes Yolo v4 better in terms of detection speed?
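For reference, a naive model that assumes inference time is directly proportional to BFLOPS would predict Yolo v4 to run at roughly half the FPS of Yolo v3. The sketch below only does that arithmetic with the two BFLOPS figures quoted above; the linear-scaling assumption (and the helper name naive_fps) is illustrative, and it is exactly the assumption the observed FPS contradicts.

```python
def naive_fps(reference_fps: float, reference_bflops: float, target_bflops: float) -> float:
    """Predict FPS assuming inference time scales linearly with BFLOPS (a simplification)."""
    return reference_fps * reference_bflops / target_bflops


YOLOV3_BFLOPS = 65.9   # figure quoted in the question
YOLOV4_BFLOPS = 128.5  # figure quoted in the question

# Normalize Yolo v3 to 1.0 and predict Yolo v4's relative speed under the naive model
ratio = naive_fps(1.0, YOLOV3_BFLOPS, YOLOV4_BFLOPS)
print(f"naive prediction: Yolo v4 runs at {ratio:.2f}x the FPS of Yolo v3")
```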

AlexeyAB commented 3 years ago

The more parallel the device, the smaller the impact of BFLOPS on speed.
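This point can be illustrated with a deliberately simplified latency model. It is a sketch under stated assumptions, not how Darknet or any real GPU schedules work: each layer is split into a hypothetical number of independent work items, each core processes one item at a time, layers run one after another, and every layer pays a fixed launch overhead. The layer sizes, core counts and per-core rate are made-up numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    flops: float     # total floating-point operations in the layer
    work_items: int  # independent chunks of work the layer exposes (hypothetical)


def layer_time(layer: Layer, cores: int, core_flops_per_s: float, overhead_s: float) -> float:
    """Seconds for one layer: fixed overhead plus ceil(items / cores) waves of work."""
    flops_per_item = layer.flops / layer.work_items
    waves = math.ceil(layer.work_items / cores)
    return overhead_s + waves * flops_per_item / core_flops_per_s


def network_time(layers, cores, core_flops_per_s=1e9, overhead_s=1e-5) -> float:
    """Layers run sequentially, so their times add up."""
    return sum(layer_time(l, cores, core_flops_per_s, overhead_s) for l in layers)


def total_gflops(layers) -> float:
    return sum(l.flops for l in layers) / 1e9


# Two hypothetical networks: "narrow" has half the FLOPs but little parallelism per layer.
narrow = [Layer(flops=1e9, work_items=64) for _ in range(40)]  # 40 GFLOPs in total
wide = [Layer(flops=4e9, work_items=4096) for _ in range(20)]  # 80 GFLOPs in total

for name, cores in [("8-core device", 8), ("4096-core device", 4096)]:
    t_narrow = network_time(narrow, cores)
    t_wide = network_time(wide, cores)
    print(f"{name}: narrow {t_narrow:.4f} s, wide {t_wide:.4f} s")
```

In this toy model the wide network is about 2x slower on the 8-core device, in line with its 2x FLOPs, but roughly 30x faster on the 4096-core device, because the narrow network leaves most cores idle in every layer.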

https://alexeyab84.medium.com/scaled-yolo-v4-is-the-best-neural-network-for-object-detection-on-ms-coco-dataset-39dfa22fa982?source=friends_link&sk=c8553bfed861b1a7932f739d26f487c8

Scaled YOLOv4 uses massively parallel devices such as GPUs much more efficiently than EfficientDet. For example, the V100 (Volta) GPU delivers 14 TFLOPS in FP32 and 112 TFLOPS with Tensor Cores (https://images.nvidia.com/content/technologies/volta/pdf/tesla-volta-v100-datasheet-letter-fnl-web.pdf). If we test both models on a V100 GPU with batch = 1, with -hparams=mixed_precision=true and without -tensorrt=FP32, then:

That is, on devices with massively parallel computing such as GPUs, the operations used in YOLOv4-CSP are (7.5 / 1.6) ≈ 4.7x more efficient than the operations used in EfficientDet-D3.

Neural networks are usually run on the CPU only in research, for easier debugging, and BFLOPS is currently only of academic interest. In real-world tasks, what matters is actual speed and accuracy: YOLOv4-P6 is actually 3.7x faster than EfficientDet-D7 on a V100 GPU. That is why massively parallel devices (GPU / NPU / TPU / DSP), which offer a much better balance of speed, price and heat dissipation, are almost always used.
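The thread does not show how the 7.5 and 1.6 figures were obtained. One common way to express how efficiently a model uses a GPU is its achieved throughput (BFLOPS per image multiplied by FPS) compared with the device's peak; the sketch below assumes that interpretation. The helper names and both model configurations are hypothetical; only the 14 TFLOPS V100 figure comes from the text above.

```python
V100_PEAK_FP32_TFLOPS = 14.0  # V100 FP32 peak quoted above (datasheet figure)


def achieved_tflops(bflops_per_image: float, fps: float) -> float:
    """Work actually performed per second: 1e9 FLOPs/image * images/s / 1e12 = TFLOPS."""
    return bflops_per_image * fps / 1000.0


def utilization(achieved: float, peak: float = V100_PEAK_FP32_TFLOPS) -> float:
    """Fraction of the device's peak throughput that the model actually uses."""
    return achieved / peak


# Hypothetical models that both run at 60 FPS on the same GPU.
heavy = achieved_tflops(bflops_per_image=100.0, fps=60.0)  # 6.0 TFLOPS
light = achieved_tflops(bflops_per_image=25.0, fps=60.0)   # 1.5 TFLOPS

print(f"heavy: {heavy:.1f} TFLOPS, {utilization(heavy):.0%} of FP32 peak")
print(f"light: {light:.1f} TFLOPS, {utilization(light):.0%} of FP32 peak")
print(f"efficiency ratio: {heavy / light:.1f}x")

# The ratio quoted in the thread: 7.5 / 1.6 = 4.6875, which rounds to 4.7
print(f"quoted ratio: {7.5 / 1.6:.1f}x")
```

Here the heavy model does 4x the work per image at the same FPS, so it uses the GPU 4x more efficiently even though its BFLOPS figure looks worse. With mixed precision enabled, the relevant peak would be the Tensor Core figure, so pass peak=112.0 instead.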

Chitti21 commented 3 years ago

Thanks @AlexeyAB

Could you please tell me whether the structure of DC-SPP-YOLO shown on page 11 of this paper is similar to YOLO v4?