Closed Chitti21 closed 3 years ago
The higher the parallelism of the device, the smaller the impact of BFLOPS on speed.
Scaled-YOLOv4 utilizes massively parallel devices such as GPUs much more efficiently than EfficientDet. For example, the GPU V100 (Volta) has a peak performance of 14 TFLOPS (112 TFLOPS on Tensor Cores): https://images.nvidia.com/content/technologies/volta/pdf/tesla-volta-v100-datasheet-letter-fnl-web.pdf

If we test both models on a GPU V100 with batch = 1, `-hparams = mixed_precision = true`, and without TensorRT (FP32), then:
- YOLOv4-CSP (640x640) — 47.5% AP — 70 FPS — 120 BFLOPS (60 FMA). Based on BFLOPS it should run at 933 FPS = (112,000 / 120), but in fact we get 70 FPS, i.e. 7.5% of the GPU is used = (70 / 933).
- EfficientDet-D3 (896x896) — 47.5% AP — 36 FPS — 50 BFLOPS (25 FMA). Based on BFLOPS it should run at 2,240 FPS = (112,000 / 50), but in fact we get 36 FPS, i.e. 1.6% of the GPU is used = (36 / 2240).
That is, on devices with massive parallel computing such as GPUs, the efficiency of the computing operations used in YOLOv4-CSP is (7.5 / 1.6) = 4.7x better than the efficiency of the operations used in EfficientDet-D3. Usually, neural networks are run on the CPU only in research tasks, for easier debugging, and the BFLOPS figure is currently only of academic interest. In real-world tasks, real speed and accuracy are what matter: the real speed of YOLOv4-P6 is 3.7x faster than EfficientDet-D7 on a GPU V100. Therefore, devices with massive parallelism (GPU / NPU / TPU / DSP), with much better speed, price, and heat dissipation, are almost always used.
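The utilization arithmetic above can be sketched as a few lines of Python. The numbers are the ones quoted in this thread; the 112 TFLOPS peak is the V100 Tensor-Core figure from the linked datasheet, and `gpu_utilization` is just a helper name for this sketch, not part of any framework.

```python
def gpu_utilization(measured_fps, bflops, peak_tflops=112.0):
    """Return (theoretical FPS at peak compute, fraction of peak actually used).

    theoretical FPS = peak throughput in GFLOPS / GFLOPs needed per image,
    i.e. 112,000 / BFLOPS for the V100 Tensor-Core peak.
    """
    theoretical_fps = peak_tflops * 1000.0 / bflops
    return theoretical_fps, measured_fps / theoretical_fps

# Numbers from the comparison above
yolo_fps, yolo_util = gpu_utilization(measured_fps=70, bflops=120)
effdet_fps, effdet_util = gpu_utilization(measured_fps=36, bflops=50)

print(f"YOLOv4-CSP:      theoretical {yolo_fps:.0f} FPS, utilization {yolo_util:.1%}")
print(f"EfficientDet-D3: theoretical {effdet_fps:.0f} FPS, utilization {effdet_util:.1%}")
print(f"Relative efficiency: {yolo_util / effdet_util:.1f}x")
```

Running this reproduces the figures in the thread: ~933 and ~2,240 theoretical FPS, 7.5% vs. 1.6% utilization, and the 4.7x efficiency ratio.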
Hi...
Can someone clarify the effect of BFLOPS on the detection speed of the networks?
YOLOv3 has 65.9 BFLOPS; YOLOv4 has 128.5 BFLOPS.
But the FPS of YOLOv4 is still better than that of YOLOv3.
What makes YOLOv4 better in terms of detection speed?