12343954 commented 4 years ago

YOLOv4, training avg-loss = 3.0, CUDA 10.2 + cuDNN 7.6.5

D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>darknet_images.cmd
Flag value false not forcing CPU mode
 Try to load cfg: ./training/11/voc_custom/yolov4_custom.cfg, weights: ./training/11/voc_custom/backup/yolov4-custom_best.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1                                      ->  304 x 304 x  64

YOLOv3, training avg-loss = 0.2, CUDA 10.1 + cuDNN 7.6.4

darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
 CUDA-version: 10010 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
 OpenCV version: 4.2.0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF

Where am I going wrong? Why does my v4 run show cudnn_half = 0?

12343954 commented 4 years ago

I put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4 and ran the same code to detect the same image on the same computer.

ETA, v3 : v4 = 39 ms : 622 ms

v3

D:\Tensorflow2\darknet\build\darknet\x64>detect.cmd
 Try to load cfg: training\11\voc_custom\yolov3_custom.cfg, weights: training\11\voc_custom\backup\yolov3_custom_63000.weights, clear = 0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF

v4

D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>detect.cmd
name 'DARKNET_FORCE_CPU' is not defined
 Try to load cfg: training\11\voc_custom\yolov4_custom.cfg, weights: training\11\voc_custom\backup\yolov4-custom_best.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1

AlexeyAB commented 4 years ago

1. Run the same command for both v3 and v4 on the same PC: ./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg

2. Do you use an implementation for rotated bboxes? Which implementation do you use?

stephanecharette commented 4 years ago

put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4 same code detect same image

Is this on the same computer, same darknet directory, etc?

Because I can confirm that when I run YOLO v3 or v4 detection on the same computer, I get approximately the same results: around 3-4 milliseconds for most of my neural networks. If I then rebuild darknet without GPU support, I get numbers like 300-700 milliseconds, since it runs on the CPU.

I suspect that whatever you are doing, your v4 is running on the CPU.
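One way to check this from the startup output quoted in this thread is to pull the device line out of the log and compare the fields between runs. The sketch below is only an illustration: the function name `parse_device_line` and the regular expression are my own assumptions, written against the log lines shown here, and are not part of darknet.

```python
import re

# Matches the device line as it appears in the logs quoted in this thread, e.g.
# " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"
# The "GPU: ..." part is optional because some of the quoted logs omit it.
DEVICE_LINE = re.compile(
    r"compute_capability\s*=\s*(?P<cc>\d+),\s*cudnn_half\s*=\s*(?P<half>\d+)"
    r"(?:,\s*GPU:\s*(?P<gpu>.+))?"
)


def parse_device_line(log_text):
    """Return the fields of the first device line as a dict, or None if there is none."""
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            gpu = match.group("gpu")
            return {
                "compute_capability": int(match.group("cc")),
                "cudnn_half": int(match.group("half")),
                "gpu": gpu.strip() if gpu else None,
            }
    return None


if __name__ == "__main__":
    sample = " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"
    print(parse_device_line(sample))
```

A log with no such line returns None, which is worth comparing against a run that is known to use the GPU.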

12343954 commented 4 years ago

@AlexeyAB @stephanecharette Thank you very much for replying.

1. I tested v4 and v3 on the same PC via Python.
2. My rotated-bbox code runs on both the v4 and the v3 darknet framework on the GPU, but I did not test both yolo cfgs inside a single darknet build. You can see my two different folders.

I am sure the results above were all obtained on the GPU, because when I tested on the CPU, the v4 ETA was more than 1000 ms! Even so, the ETAs are very different.

I suspect that cuDNN is not turned on under v4. What does cudnn_half = 0 mean?

I used v4 and v3 each with its own cfg and weights. I trained each one separately on its own data set, using the same names in different folders, so that my code changes are minimal.

AlexeyAB commented 4 years ago

1. Run the same command for both v3 and v4 on the same PC, and show a screenshot: ./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg

2. Do you use an implementation for rotated bboxes? Which implementation do you use? Show a link.

12343954 commented 4 years ago

All tests were run under CUDA 10.1 + cuDNN 7.6.4 on the same PC.

v4, darknet.exe detector test data/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg

 CUDA-version: 10010 (11000), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 4.2.0
 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF

[screenshot: 1111]

v4 python darknet_images.py --input=data/dog.jpg --config_file=cfg/yolov4.cfg --weights=yolov4.weights --data_file=data/coco.data

name 'DARKNET_FORCE_CPU' is not defined
 Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF

[screenshot: 1111-1]

v4 (No GPU) python darknet_images.py --input=data/dog.jpg --config_file=cfg/yolov4.cfg --weights=yolov4.weights --data_file=data/coco.data

Environment variables indicated a CPU run, but we didn't find D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64\yolo_cpp_dll_nogpu.dll. Trying a GPU run anyway.
 Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF

[screenshot: 1111-2]

v3, darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg

CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
 OpenCV version: 4.2.0
 compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF

[screenshot: 2222]

AlexeyAB commented 4 years ago

@12343954

So from your screenshots on GeForce RTX 2060:

YOLOv4-608 - 39ms
YOLOv3-416 - 20ms (2x faster)

This fully matches the chart from the article (https://arxiv.org/abs/2004.10934), measured on a Tesla V100 GPU:

YOLOv4-608 - 16ms (62 FPS)
YOLOv3-416 - 8ms (120 FPS) (2x faster)

[image: ap_resol]
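The arithmetic behind this comparison can be sketched as follows. The latencies are the ones quoted in this comment; the helper names `fps` and `slowdown` are hypothetical and only used for illustration. Note that the FPS values quoted above appear to be rounded, so the computed values need not match them digit for digit.

```python
def fps(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    return 1000.0 / latency_ms


def slowdown(slow_ms, fast_ms):
    """How many times slower the first latency is than the second."""
    return slow_ms / fast_ms


# Per-image latencies (ms) as quoted in the comment above.
timings_ms = {
    "GeForce RTX 2060": {"YOLOv4-608": 39, "YOLOv3-416": 20},
    "Tesla V100": {"YOLOv4-608": 16, "YOLOv3-416": 8},
}

if __name__ == "__main__":
    for gpu, t in timings_ms.items():
        v4, v3 = t["YOLOv4-608"], t["YOLOv3-416"]
        print(f"{gpu}: v4 {fps(v4):.1f} FPS, v3 {fps(v3):.1f} FPS, "
              f"v4 is {slowdown(v4, v3):.2f}x slower")
```

On both GPUs the ratio comes out close to 2x, which is the point being made here.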

12343954 commented 4 years ago

@AlexeyAB Thank you for the reply!

Through my tests, I can see the performance difference between v4 and v3 with darknet.exe. But I don't understand why the same code running under Python differs so much: 10 times! Is this because cudnn_half is unavailable for v4? I didn't change your darknet_images.py code.

AlexeyAB commented 4 years ago

Does https://github.com/AlexeyAB/darknet/blob/master/darknet_images.py support rotated bboxes?

12343954 commented 4 years ago

😄 Sorry, I forgot to answer your 2nd question. I implemented the rotated bbox myself with OpenCV, combining some technical information from the Internet.

robotaiguy commented 3 years ago

Were you, by chance, using the default 416x416 network size for Yolo v3, and now using the default 608x608 network size for Yolo v4? That should be enough to make a notable difference in side-by-side comparisons.
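As a rough illustration of how much the input size alone matters, the sketch below reproduces the BF column of the first conv layer from the logs quoted in this thread. The formula (two operations per multiply-add, over every output position) is an assumption that happens to match those logged values; the function names are hypothetical, and the estimate ignores the architectural differences between v3 and v4.

```python
def conv_bflops(out_w, out_h, filters, size, in_channels):
    """Billions of operations for one conv layer, counting a multiply-add as 2 ops."""
    return 2.0 * out_w * out_h * filters * size * size * in_channels / 1e9


def area_ratio(side_a, side_b):
    """Ratio of pixel counts between two square network input sizes."""
    return (side_a * side_a) / (side_b * side_b)


if __name__ == "__main__":
    # Layer 0 in both logs: 32 filters, 3 x 3, stride 1, 3 input channels.
    first_608 = conv_bflops(608, 608, 32, 3, 3)
    first_416 = conv_bflops(416, 416, 32, 3, 3)
    print(f"layer 0 at 608x608: {first_608:.3f} BF")
    print(f"layer 0 at 416x416: {first_416:.3f} BF")
    print(f"pixel ratio 608 vs 416: {area_ratio(608, 416):.2f}x")
```

The same layer costs a little over twice as much at 608x608 as at 416x416, so a fair side-by-side comparison needs the same width and height in both cfg files.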

AlexeyAB / darknet

WHY! v4 detection ETA is 10 times than v3 ??!! #6630
