Open 12343954 opened 4 years ago
I put v3 and v4 under the same CUDA 10.1 + cuDNN 7.6.4.
The same code detects the same image on the same computer.
Detection time v3 : v4 = 39 ms : 622 ms
v3
D:\Tensorflow2\darknet\build\darknet\x64>detect.cmd
Try to load cfg: training\11\voc_custom\yolov3_custom.cfg, weights: training\11\voc_custom\backup\yolov3_custom_63000.weights, clear = 0
compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
v4
D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64>detect.cmd
name 'DARKNET_FORCE_CPU' is not defined
Try to load cfg: training\11\voc_custom\yolov4_custom.cfg, weights: training\11\voc_custom\backup\yolov4-custom_best.weights, clear = 0
0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
1 conv 64 3 x 3/ 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BF
2 conv 64 1 x 1/ 1 304 x 304 x 64 -> 304 x 304 x 64 0.757 BF
3 route 1
1. Run the same command for both v3 and v4 on the same PC:
./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg
put v3 and v4 under the same
CUDA 10.1 + cuDNN 7.6.4
same code detect same image
Is this on the same computer, same darknet directory, etc?
Because I can confirm that when I run YOLO v3 or v4 detection on the same computer, I get approximately the same results: around 3-4 milliseconds for most of my neural networks. If I rebuild darknet without GPU support, I get numbers like 300-700 milliseconds, since it runs on the CPU.
I suspect that whatever you are doing, your v4 is running on the CPU.
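To compare the two builds on equal terms, it can help to time the detection call itself, after a warm-up run, instead of relying on a single printed number. Below is a minimal sketch; `time_call` and `fake_detect` are hypothetical names used only for illustration, and the stand-in workload should be replaced by the real detection call of each build.

```python
import time

def time_call(fn, warmup=1, runs=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up: the first call often includes one-off initialization
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs

# Stand-in workload (hypothetical); swap in the real detection call per build.
def fake_detect():
    return sum(i * i for i in range(10000))

print("mean time: %.3f ms" % time_call(fake_detect))
```

Running the same helper against the v3 and v4 setups, with the same image and network size, gives numbers that can be compared directly.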
@AlexeyAB @stephanecharette Thank you very much for replying.
1. I tested v4 and v3 on the same PC via Python. 2. My rotated-bbox code runs on both the v4 and v3 darknet frameworks on the GPU, but I have not tested the two yolo cfgs under a single darknet build. You can see my two different folders.
I suspect that cuDNN is not turned on under v4. What does cudnn_half = 0 mean?
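Since the startup output of each run prints `compute_capability` and `cudnn_half`, the two runs can be compared mechanically. The sketch below parses those two fields from log text copied from this thread; `parse_gpu_line` is a hypothetical helper, and the regular expression only assumes the line format shown in the logs above.

```python
import re

def parse_gpu_line(log_text):
    """Extract compute_capability and cudnn_half from darknet startup output."""
    m = re.search(
        r"compute_capability\s*=\s*(\d+),\s*cudnn_half\s*=\s*(\d+)", log_text
    )
    if m is None:
        return None  # the expected line was not found in the text
    return {"compute_capability": int(m.group(1)), "cudnn_half": int(m.group(2))}

# Lines copied from the v3 and v4 logs in this thread.
v3_log = "compute_capability = 750, cudnn_half = 1"
v4_log = " 0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060"

print(parse_gpu_line(v3_log))  # {'compute_capability': 750, 'cudnn_half': 1}
print(parse_gpu_line(v4_log))  # {'compute_capability': 750, 'cudnn_half': 0}
```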
I used v4 and v3 each with its own cfg and weights. I trained them separately on their own data sets, which have the same names in different folders. This keeps my code changes minimal.
1. Run the same command for both v3 and v4 on the same PC, and show a screenshot:
./darknet detector test coco.data yolov4.cfg yolov4.weights data/dog.jpg
CUDA-version: 10010 (11000), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
1 conv 64 3 x 3/ 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BF
2 conv 64 1 x 1/ 1 304 x 304 x 64 -> 304 x 304 x 64 0.757 BF
name 'DARKNET_FORCE_CPU' is not defined
Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
0 : compute_capability = 750, cudnn_half = 0, GPU: GeForce RTX 2060
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
1 conv 64 3 x 3/ 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BF
2 conv 64 1 x 1/ 1 304 x 304 x 64 -> 304 x 304 x 64 0.757 BF
Environment variables indicated a CPU run, but we didn't find D:\Darknet-YoloV4\darknet-yolov4\build\darknet\x64\yolo_cpp_dll_nogpu.dll. Trying a GPU run anyway.
Try to load cfg: cfg/yolov4.cfg, weights: yolov4.weights, clear = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
1 conv 64 3 x 3/ 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BF
CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
OpenCV version: 4.2.0
compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
@12343954
So from your screenshots on GeForce RTX 2060:
This fully matches the chart from the article: https://arxiv.org/abs/2004.10934 GPU Tesla V100
@AlexeyAB Thank you for reply!
Through my test, I see that darknet.exe shows a different performance between v4 and v3, but I don't understand why the same code running under Python differs so much: 10 times!
Is this because v4's cudnn_half is unavailable?
I didn't change your code darknet_images.py.
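One of the Python runs in this thread logged "Environment variables indicated a CPU run", so it may be worth printing the relevant environment variables before loading darknet from Python. This is only a sketch: `CUDA_VISIBLE_DEVICES` is a standard CUDA variable, while `FORCE_CPU` is an assumption about what the Python wrapper inspects and should be checked against the wrapper source in your checkout. `cpu_hints` is a hypothetical helper name.

```python
import os

# Assumed variable names: CUDA_VISIBLE_DEVICES is a standard CUDA variable;
# FORCE_CPU is an assumption about what the Python wrapper checks.
CANDIDATES = ("FORCE_CPU", "CUDA_VISIBLE_DEVICES")

def cpu_hints(environ):
    """Return the candidate variables that are set, with their values."""
    return {k: environ[k] for k in CANDIDATES if k in environ}

# An empty dict means none of the candidate variables are set.
print(cpu_hints(os.environ))
```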
Does https://github.com/AlexeyAB/darknet/blob/master/darknet_images.py support rotated bboxes?
😄 Sorry, I forgot to answer your 2nd question. I implemented the rotated bbox via OpenCV myself, combining some technical information from the Internet.
Were you, by chance, using the default 416x416 network size for Yolo v3, and now using the default 608x608 network size for Yolo v4? That should be enough to make a notable difference in side-by-side comparisons.
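The effect of network size can be checked against the BF column in the logs above. The sketch below assumes the usual convolution cost formula (one multiply-add counted as 2 operations); it is not taken from the darknet source, but it reproduces the logged values for the first layer at both sizes. `conv_bflops` is a hypothetical helper name.

```python
def conv_bflops(filters, ksize, in_channels, out_w, out_h):
    """Billions of FLOPs for one conv layer, counting a multiply-add as 2 ops."""
    return 2.0 * filters * ksize * ksize * in_channels * out_w * out_h / 1e9

# Layer 0 from the logs: 32 filters, 3 x 3 kernel, 3 input channels.
first_416 = conv_bflops(32, 3, 3, 416, 416)
first_608 = conv_bflops(32, 3, 3, 608, 608)

print(round(first_416, 3), round(first_608, 3))  # 0.299 0.639
print(round(first_608 / first_416, 2))           # 2.14
```

For this layer, 608x608 costs about 2.14 times as much as 416x416, the same ratio as the pixel counts (608/416)^2, so a gap of roughly 2x could come from input size alone before any architectural difference between v3 and v4.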
Yolo V4 , training avg-loss=3.0 , CUDA 10.2 + cuDNN 7.6.5
Yolo V3 , training avg-loss=0.2 , CUDA 10.1 + cuDNN 7.6.4
Where am I wrong? Why does my v4 show cudnn_half = 0?