AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Darknet map calculation is extremely slow on RTX3090 #7252

Open mayur-ag opened 3 years ago

mayur-ag commented 3 years ago

During the mAP calculation while training, and even when using the darknet detector map command, the calculation is painfully slow: it takes several seconds to process a single image. I tried editing my test.txt to contain only 2 images, and it still gets stuck. The following is a snapshot:

darknet detector map obj.data model.cfg model.weights -gpus 1
 CUDA-version: 11010 (11020), cuDNN: 8.0.5, CUDNN_HALF=1, GPU count: 2  
 CUDNN_HALF=1 
 OpenCV version: 4.4.0
1
 0 : compute_capability = 860, cudnn_half = 1, GPU: GeForce RTX 3090 
net.optimized_memory = 0 
mini_batch = 1, batch = 1, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 2    416 x 416 x   3 ->  208 x 208 x  32 0.075 BF
   1 conv     64       3 x 3/ 2    208 x 208 x  32 ->  104 x 104 x  64 0.399 BF
   2 conv     64       3 x 3/ 1    104 x 104 x  64 ->  104 x 104 x  64 0.797 BF
   3 route  2                              1/2 ->  104 x 104 x  32 
   4 conv     32       3 x 3/ 1    104 x 104 x  32 ->  104 x 104 x  32 0.199 BF
   5 conv     32       3 x 3/ 1    104 x 104 x  32 ->  104 x 104 x  32 0.199 BF
   6 route  5 4                                ->  104 x 104 x  64 
   7 conv     64       1 x 1/ 1    104 x 104 x  64 ->  104 x 104 x  64 0.089 BF
   8 route  2 7                                ->  104 x 104 x 128 
   9 max                2x 2/ 2    104 x 104 x 128 ->   52 x  52 x 128 0.001 BF
  10 conv    128       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 128 0.797 BF
  11 route  10                             1/2 ->   52 x  52 x  64 
  12 conv     64       3 x 3/ 1     52 x  52 x  64 ->   52 x  52 x  64 0.199 BF
  13 conv     64       3 x 3/ 1     52 x  52 x  64 ->   52 x  52 x  64 0.199 BF
  14 route  13 12                              ->   52 x  52 x 128 
  15 conv    128       1 x 1/ 1     52 x  52 x 128 ->   52 x  52 x 128 0.089 BF
  16 route  10 15                              ->   52 x  52 x 256 
  17 max                2x 2/ 2     52 x  52 x 256 ->   26 x  26 x 256 0.001 BF
  18 conv    256       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 256 0.797 BF
  19 route  18                             1/2 ->   26 x  26 x 128 
  20 conv    128       3 x 3/ 1     26 x  26 x 128 ->   26 x  26 x 128 0.199 BF
  21 conv    128       3 x 3/ 1     26 x  26 x 128 ->   26 x  26 x 128 0.199 BF
  22 route  21 20                              ->   26 x  26 x 256 
  23 conv    256       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x 256 0.089 BF
  24 route  18 23                              ->   26 x  26 x 512 
  25 max                2x 2/ 2     26 x  26 x 512 ->   13 x  13 x 512 0.000 BF
  26 conv    512       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x 512 0.797 BF
  27 conv    256       1 x 1/ 1     13 x  13 x 512 ->   13 x  13 x 256 0.044 BF
  28 conv    512       3 x 3/ 1     13 x  13 x 256 ->   13 x  13 x 512 0.399 BF
  29 conv    117       1 x 1/ 1     13 x  13 x 512 ->   13 x  13 x 117 0.020 BF
  30 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
  31 route  27                                 ->   13 x  13 x 256 
  32 conv    128       1 x 1/ 1     13 x  13 x 256 ->   13 x  13 x 128 0.011 BF
  33 upsample                 2x    13 x  13 x 128 ->   26 x  26 x 128
  34 route  33 23                              ->   26 x  26 x 384 
  35 conv    256       3 x 3/ 1     26 x  26 x 384 ->   26 x  26 x 256 1.196 BF
  36 conv    117       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x 117 0.040 BF
  37 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
  38 route  35                                 ->   26 x  26 x 256 
  39 conv     64       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x  64 0.022 BF
  40 upsample                 2x    26 x  26 x  64 ->   52 x  52 x  64
  41 route  40 15                              ->   52 x  52 x 192 
  42 conv    128       3 x 3/ 1     52 x  52 x 192 ->   52 x  52 x 128 1.196 BF
  43 conv    117       1 x 1/ 1     52 x  52 x 128 ->   52 x  52 x 117 0.081 BF
  44 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 8.138 
avg_outputs = 298709 
 Allocate additional workspace_size = 13.80 MB 
Loading weights from model.weights...
 seen 64, trained: 1174 K-images (18 Kilo-batches_64) 
Done! Loaded 45 layers from weights-file 

 calculation mAP (mean average precision)...
 Detection layer: 30 - type = 28 
 Detection layer: 37 - type = 28 
 Detection layer: 44 - type = 28 
2

It is stuck at this point. But when I use:

darknet detector test obj.data model.cfg model.weights 17.png

It runs very fast:

 [... network initialization and weight-loading output identical to the run above ...]
Enter Image Path:  Detection layer: 30 - type = 28 
 Detection layer: 37 - type = 28 
 Detection layer: 44 - type = 28
17.png: Predicted in 3.094000 milli-seconds

This means it gets stuck while calculating mAP. I've tried disabling OPENMP and AVX, but the issue is still the same.
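
For context, disabling those options means rebuilding darknet with the corresponding Makefile flags turned off. A rough sketch of such a rebuild on Linux, assuming the standard flags from the repository's Makefile (adjust the values to your own setup):

# rebuild with OpenMP and AVX disabled; GPU, cuDNN and OpenCV still enabled
make clean
make GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1 AVX=0 OPENMP=0 -j4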

AlexeyAB commented 3 years ago

> It is stuck at this point. But when I use:

It looks like you specified the wrong cfg or weights file.

mayur-ag commented 3 years ago

@AlexeyAB Sorry, I've edited the post. I have passed the right paths, but edited it here to hide local paths.

mayur-ag commented 3 years ago

The same behaviour is also observed in the darknet detector train command. During the mAP calculation it's extremely slow. Takes like 30 secs per image.

AlexeyAB commented 3 years ago

mayur-ag commented 3 years ago

Isn't it the same as passing -thresh while running the command? I'll try to recompile and test. Here's the cfg file; it's just yolov4-tiny:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=1
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00261
burn_in=1000
max_batches = 20000
policy=steps
steps=15000,17500
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=117
activation=linear

[yolo]
mask = 6,7,8
anchors = 34, 91,  26,148,  37,148,  55,111,  32,237,  49,175, 267, 38,  45,283,  84,158
classes=34
num=9
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=117
activation=linear

[yolo]
mask = 3,4,5
anchors = 34, 91,  26,148,  37,148,  55,111,  32,237,  49,175, 267, 38,  45,283,  84,158
classes=34
num=9
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -3

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 15

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=117
activation=linear

[yolo]
mask = 0,1,2
anchors = 34, 91,  26,148,  37,148,  55,111,  32,237,  49,175, 267, 38,  45,283,  84,158
classes=34
num=9
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

mayur-ag commented 3 years ago

I have recompiled after replacing the threshold variable in detector.c, but the issue still persists.

AlexeyAB commented 3 years ago

Try to set this value to .2 and recompile: https://github.com/AlexeyAB/darknet/blob/103d301ccbc19e47e002005bdfdbaf07a92cd880/src/detector.c#L995

If that does not help, then the issue is that the cfg file does not match the weights file, or the weights are poorly trained.
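
A sketch of the kind of one-line edit being suggested, assuming the linked line is the low detection threshold used inside validate_detector_map() (the exact identifier and value may differ between darknet revisions, so check it against your own checkout before editing):

/* src/detector.c, inside validate_detector_map() -- illustrative only */
// const float thresh = .005;   /* original: very low threshold, so a poorly calibrated
//                                 model emits thousands of boxes per image and mAP crawls */
const float thresh = .2;        /* raised threshold as suggested above; rebuild darknet after editing */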

haviduck commented 3 years ago

I had that issue with the 3090 when OpenCV wasn't compiled correctly.

mayur-ag commented 3 years ago

I have OpenCV with CUDA support compiled from source. Is this combination known to have problems? @AlexeyAB @haviduck

mayur-ag commented 3 years ago

Another observation: this problem occurs only with CERTAIN types of models, for instance the one presented above. I trained another model with a different config (YOLOv4x) and that worked flawlessly. Also, I think this issue has nothing to do with good/bad model convergence, since the problematic model freezes even after it has converged.

haviduck commented 3 years ago

Hmm, well, it certainly can be, depending on the type of layers in your cfg. But if I recall correctly, CUDA 11 only works with OpenCV 4.5 for the 30-series cards; that's what I'm running anyway. I spent a lot of time compiling OpenCV correctly, then made sure the proper DLLs were in place. I checked that by systematically going through each DLL with dumpbin /DEPENDENTS file.dll, which gives you a list of the required DLLs. If some of them are outside your env paths, you need to place them in the same dir as your script. You also have to recompile the console DLL and no-GPU DLL found in the /build directory and make sure you replace any line where 10.0 or 10.1 is hardcoded.

But I can only speak from my own setup and experience. opencv_world, opencv_videoio, and ffmpeg, along with others, need to be in your paths. Oh, and remember to add the path to the opencv_contrib bin when you compile OpenCV.

This guide is pretty awesome: https://jamesbowley.co.uk/accelerate-opencv-4-5-0-on-windows-build-with-cuda-and-python-bindings/

kadirbeytorun commented 3 years ago

@mayur-ag @AlexeyAB I have the same issue with the yolov4-csp network. Training speed is normal, yet mAP calculation is extremely slow.

Yolov4-tiny training with the same data has no such issue, so I doubt the problem is with the darknet build.

mayur-ag commented 3 years ago

@kadirbeytorun Yeah, I faced the same issue; the problem only shows up with "certain" types of configurations. For instance, dataset 1 with yolov4-tiny has the problem, but dataset 2 with yolov4-tiny wouldn't have it. I tried to debug the source code but eventually gave up.

What I settled for is to NOT use the -map option while training. Once the model is trained, I just use the darknet map command to check the mAP. The only downside is that you won't have the training mAP graph.
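
To illustrate that workflow, a rough sketch (the data, cfg, weights, and pretrained file names here are placeholders, not the ones from this thread):

# train without -map, so no periodic mAP evaluation runs during training
darknet detector train obj.data model.cfg yolov4-tiny.conv.29 -dont_show

# once training finishes, compute mAP separately on the final weights
darknet detector map obj.data model.cfg backup/model_final.weights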

akashAD98 commented 2 years ago

I'm doing something different: I trained the model with one cfg and got the weights for that model. Then I was testing on the same data, but with a different weights file, and I'm getting no results.

I have a question: is my approach wrong? Do we need to compute mAP with the same data, weights, and cfg?

lrf19991230 commented 1 year ago

I have the same problem: training is normal, but the -map calculation is extremely slow.