AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Training and testing issue when using Yolov4-csp-x-swish and Yolov4-P5 #8094

Open WilburZjh opened 2 years ago

WilburZjh commented 2 years ago

Hi @AlexeyAB @WongKinYiu ,

While training my network using yolov4-p5.cfg with yolov4-p5.conv.232 and yolov4-csp-x-swish.cfg with yolov4-csp-x-swish.conv.192, the training stopped after 4000 iterations, and the test stopped as well.

So I tried to test on my own dataset with the weights trained so far, using the following command:

./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights iou_thresh 0.25

The following information is shown by the test:

[yolo] params: iou loss: ciou (4), iou_norm: 0.05, obj_norm: 0.40, cls_norm: 0.50, delta_norm: 1.00, scale_x_y: 2.00
Unused field: 'new_coords = 1'
196 route 178 -> 50 x 50 x 320
197 conv 640 3 x 3/ 1 50 x 50 x 320 -> 50 x 50 x 640 9.216 BF
198 conv 18 1 x 1/ 1 50 x 50 x 640 -> 50 x 50 x 18 0.058 BF
199 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.05, obj_norm: 0.40, cls_norm: 0.50, delta_norm: 1.00, scale_x_y: 2.00
Unused field: 'new_coords = 1'
200 route 191 -> 25 x 25 x 640
201 conv 1280 3 x 3/ 1 25 x 25 x 640 -> 25 x 25 x1280 9.216 BF
202 conv 18 1 x 1/ 1 25 x 25 x1280 -> 25 x 25 x 18 0.029 BF
203 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.05, obj_norm: 0.40, cls_norm: 0.50, delta_norm: 1.00, scale_x_y: 2.00
Unused field: 'new_coords = 1'
Total BFLOPS 344.198
avg_outputs = 1556629
Allocate additional workspace_size = 230.40 MB
Loading weights from yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights...
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
nms_kind: diounms (2), beta = 0.600000
nms_kind: diounms (2), beta = 0.600000
nms_kind: diounms (2), beta = 0.600000
Done! Loaded 204 layers from weights-file
4 8 12 ... 480 484

I have no idea why my program stopped... Can you help me with this?

AlexeyAB commented 2 years ago

@WilburZjh Hi,

> ./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights iou_thresh 0.25

* You should use the flag `-iou_thresh 0.25` instead of `iou_thresh 0.25`
* Show a screenshot of this error
* Attach the cfg-file
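For reference, the corrected invocation (the same command from above, with the leading dash on the flag) is:

```
./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights -iou_thresh 0.25
```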

WilburZjh commented 2 years ago

> @WilburZjh Hi,
>
> ./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights iou_thresh 0.25
>
> * You should use the flag `-iou_thresh 0.25` instead of `iou_thresh 0.25`
> * Show a screenshot of this error
> * Attach the cfg-file

Hi @AlexeyAB ,

Thanks for the reply! I just fixed the '-iou_thresh' error, but the inference is still slow... My image size is 800x800, with only one class: 9000 images for training (5500 contain objects, 3500 don't), and 1000 images for testing (200 contain objects, 800 don't).

Here is the cfg-file for training:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=16
subdivisions=8
width=800
height=800
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 9000
policy=steps
steps=7200,8100
scales=.1,.1

mosaic=1

letter_box=1

ema_alpha=0.9998

optimized_memory=1

# ============ Backbone ============

# Stem

# 0

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=swish

# P1

# Downsample

[convolutional] batch_normalize=1 filters=80 size=3 stride=2 pad=1 activation=swish

# Residual Block

[convolutional] batch_normalize=1 filters=40 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

# 4 (previous+1+3k)

[shortcut] from=-3 activation=linear

# P2

# Downsample

[convolutional] batch_normalize=1 filters=160 size=3 stride=2 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

# Residual Block

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

# Merge [-1, -(3k+4)]

[route] layers = -1,-13

# Transition last

# 20 (previous+7+3k)

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

# P3

# Downsample

[convolutional] batch_normalize=1 filters=320 size=3 stride=2 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

# Residual Block

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

# Merge [-1 -(4+3k)]

[route] layers = -1,-34

# Transition last

# 57 (previous+7+3k)

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# P4

# Downsample

[convolutional] batch_normalize=1 filters=640 size=3 stride=2 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# Residual Block

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# Merge [-1 -(3k+4)]

[route] layers = -1,-34

# Transition last

# 94 (previous+7+3k)

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

# P5

# Downsample

[convolutional] batch_normalize=1 filters=1280 size=3 stride=2 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

# Residual Block

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

# Merge [-1 -(3k+4)]

[route] layers = -1,-19

# Transition last

# 116 (previous+7+3k)

[convolutional] batch_normalize=1 filters=1280 size=1 stride=1 pad=1 activation=swish

# ============ End of Backbone ============

# ============ Neck ============

# CSPSPP

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

### SPP ###

[maxpool] stride=1 size=5

[route] layers=-2

[maxpool] stride=1 size=9

[route] layers=-4

[maxpool] stride=1 size=13

[route] layers=-1,-3,-5,-6

### End SPP ###

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[route] layers = -1, -15

# 133 (previous+6+5+2k)

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

# End of CSPSPP

# FPN-4

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[upsample] stride=2

[route] layers = 94

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

# Merge [-1, -(2k+2)]

[route] layers = -1, -8

# Transition last

# 149 (previous+6+4+2k)

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# FPN-3

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[upsample] stride=2

[route] layers = 57

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=swish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=swish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=swish

# Merge [-1, -(2k+2)]

[route] layers = -1, -8

# Transition last

# 165 (previous+6+4+2k)

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

# PAN-4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=320 activation=swish

[route] layers = -1, 149

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[route] layers = -1,-8

# Transition last

# 178 (previous+3+4+2k)

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

# PAN-5

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=640 activation=swish

[route] layers = -1, 133

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

# Split

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[route] layers = -1,-8

# Transition last

# 191 (previous+3+4+2k)

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish stopbackward=900

# ============ End of Neck ============

# ============ Head ============

# YOLO-3

[route] layers = 165

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] size=1 stride=1 pad=1 filters=18 activation=logistic

[yolo] mask = 0,1,2 anchors = 34, 32, 39, 42, 53, 45, 70, 69, 100, 87, 98,137, 149,108, 180,172, 287,291 classes=1 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5

iou_thresh=0.2

iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

# YOLO-4

[route] layers = 178

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] size=1 stride=1 pad=1 filters=18 activation=logistic

[yolo] mask = 3,4,5 anchors = 34, 32, 39, 42, 53, 45, 70, 69, 100, 87, 98,137, 149,108, 180,172, 287,291 classes=1 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5

iou_thresh=0.2

iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

# YOLO-5

[route] layers = 191

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1280 activation=swish

[convolutional] size=1 stride=1 pad=1 filters=18 activation=logistic

[yolo] mask = 6,7,8 anchors = 34, 32, 39, 42, 53, 45, 70, 69, 100, 87, 98,137, 149,108, 180,172, 287,291 classes=1 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5

iou_thresh=0.2

iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

The program just does not proceed anymore... I pasted the last output from the terminal below:

v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 203 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 30.000395, iou_loss = 45.000587, total_loss = 75.000984
total_bbox = 79805, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 195 Avg (IOU: 0.735526, GIOU: 0.734017), Class: 0.725161, Obj: 0.500000, No Obj: 0.500000, .5R: 1.000000, .75R: 0.000000, count: 1, class_loss = 479.992310, iou_loss = 721.737183, total_loss = 1201.729492
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 199 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 119.999092, iou_loss = 179.998627, total_loss = 299.997711
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 203 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 30.000395, iou_loss = 45.000587, total_loss = 75.000984
total_bbox = 79806, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 195 Avg (IOU: 0.818031, GIOU: 0.809160), Class: 0.731000, Obj: 0.500000, No Obj: 0.500000, .5R: 1.000000, .75R: 1.000000, count: 1, class_loss = 479.988220, iou_loss = 721.337585, total_loss = 1201.325806
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 199 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 119.999092, iou_loss = 179.998627, total_loss = 299.997711
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 203 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 30.000395, iou_loss = 45.000587, total_loss = 75.000984
total_bbox = 79807, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 195 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 479.989929, iou_loss = 719.984924, total_loss = 1199.974854
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 199 Avg (IOU: 0.784754, GIOU: 0.782724), Class: 0.730990, Obj: 0.500000, No Obj: 0.500000, .5R: 1.000000, .75R: 1.000000, count: 1, class_loss = 120.002457, iou_loss = 180.872330, total_loss = 300.874786
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 203 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 30.000389, iou_loss = 45.000580, total_loss = 75.000969
total_bbox = 79808, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 195 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 479.989929, iou_loss = 719.984924, total_loss = 1199.974854
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 199 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 119.999092, iou_loss = 179.998627, total_loss = 299.997711
v3 (iou loss, Normalizer: (iou: 0.05, cls: 0.40) Region 203 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.500000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 30.000389, iou_loss = 45.000580, total_loss = 75.000969
total_bbox = 79808, rewritten_bbox = 0.000000 %
Loaded: 0.000022 seconds

(next mAP calculation at 4424 iterations)
Last accuracy mAP@0.5 = 0.11 %, best = 0.11 %
4424: 209.995316, 209.995621 avg loss, 0.001000 rate, 5.618830 seconds, 70784 images, 7.771258 hours left
4 8 12 ... 740 744

AlexeyAB commented 2 years ago

> The training is stopped after 4000 iterations, and test is stopped as well. ...
>
> Thanks for the reply! I just fixed the '-iou_thresh' error.

So is the issue with the stopped Training/Testing solved?

> But the inference is still slow... My image size is 800x800.

* What GPU do you use?
* Try to test the weights from 4000 iterations:
  `./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_4000.weights -iou_thresh 0.25`
  Is it faster than with the weights from 1000 iterations:
  `./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights -iou_thresh 0.25`?

WilburZjh commented 2 years ago

> So is the issue with the stopped Training/Testing solved?

The issue is still not solved... The inference time is still very slow.

> * What GPU do you use?

I use 2 Tesla P100 12G GPUs...

> * Try to test the weights from 4000 iterations:
>   `./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_4000.weights -iou_thresh 0.25`
>   Is it faster?

It is not faster...

> Is it faster than with the weights from 1000 iterations:
> `./darknet detector map data/yolov4-csp-x-swish.data cfg/yolov4-csp-x-swish.cfg yolov4_csp_x_swish_backup/yolov4-csp-x-swish_1000.weights -iou_thresh 0.25`?

Also, the results are all 0 no matter which weights I use... The number of false positives is extremely high... I don't know why... The output is pasted below.

[yolo] params: iou loss: ciou (4), iou_norm: 0.05, obj_norm: 0.40, cls_norm: 0.50, delta_norm: 1.00, scale_x_y: 2.00
Unused field: 'new_coords = 1'
196 route 178 -> 50 x 50 x 320
197 conv 640 3 x 3/ 1 50 x 50 x 320 -> 50 x 50 x 640 9.216 BF
198 conv 18 1 x 1/ 1 50 x 50 x 640 -> 50 x 50 x 18 0.058 BF
199 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.05, obj_norm: 0.40, cls_norm: 0.50, delta_norm: 1.00, scale_x_y: 2.00
Unused field: 'new_coords = 1'
200 route 191 -> 25 x 25 x 640
201 conv 1280 3 x 3/ 1 25 x 25 x 640 -> 25 x 25 x1280 9.216 BF
202 conv 18 1 x 1/ 1 25 x 25 x1280 -> 25 x 25 x 18 0.029 BF
203 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.05, obj_norm: 0.40, cls_norm: 0.50, delta_norm: 1.00, scale_x_y: 2.00
Unused field: 'new_coords = 1'
Total BFLOPS 344.198
avg_outputs = 1556629
Allocate additional workspace_size = 230.40 MB
Loading weights from mitosis_yolov4_csp_x_swish_backup/yolov4-csp-x-swish_4000.weights...
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
nms_kind: diounms (2), beta = 0.600000
nms_kind: diounms (2), beta = 0.600000
nms_kind: diounms (2), beta = 0.600000
Done! Loaded 204 layers from weights-file
4 8 12 ... 952 956
Total Detection Time: 24574 Seconds

seen 64, trained: 32 K-images (0 Kilo-batches_64)

calculation mAP (mean average precision)...
Detection layer: 195 - type = 28
Detection layer: 199 - type = 28
Detection layer: 203 - type = 28

detections_count = 4714264, unique_truth_count = 148

for conf_thresh = 0.25, precision = 0.00, recall = 1.00, F1-score = 0.00
for conf_thresh = 0.25, TP = 148, FP = 4714116, FN = 0, average IoU = 0.00 %

IoU threshold = 25 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.25) = 0.001415, or 0.14 %
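The precision and F1 numbers follow directly from the counts in this log. A quick sketch (plain Python, with the TP/FP/FN values copied from the output above) shows why they collapse to zero:

```python
# Counts reported by darknet's mAP calculation above.
tp, fp, fn = 148, 4714116, 0

precision = tp / (tp + fp)  # ~0.0000314, which prints as 0.00
recall = tp / (tp + fn)     # 1.00: every ground-truth box was matched by something
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.7f}, recall = {recall:.2f}, F1 = {f1:.7f}")
# ~4.7 million false positives against 148 ground-truth boxes drive
# precision (and therefore F1) to essentially zero.
```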

AlexeyAB commented 2 years ago

Your model is poorly trained. Train again with these values in the cfg-file, for at least 6000 iterations:

[net]
batch=64
subdivisions=32
max_batches=6000

Read: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

change line batch to `batch=64`
change line subdivisions to `subdivisions=16`
change line max_batches to (`classes*2000`, but not less than the number of training images and not less than 6000), f.e. `max_batches=6000` if you train for 3 classes
...
Note: if the error Out of memory occurs then in the .cfg-file you should increase subdivisions=16, 32 or 64: link
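Applied to this thread's dataset (1 class, 9000 training images), the max_batches rule above works out as in this minimal sketch (plain Python; the constants are the ones quoted from the README):

```python
def recommended_max_batches(num_classes: int, num_train_images: int) -> int:
    # classes*2000, but not less than the number of training images
    # and not less than 6000 (the README rule quoted above).
    return max(num_classes * 2000, num_train_images, 6000)

# This thread: 1 class, 9000 training images -> 9000.
# AlexeyAB's snippet uses the 6000 floor as the minimum to train for.
print(recommended_max_batches(1, 9000))
```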

WilburZjh commented 2 years ago

> Your model is poorly trained. Train again with these values in the cfg-file, for at least 6000 iterations:
>
> [net]
> batch=64
> subdivisions=32
> max_batches=6000
>
> Read: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
>
> change line batch to `batch=64`
> change line subdivisions to `subdivisions=16`
> change line max_batches to (`classes*2000`, but not less than the number of training images and not less than 6000), f.e. `max_batches=6000` if you train for 3 classes
> ...
> Note: if the error Out of memory occurs then in the .cfg-file you should increase subdivisions=16, 32 or 64: link

Thanks for your patience, I will train again and let you know.

WilburZjh commented 2 years ago

Hi @AlexeyAB ,

After training again with both yolov4-csp-x-swish.cfg and yolov4-p5.cfg, no matter which weights I use for inference on the test set, I always get 0 and nan. One of the test results is shown below.

calculation mAP (mean average precision)...
Detection layer: 235 - type = 28
Detection layer: 239 - type = 28
Detection layer: 243 - type = 28

detections_count = 19, unique_truth_count = 148
rank = 0 of ranks = 19
class_id = 0, name = mitosis, ap = 0.00% (TP = 0, FP = 0)

for conf_thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for conf_thresh = 0.25, TP = 0, FP = 0, FN = 148, average IoU = 0.00 %

IoU threshold = 25 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.25) = 0.000000, or 0.00 %

The training result is shown below.

v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 243 Avg (IOU: 0.000000), count: 1, class_loss = 0.002524, iou_loss = 0.000000, total_loss = 0.002524
total_bbox = 375432, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 235 Avg (IOU: 0.000000), count: 1, class_loss = 0.000185, iou_loss = 0.000000, total_loss = 0.000185
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 239 Avg (IOU: 0.000000), count: 1, class_loss = 0.005467, iou_loss = 0.000000, total_loss = 0.005467
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 243 Avg (IOU: 0.000000), count: 1, class_loss = 0.002524, iou_loss = 0.000000, total_loss = 0.002524
total_bbox = 375432, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 235 Avg (IOU: 0.000000), count: 1, class_loss = 0.000178, iou_loss = 0.000000, total_loss = 0.000178
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 239 Avg (IOU: 0.000000), count: 1, class_loss = 0.005423, iou_loss = 0.000000, total_loss = 0.005423
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 243 Avg (IOU: 0.000000), count: 1, class_loss = 0.002524, iou_loss = 0.000000, total_loss = 0.002524
total_bbox = 375432, rewritten_bbox = 0.000000 %
Loaded: 0.000016 seconds

(next mAP calculation at 13272 iterations)
Last accuracy mAP@0.50 = 0.00 %, best = 0.00 %
9000: 0.388740, 1.080727 avg loss, 0.000010 rate, 3.575148 seconds, 72000 images, 0.112909 hours left
4 8 12 ... 952 956
Total Detection Time: 110 Seconds
Saving weights to yolov4_p5_backup//yolov4-p5_9000.weights
Saving weights to yolov4_p5_backup//yolov4-p5_last.weights
Saving weights to yolov4_p5_backup//yolov4-p5_ema.weights
Saving weights to yolov4_p5_backup//yolov4-p5_final.weights

calculation mAP (mean average precision)...
Detection layer: 235 - type = 28
Detection layer: 239 - type = 28
Detection layer: 243 - type = 28

detections_count = 2123, unique_truth_count = 148
rank = 0 of ranks = 2123
rank = 100 of ranks = 2123
...
rank = 2100 of ranks = 2123
class_id = 0, name = mitosis, ap = 0.00% (TP = 0, FP = 0)

for conf_thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for conf_thresh = 0.25, TP = 0, FP = 0, FN = 148, average IoU = 0.00 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.000004, or 0.00 %

Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
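For a custom dataset like this one, the AUC option would simply be appended to the map command used earlier in this thread, e.g. (a sketch combining that command with the suggested flag):

```
./darknet detector map data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4_p5_backup/yolov4-p5_9000.weights -iou_thresh 0.25 -points 0
```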

mean_average_precision (mAP@0.50) = 0.000004
EMA weights are saved to the file: yolov4_p5_backup//yolov4-p5_ema.weights
If you want to train from the beginning, then use the flag at the end of the training command: `-clear`

The cfg files for p5 and csp-x-swish are the same as shown before... Any idea why the results are all 0 in both training and testing?

Btw, for my task the classification result is more important than the box regression; should I change the corresponding parameters in each YOLO layer of the cfg file?
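For context, the knobs that trade classification against box regression are the normalizer keys already present in each [yolo] section of this cfg. A hedged illustration of shifting weight toward classification (the raised value is purely illustrative, not a recommendation made in this thread):

```
[yolo]
# ... mask / anchors / classes as before ...
iou_normalizer=0.05  # weight of the CIoU box-regression loss
cls_normalizer=1.0   # illustrative: raised from 0.5 to emphasize classification loss
obj_normalizer=0.4   # weight of the objectness loss
```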

Best

AlexeyAB commented 2 years ago

* Show screenshots of your commands: training and testing
* Attach your cfg-file
* Show chart.png

WilburZjh commented 2 years ago

> * Show screenshots of your commands: training and testing

Training:

`./darknet detector train data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4-p5.weights -clear -map -dont_show`

Testing:

`./darknet detector map data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4_p5_backup/yolov4-p5_9000.weights -iou_thresh 0.25`
`./darknet detector map data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4_p5_backup/yolov4-p5_best.weights -iou_thresh 0.25`

> * Attach your cfg-file

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=8
subdivisions=8
width=896
height=896
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 9000
policy=steps
steps=7200,8100
scales=.1,.1

mosaic=1

letter_box=1

ema_alpha=0.9998

use_cuda_graph = 1

# ============ Backbone ============

# Stem

# 0

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=mish

# P1

# Downsample

[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=mish

# Residual Block

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=mish

# Merge [-1, -(3k+4)]

[route] layers = -1,-7

# Transition last

# 10 (previous+7+3k)

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

# P2

# Downsample

[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

# Residual Block

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

# Merge [-1, -(3k+4)]

[route] layers = -1,-13

# Transition last

# 26 (previous+7+3k)

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# P3

# Downsample

[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# Residual Block

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# Merge [-1, -(3k+4)]

[route] layers = -1,-49

# Transition last

# 78 (previous+7+3k)

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# P4

# Downsample

[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# Residual Block

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# Merge [-1, -(3k+4)]

[route] layers = -1,-49

# Transition last

# 130 (previous+7+3k)

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# P5

# Downsample

[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# Residual Block

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

# Transition first

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# Merge [-1, -(3k+4)]

[route] layers = -1,-25

# Transition last

# 158 (previous+7+3k)

[convolutional] batch_normalize=1 filters=1024 size=1 stride=1 pad=1 activation=mish

# ============ End of Backbone ============

# ============ Neck ============

# CSPSPP

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=mish

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

### SPP ###

[maxpool] stride=1 size=5

[route] layers=-2

[maxpool] stride=1 size=9

[route] layers=-4

[maxpool] stride=1 size=13

[route] layers=-1,-3,-5,-6

### End SPP ###

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=mish

[route] layers = -1, -13

# 173 (previous+6+5+2k)

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# End of CSPSPP

# FPN-4

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[upsample] stride=2

[route] layers = 130

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

# Merge [-1, -(2k+2)]

[route] layers = -1, -8

# Transition last

# 189 (previous+6+4+2k)

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# FPN-3

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[upsample] stride=2

[route] layers = 78

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=128 activation=mish

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=128 activation=mish

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=128 activation=mish

# Merge [-1, -(2k+2)]

[route] layers = -1, -8

# Transition last

# 205 (previous+6+4+2k)

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# PAN-4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=256 activation=mish

[route] layers = -1, 189

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

[route] layers = -1,-8

# Transition last

# 218 (previous+3+4+2k)

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# PAN-5

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=512 activation=mish

[route] layers = -1, 173

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# Split

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

# Plain Block

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=mish

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=mish

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=mish

[route] layers = -1,-8

# Transition last

# 231 (previous+3+4+2k)

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# ============ End of Neck ============

# ============ Head ============

# YOLO-3

[route] layers = 205

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=mish

[convolutional] size=1 stride=1 pad=1 filters=24 activation=logistic

# activation=linear
# use linear for Pytorch-Scaled-YOLOv4, and logistic for Darknet

[yolo] mask = 0,1,2,3 anchors = 13,17, 31,25, 24,51, 61,45, 48,102, 119,96, 97,189, 217,184, 171,384, 324,451, 616,618, 800,800 classes=1 num=12 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5 iou_thresh=0.2 iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=1.0 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

# YOLO-4

[route] layers = 218

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=mish

[convolutional] size=1 stride=1 pad=1 filters=24 activation=logistic

# activation=linear
# use linear for Pytorch-Scaled-YOLOv4, and logistic for Darknet

[yolo] mask = 4,5,6,7 anchors = 13,17, 31,25, 24,51, 61,45, 48,102, 119,96, 97,189, 217,184, 171,384, 324,451, 616,618, 800,800 classes=1 num=12 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5 iou_thresh=0.2 iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=1.0 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

# YOLO-5

[route] layers = 231

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=mish

[convolutional] size=1 stride=1 pad=1 filters=24 activation=logistic

# activation=linear
# use linear for Pytorch-Scaled-YOLOv4, and logistic for Darknet

[yolo] mask = 8,9,10,11 anchors = 13,17, 31,25, 24,51, 61,45, 48,102, 119,96, 97,189, 217,184, 171,384, 324,451, 616,618, 800,800 classes=1 num=12 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5 iou_thresh=0.2 iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=1.0 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2


# ============ End of Head ============ #

> * Show chart.png

![chart_yolov4-p5](https://user-images.githubusercontent.com/11021713/134420349-14c2b6d8-ab40-4e50-8330-7a6e60837372.png)
AlexeyAB commented 2 years ago

Can you share your weights-file yolov4-p5_9000.weights via Google Drive or some other way?

WilburZjh commented 2 years ago

> Can you share your weights-file yolov4-p5_9000.weights via Google Drive or some other way?

Yes, I have shared the weights file with your Gmail via Google Drive.

AlexeyAB commented 2 years ago

Thanks, can you also share several test images containing the required objects?

WilburZjh commented 2 years ago

> Thanks, can you also share several test images containing the required objects?

Shared.

AlexeyAB commented 2 years ago

@WilburZjh

  1. Your trained model can detect some objects correctly if I use the cfg from your message (yolov4-p5-custom.cfg.txt) together with the weights-file and the pictures you provided. I shared the results via Google Drive. Command to reproduce the results:
     `./darknet detector test data/conveyor.data yolov4-p5-custom.cfg yolov4-p5_9000.weights -ext_output img/A04_01Ca_512_1024.jpg -thresh 0.002`

  2. But I need to use a very low confidence threshold `-thresh 0.002` to get any detections, because your model is poorly trained: you used `batch=8` and trained for `max_batches=9000` iterations, i.e. you trained the model on only (9000x8) = 72,000 images, while it should be trained with `batch=64` for `max_batches=6000` iterations, i.e. (6000x64) = 384,000 images, which is about 5 times longer.

3. If you decrease the batch size by 8x (set `batch=8` instead of `batch=64`), then you also need to increase max_batches 8x to train longer (`max_batches=48000`) and decrease the learning rate 8x (`learning_rate=0.000125`); see the sketch after this list. That is why it is better to always keep `batch=64`: if you get an Out of memory error, just increase `subdivisions=64`, which solves the issue without changing max_batches or learning_rate.

4. If all your train/val/test images, and all images you will use in the future, are 512x512, it is better to set `width=512 height=512` in your cfg-file.
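A minimal cfg sketch of the scaling rule from point 3 (the values below just restate the arithmetic above):

[net]
# keep batch * max_batches constant: 64 * 6000 = 8 * 48000 = 384000 images seen
batch=8
max_batches=48000
# scale learning_rate down by the same 8x factor: 0.001 / 8
learning_rate=0.000125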


Training:

./darknet detector train data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4-p5.weights -clear -map -dont_show

Testing:

./darknet detector map data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4_p5_backup/yolov4-p5_9000.weights -iou_thresh 0.25

./darknet detector map data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4_p5_backup/yolov4-p5_best.weights -iou_thresh 0.25

I don't know why you get NaN during mAP calculation, since I can get some correct detections with a low confidence threshold. Just check your paths, i.e. that cfg/yolov4-p5.cfg points to your custom cfg-file rather than to the default YOLOv4-p5 cfg-file.
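For reference, a darknet .data file has this shape (the paths below are placeholders, not the actual files from this thread):

classes = 1
train = data/train.txt
valid = data/valid.txt
names = data/obj.names
backup = yolov4_p5_backup/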


I suggest training your model using this cfg-file yolov4-p5-custom2.cfg.txt with these params:

[net]
batch=64
subdivisions=32
width=512
height=512

learning_rate=0.001
burn_in=1000
max_batches = 9000
policy=steps
steps=7200,8100
scales=.1,.1
WilburZjh commented 2 years ago

Thanks for the information! I will train again and let you know.

AlexeyAB commented 2 years ago

@WilburZjh

Or, even better, try to train with this cfg-file yolov4-p5-custom3.cfg.txt; I also reduced the anchor sizes to fit the 512x512 network resolution.
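To recompute anchors for a new resolution yourself, the repo's calc_anchors mode can be used; data/obj.data is a placeholder for your own .data file, and -num_of_clusters should match num= in your [yolo] layers (12 in this p5 cfg):

./darknet detector calc_anchors data/obj.data -num_of_clusters 12 -width 512 -height 512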

WilburZjh commented 2 years ago

Thanks @AlexeyAB, I will try both of them! Regards.

WilburZjh commented 2 years ago

Hi @AlexeyAB, may I know which weights file I should use to train on my own dataset: yolov4-p5.conv.232 or yolov4-p5.weights? And should I train with -clear?

AlexeyAB commented 2 years ago

It is better to train with yolov4-p5.conv.232; in this case you can skip the -clear flag.
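Put together, the training command would then look like this (a sketch; the .data and .cfg names are whatever your custom files are called):

./darknet detector train data/yolov4-p5.data cfg/yolov4-p5-custom3.cfg yolov4-p5.conv.232 -map -dont_show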

WilburZjh commented 2 years ago

Thanks!

WilburZjh commented 2 years ago

Hi @AlexeyAB, since I want to calculate the F1-score, can you tell me where to change the conf_thresh that appears in the following line when I run the darknet detector map command?

for conf_thresh = 0.25, precision = 0.54, recall = 0.53, F1-score = 0.53

WongKinYiu commented 2 years ago

https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L995

WilburZjh commented 2 years ago

Thanks for the reply! It is 0.005 in the detector.c file, so why does the test report conf_thresh = 0.25?

WongKinYiu commented 2 years ago

Oh, I misunderstood your problem. https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L995 sets the threshold used to calculate mAP.

For the F1-score, just append -thresh 0.25 to your command.
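So, reusing the paths already shown in this thread, the full evaluation command becomes:

./darknet detector map data/yolov4-p5.data cfg/yolov4-p5.cfg yolov4_p5_backup/yolov4-p5_best.weights -iou_thresh 0.25 -thresh 0.25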

WilburZjh commented 2 years ago

Thanks!

WilburZjh commented 2 years ago

Hi @AlexeyAB, I have tried a lot with the following configuration file, but I still get a very low mAP (Last accuracy mAP@0.50 = 13.95 %, best = 13.95 %) during training, even though I have trained for 7900 iterations...

I use the following command for training:

./darknet detector train data/yolov4-csp-x-swish.data yolov4-csp-x-swish.cfg yolov4-csp-x-swish.conv.192 -map -dont_show

I use the following configuration file. (As mentioned before, I increased the batch size and kept max_batches; I also used calculate.sh to get the anchor sizes for 800x800.)

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=64
width=800
height=800
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 9000
policy=steps
steps=7200,8100
scales=.1,.1

mosaic=1

letter_box=1

ema_alpha=0.9998

optimized_memory=1

# ============ Backbone ============ #

Stem

0

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=swish

P1

Downsample

[convolutional] batch_normalize=1 filters=80 size=3 stride=2 pad=1 activation=swish

Residual Block

[convolutional] batch_normalize=1 filters=40 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

4 (previous+1+3k)

[shortcut] from=-3 activation=linear

P2

Downsample

[convolutional] batch_normalize=1 filters=160 size=3 stride=2 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

Residual Block

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

Transition first

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=swish

Merge [-1, -(3k+4)]

[route] layers = -1,-13

Transition last

20 (previous+7+3k)

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

P3

Downsample

[convolutional] batch_normalize=1 filters=320 size=3 stride=2 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

Residual Block

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

Transition first

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

Merge [-1 -(4+3k)]

[route] layers = -1,-34

Transition last

57 (previous+7+3k)

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

P4

Downsample

[convolutional] batch_normalize=1 filters=640 size=3 stride=2 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

Residual Block

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

Transition first

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

Merge [-1 -(3k+4)]

[route] layers = -1,-34

Transition last

94 (previous+7+3k)

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

P5

Downsample

[convolutional] batch_normalize=1 filters=1280 size=3 stride=2 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

Residual Block

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=swish

[shortcut] from=-3 activation=linear

Transition first

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

Merge [-1 -(3k+4)]

[route] layers = -1,-19

Transition last

116 (previous+7+3k)

[convolutional] batch_normalize=1 filters=1280 size=1 stride=1 pad=1 activation=swish

# ============ End of Backbone ============ #

# ============ Neck ============ #

CSPSPP

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

SPP

[maxpool] stride=1 size=5

[route] layers=-2

[maxpool] stride=1 size=9

[route] layers=-4

[maxpool] stride=1 size=13

[route] layers=-1,-3,-5,-6

End SPP

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[route] layers = -1, -15

133 (previous+6+5+2k)

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

End of CSPSPP

FPN-4

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[upsample] stride=2

[route] layers = 94

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

Plain Block

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

Merge [-1, -(2k+2)]

[route] layers = -1, -8

Transition last

149 (previous+6+4+2k)

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

FPN-3

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[upsample] stride=2

[route] layers = 57

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

Plain Block

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=swish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=swish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=swish

Merge [-1, -(2k+2)]

[route] layers = -1, -8

Transition last

165 (previous+6+4+2k)

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=swish

PAN-4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=320 activation=swish

[route] layers = -1, 149

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

Plain Block

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[route] layers = -1,-8

Transition last

178 (previous+3+4+2k)

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=swish

PAN-5

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=640 activation=swish

[route] layers = -1, 133

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

Split

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[route] layers = -2

Plain Block

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[route] layers = -1,-8

Transition last

191 (previous+3+4+2k)

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=swish stopbackward=900

# ============ End of Neck ============ #

# ============ Head ============ #

YOLO-3

[route] layers = 165

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=swish

[convolutional] size=1 stride=1 pad=1 filters=18 activation=logistic

[yolo] mask = 0,1,2 anchors = 55, 46, 48, 58, 62, 60, 58, 76, 81, 56, 73, 72, 92, 75, 76, 93, 106,103 classes=1 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5

iou_thresh=0.2

iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

YOLO-4

[route] layers = 178

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=swish

[convolutional] size=1 stride=1 pad=1 filters=18 activation=logistic

[yolo] mask = 3,4,5 anchors = 55, 46, 48, 58, 62, 60, 58, 76, 81, 56, 73, 72, 92, 75, 76, 93, 106,103 classes=1 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5

iou_thresh=0.2

iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

YOLO-5

[route] layers = 191

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1280 activation=swish

[convolutional] size=1 stride=1 pad=1 filters=18 activation=logistic

[yolo] mask = 6,7,8 anchors = 55, 46, 48, 58, 62, 60, 58, 76, 81, 56, 73, 72, 92, 75, 76, 93, 106,103 classes=1 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5

iou_thresh=0.2

iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

I still get a very low F1-score and mAP value... Can you suggest where I am going wrong?
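One common sanity check for low mAP (a sketch reusing the training command from above; -show_imgs dumps the augmented training images with their ground-truth boxes drawn, so label or anchor problems become visible):

./darknet detector train data/yolov4-csp-x-swish.data yolov4-csp-x-swish.cfg yolov4-csp-x-swish.conv.192 -show_imgs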

WilburZjh commented 2 years ago

Hi @WongKinYiu, may I know if I can set the height and width of the network to different values? For example, height 896 and width 640? If that is allowed, what else should I change in the cfg file besides the height and width?

WongKinYiu commented 2 years ago

Just change width and height, and decide whether or not you want to use letter_box in your cfg.
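A minimal sketch of such a change (assuming both dimensions stay multiples of 32, which the network's stride-32 downsampling requires; letter_box=1 pads the resized image so the aspect ratio is preserved):

[net]
width=640
height=896
letter_box=1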

WilburZjh commented 2 years ago

Hi, may I know where letter_box is in my cfg?

WongKinYiu commented 2 years ago

https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4-csp.cfg#L27

WilburZjh commented 2 years ago

Thanks for the reply! May I know how it will influence the F1-score? If I use it, will the F1-score increase or not?

lsd1994 commented 2 years ago

@WilburZjh Just try it