Training error: Use resnet18 as the backbone of yolov3. - Githubissues

AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )

http://pjreddie.com/darknet/

Training error: Use resnet18 as the backbone of yolov3. #7124

Open ghm666 opened 3 years ago

ghm666 commented 3 years ago

error message : darknet: ./src/parser.c:984: parse_shortcut: Assertion `params.w == net.layers[index].out_w && params.h == net.layers[index].out_h' failed.

@AlexeyAB How can I solve this problem?

This error still occurs when I use resnet50.

./darknet detector train steel_task1/steel.data steel_task1/resnet18.cfg   -dont_show  -map
 CUDA-version: 9000 (10020), cuDNN: 7.4.1, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 4.9.1
 Prepare additional network for mAP calculation...
 0 : compute_capability = 600, cudnn_half = 0, GPU: Tesla P100-PCIE-16GB 
net.optimized_memory = 0 
mini_batch = 1, batch = 8, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 conv     64       7 x 7/ 2    608 x 608 x   3 ->  304 x 304 x  64 1.739 BF
   1 max                2x 2/ 2    304 x 304 x  64 ->  152 x 152 x  64 0.006 BF
   2 conv     64       3 x 3/ 1    152 x 152 x  64 ->  152 x 152 x  64 1.703 BF
   3 conv     64       3 x 3/ 1    152 x 152 x  64 ->  152 x 152 x  64 1.703 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 152 x 152 x  64 0.001 BF
   5 conv     64       3 x 3/ 1    152 x 152 x  64 ->  152 x 152 x  64 1.703 BF
   6 conv     64       3 x 3/ 1    152 x 152 x  64 ->  152 x 152 x  64 1.703 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 152 x 152 x  64 0.001 BF
   8 conv    128       3 x 3/ 2    152 x 152 x  64 ->   76 x  76 x 128 0.852 BF
   9 conv    128       3 x 3/ 1     76 x  76 x 128 ->   76 x  76 x 128 1.703 BF
  10 Shortcut Layer: 7,  wt = 0, wn = 0, outputs:  76 x  76 x 128 0.001 BF
darknet: ./src/parser.c:984: parse_shortcut: Assertion `params.w == net.layers[index].out_w && params.h == net.layers[index].out_h' failed.
Aborted

resnet18-yolov3.cfg

```
[convolutional]
batch_normalize=1
filters=64
size=7
stride=2
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=linear

[shortcut]
activation=leaky
from=-3

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=27
activation=linear

[yolo]
mask = 6,7,8
anchors = 18,27, 24,65, 67,47, 38,142, 92,112, 217, 69, 101,253, 245,161, 298,311
classes=4
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 19

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=27
activation=linear

[yolo]
mask = 3,4,5
anchors = 18,27, 24,65, 67,47, 38,142, 92,112, 217, 69, 101,253, 245,161, 298,311
classes=4
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 13

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=27
activation=linear

[yolo]
mask = 0,1,2
anchors = 18,27, 24,65, 67,47, 38,142, 92,112, 217, 69, 101,253, 245,161, 298,311
classes=4
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
```

stephanecharette commented 3 years ago

error message : darknet: ./src/parser.c:984: parse_shortcut: Assertion `params.w == net.layers[index].out_w && params.h == net.layers[index].out_h' failed.

This error is normally caused when you don't update the filters=... line prior to the [yolo] layers.

But I see there are several issues open with similar problems:

issue #4949 "resnet50 question"
issue #7039 "Assertion failed: params.w == net.layers[index].out_w && params.h == net.layers[index].out_h"
issue #7098 "About the error that occurs when creating a ResNet152 model for Image Classification"
issue #6184 "do not work in resnet50.cfg"

These ones seem to have possible solutions:

issue #6663 "No epoch when training ResNet50 Classification"
issue #6540 "how to change darknet53 backbone to ResNet for yolov3."
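
The `filters=` rule mentioned at the top of this comment can be checked against the posted cfg with a little arithmetic. The following is a minimal Python sketch, assuming the usual YOLOv3 convention that the convolutional layer directly before each [yolo] layer needs filters = (classes + 5) * (number of masks in that head). The function name `expected_yolo_filters` is made up for illustration, and the values are copied from the cfg in the original post.

```python
def expected_yolo_filters(classes: int, masks_per_head: int) -> int:
    """Filters expected in the conv layer feeding a [yolo] layer.

    Assumes the common YOLOv3 rule: (classes + 5) * masks_per_head,
    where 5 covers x, y, w, h and objectness for each anchor.
    """
    return (classes + 5) * masks_per_head


# Values copied from the cfg posted in this issue.
classes = 4
mask = [6, 7, 8]   # each [yolo] head in the cfg lists three masks
cfg_filters = 27   # filters= in the conv layer before each [yolo]

expected = expected_yolo_filters(classes, len(mask))
print(f"expected filters={expected}, cfg has filters={cfg_filters}")
```

With classes=4 and three masks per head, the posted filters=27 already matches this rule, which suggests the assertion in this particular cfg has a different cause.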

stephanecharette commented 3 years ago

Linking in more:

issue #3420 "Channels is not consistent when it is computed in shortcut layer in resnet50 cfg"
issue #5106 "resnet101 yoloV3 backbone shorcut layer assert error"

Also see this in issue #3363 about resnet152_trident: https://github.com/AlexeyAB/darknet/issues/3363#issuecomment-503980943

AlexeyAB commented 3 years ago

There is a bug with ResNet models (a bug with residual connections between layers with different resolutions). I will try to resolve it when I have time.
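
The resolution mismatch described above can be illustrated by tracing output sizes through the first 11 layers of the posted cfg. This is a rough Python sketch, not darknet's code: the helper names are hypothetical, the 608 x 608 input is taken from the training log (the posted cfg has no [net] section), and the output-size arithmetic is an assumption based on the usual convolution and max-pool formulas.

```python
def conv_out(size_in: int, size: int, stride: int) -> int:
    # Assumes pad=1 means padding = size // 2, as in the posted cfg.
    padding = size // 2
    return (size_in + 2 * padding - size) // stride + 1


def pool_out(size_in: int, size: int, stride: int) -> int:
    # Assumed max-pool arithmetic with total padding of size - 1.
    return (size_in + (size - 1) - size) // stride + 1


# First 11 layers of the posted cfg:
# ("conv", filters, size, stride), ("max", size, stride), ("shortcut", from)
LAYERS = [
    ("conv", 64, 7, 2),
    ("max", 2, 2),
    ("conv", 64, 3, 1),
    ("conv", 64, 3, 1),
    ("shortcut", -3),
    ("conv", 64, 3, 1),
    ("conv", 64, 3, 1),
    ("shortcut", -3),
    ("conv", 128, 3, 2),
    ("conv", 128, 3, 1),
    ("shortcut", -3),
]


def find_shortcut_mismatch(layers, width, height):
    """Return (layer, source, input_wh, source_wh) for the first shortcut
    whose source layer has a different width/height, or None."""
    shapes = []                      # (w, h, c) output of every layer so far
    w, h, c = width, height, 3
    for i, layer in enumerate(layers):
        kind = layer[0]
        if kind == "conv":
            _, filters, size, stride = layer
            w, h, c = conv_out(w, size, stride), conv_out(h, size, stride), filters
        elif kind == "max":
            _, size, stride = layer
            w, h = pool_out(w, size, stride), pool_out(h, size, stride)
        elif kind == "shortcut":
            src = i + layer[1]       # relative index, e.g. from=-3
            src_w, src_h, _ = shapes[src]
            if (src_w, src_h) != (w, h):
                return i, src, (w, h), (src_w, src_h)
        shapes.append((w, h, c))
    return None


result = find_shortcut_mismatch(LAYERS, 608, 608)
print(result)
```

Under these assumptions the trace flags layer 10: its shortcut reaches back to layer 7 (152 x 152) while its own input is 76 x 76. These are the same two layers the log names just before the assertion fails.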

ghm666 commented 3 years ago

There is a bug with ResNet models (bug with residual connections between layers with different resolution). I will try to resolve it when I have time.

Thank you for your reply. I look forward to this bug being fixed soon.

manojps commented 1 year ago

@AlexeyAB Could you please share your thoughts on possible ways to resolve this bug? Someone else may take the lead on this.