AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

ResNet + YOLOv3 head: where should I put the route layers? #2005

Open springshuai opened 5 years ago

springshuai commented 5 years ago

Hi AlexeyAB,

Thanks for your repo. I managed to run yolo3-voc, but since the Darknet53 backbone is quite large, I am trying to replace it with ResNet50. My question is: how should I write the route layers? E.g. in yolo3 I found [route] layers = -1, 61 and [route] layers = -1, 36. In darknet53.cfg I found that the 36th (and the 61st, respectively) layer, if my counting is correct, is right before the 11th (and the 19th, respectively) [shortcut] layer. I am confused, because I thought the concatenation (route) layer is attached AFTER the shortcuts, no? If I want to connect the YOLOv3 head to a ResNet50 backbone, what are the layer numbers for the [route] layers?

Thanks a lot.
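A small sketch of how a [route] `layers=` value maps to absolute layer indices, assuming Darknet's convention that layers are numbered from 0 (the [net] section is not counted) and that a negative value is an offset from the route layer itself. Counting sections from 1 instead lands one layer early: with 0-based numbering, layers 36 and 61 of yolov3.cfg are themselves [shortcut] layers, not the convs just before them. `resolve_route` is a hypothetical helper, not part of Darknet.

```python
def resolve_route(route_index, layers):
    """Convert [route] `layers=` entries into absolute 0-based layer indices.

    Hypothetical helper: negative entries are treated as offsets from the route
    layer itself (-1 = the previous layer), non-negative entries as absolute indices.
    """
    resolved = []
    for entry in layers:
        absolute = route_index + entry if entry < 0 else entry
        # A route can only read layers that come before it.
        if not 0 <= absolute < route_index:
            raise ValueError(f"route entry {entry} -> layer {absolute} is out of range")
        resolved.append(absolute)
    return resolved


# Values taken from the printout later in this thread ("73 route 69", "76 route 75 61"),
# assuming that cfg used the YOLOv3 head's `layers = -4` and `layers = -1, 61`.
print(resolve_route(73, [-4]))      # [69]
print(resolve_route(76, [-1, 61]))  # [75, 61]
```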

AlexeyAB commented 5 years ago

@springshuai Hi,

> I thought the concatenate layer is attached AFTER the shortcuts

What do you mean?

You should connect to layers 53 and 29, because these [shortcut] layers are 2x (16x16) and 4x (32x32) larger, respectively, than the final size (8x8), i.e. they are the last [shortcut] layers before the subsampling (stride=2) layers.

ResNet50: [layer diagram image]
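The rule above (route from the last [shortcut] at 2x and 4x the final feature-map size) can be checked with a short sketch. It assumes a simplified model of Darknet's resnet50.cfg backbone: a 7x7/2 conv, a 2x2/2 maxpool, then 3, 4, 6 and 3 bottleneck blocks of conv, conv, conv, [shortcut], with the stride-2 in the 3x3 conv of the first block of stages 2 to 4 (layers 15, 31 and 55, as in the printout later in this thread). It also assumes even input sizes, so every stride-2 layer exactly halves width and height. The function names are illustrative.

```python
def resnet50_layers():
    """Simplified resnet50.cfg backbone as (index, type, stride) tuples, layers 0..65."""
    layers = [(0, "conv", 2), (1, "max", 2)]
    index = 2
    for stage, blocks in enumerate((3, 4, 6, 3)):
        for block in range(blocks):
            # resnet50.cfg downsamples in the 3x3 conv of the first block of stages 2-4.
            stride = 2 if (stage > 0 and block == 0) else 1
            layers += [(index, "conv", 1), (index + 1, "conv", stride),
                       (index + 2, "conv", 1), (index + 3, "shortcut", 1)]
            index += 4
    return layers


def spatial_sizes(layers, input_size):
    """Output width (= height) of every layer; assumes each stride-2 layer halves exactly."""
    sizes, size = {}, input_size
    for index, _kind, stride in layers:
        size //= stride
        sizes[index] = size
    return sizes


def route_candidates(layers, input_size):
    """Last [shortcut] whose size is 2x and 4x the final feature-map size."""
    sizes = spatial_sizes(layers, input_size)
    final = sizes[layers[-1][0]]
    result = {}
    for factor in (2, 4):
        shortcuts = [i for i, kind, _ in layers
                     if kind == "shortcut" and sizes[i] == final * factor]
        result[factor] = shortcuts[-1]
    return result


print(route_candidates(resnet50_layers(), 256))  # {2: 53, 4: 29}
print(route_candidates(resnet50_layers(), 416))  # {2: 53, 4: 29}
```

The chosen layers do not depend on the input size as long as it is a multiple of 32; only the feature-map sizes change (16x16/32x32 at 256, 26x26/52x52 at 416).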

springshuai commented 5 years ago

Thanks a lot. I have another question about ResNet50. The resnet50.cfg in this repo does the subsampling in the 3x3 convolutional layers (such as layers 31 and 55 in the picture you posted above):

    [convolutional]
    batch_normalize=1
    filters=128
    size=3
    stride=2
    pad=1
    activation=leaky

    [convolutional]
    batch_normalize=1
    filters=256
    size=3
    stride=2
    pad=1
    activation=leaky

    [convolutional]
    batch_normalize=1
    filters=512
    size=3
    stride=2
    pad=1
    activation=leaky

BUT, the network structure (http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006) I found here (https://github.com/KaimingHe/deep-residual-networks) does the subsampling in the 1x1 convolutional layers (e.g. layers 30 and 54). So my question is: how can I use the pretrained weights of ResNet50 for my ResNet50+YOLOv3 model? Thanks in advance.
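A worked comparison of the two placements for the first block of the third stage (input 52x52x512 at 416x416, 256 filters). It uses the standard output-size formula and Darknet's BFLOPs count of 2 x out_h x out_w x filters x size x size x in_channels / 1e9, which reproduces the 0.709 and 0.797 BF reported for layers 30 and 31 in the printout later in this thread. Under those assumptions, both placements give the same 26x26 output and the same weight-tensor shapes (stride does not change them); they differ in compute. The helper names are illustrative.

```python
def conv_output_size(size, kernel, stride, pad):
    """Spatial output size of a square convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_bflops(in_size, in_channels, filters, kernel, stride):
    """Return (output size, BFLOPs); pad = kernel // 2, like Darknet's pad=1."""
    out = conv_output_size(in_size, kernel, stride, kernel // 2)
    return out, 2 * out * out * filters * kernel * kernel * in_channels / 1e9


def bottleneck_entry(in_size, in_channels, mid_channels, stride_in_3x3):
    """1x1 reduce conv followed by a 3x3 conv; the stride-2 goes in one of them."""
    stride_1x1, stride_3x3 = (1, 2) if stride_in_3x3 else (2, 1)
    size_1x1, bf_1x1 = conv_bflops(in_size, in_channels, mid_channels, 1, stride_1x1)
    size_3x3, bf_3x3 = conv_bflops(size_1x1, mid_channels, mid_channels, 3, stride_3x3)
    return size_3x3, bf_1x1 + bf_3x3


darknet_style = bottleneck_entry(52, 512, 256, stride_in_3x3=True)    # resnet50.cfg
original_style = bottleneck_entry(52, 512, 256, stride_in_3x3=False)  # 1x1 stride-2 variant
print(darknet_style[0], round(darknet_style[1], 3))    # 26 1.506
print(original_style[0], round(original_style[1], 3))  # 26 0.975
```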

AlexeyAB commented 5 years ago

@springshuai

> BUT, the network structure (http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006) I found here (https://github.com/KaimingHe/deep-residual-networks) does the subsampling in those 1x1 convolutional layers (e.g. layers 30 and 54).

It doesn't matter. The current version of ResNet50 has good accuracy, 75.8% Top-1 / 92.9% Top-5: https://pjreddie.com/darknet/imagenet/

So just copy the structure from: https://github.com/AlexeyAB/darknet/blob/master/cfg/resnet50.cfg


> So my question is: how can I use the pretrained weights of ResNet50 for my ResNet50+YOLOv3 model?

springshuai commented 5 years ago

Huuuge thanks to you. The training went well :-), so here are some more questions:

1. The mAP of my ResNet50 (with pretrained weights, input resolution 416x416) + YOLOv3 on COCO validation is around 0.43 after 60 epochs. That is way behind the original DarkNet53-based model, which reaches 0.55 according to the author. I wasn't expecting this, since ResNet50's Top-1 accuracy on ImageNet is only 1.4 percentage points behind DarkNet53. From your experience, is such a performance gap normal?

2. The YOLOv3 head is pretty heavy, e.g. there is a 3x3 convolution from featuremap_h x featuremap_w x 512 to featuremap_h x featuremap_w x 1024, i.e. 3x3x512x1024 parameters. So I am thinking of using depthwise convolution to reduce the size. Do you think this will cause a significant performance drop? And is this modification possible in the .cfg file?

Thanks in advance.
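The parameter counts behind question 2 are plain arithmetic. The sketch below compares a standard 3x3 convolution from 512 to 1024 channels with a depthwise-separable replacement (a 3x3 depthwise conv followed by a 1x1 pointwise conv), ignoring biases and batch-norm parameters. The separable form is one common way to do what the question describes; it says nothing about the accuracy impact, and the function names are illustrative.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights of a regular k x k convolution (no bias)."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    """Weights of a k x k depthwise conv followed by a 1x1 pointwise conv (no bias)."""
    depthwise = kernel * kernel * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels     # 1x1 conv that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(3, 512, 1024)
separable = depthwise_separable_params(3, 512, 1024)
print(standard, separable, round(standard / separable, 2))  # 4718592 528896 8.92
```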

AlexeyAB commented 5 years ago

@springshuai

  1. There is no big difference between Darknet53 and ResNet50 for a simple task such as classification, but there is a huge difference for a harder task such as detection. Darknet53 is much better.

  2. Yes, maybe we should use depthwise (or separable, or group) convolution (don't mix it up with grouped convolution): https://ikhlestov.github.io/pages/machine-learning/convolutions-types/ but it isn't supported by this repo yet: https://github.com/AlexeyAB/darknet/issues/1730#issuecomment-430603989

I am also thinking about improving XNOR-net for Tensor Cores: it would reduce the size of the parameters 32 times and increase speed up to 2 Peta-Ops on a GeForce RTX 2080 Ti GPU (for comparison, 14 Peta-flops is a top-10 supercomputer in the world, see top500.org): https://github.com/NVIDIA/cutlass/issues/34
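The "32 times" figure comes from storing each weight as 1 bit (its sign) instead of a 32-bit float. A minimal sketch of that storage arithmetic, using only the standard library: it packs signs 8 per byte and compares the result with float32 storage. This covers only the storage side; an actual XNOR-net layer also keeps a per-filter scaling factor and uses bitwise XNOR/popcount kernels, which are not shown. The weights are made-up illustrative values.

```python
import struct


def float32_bytes(weights):
    """Serialize weights as little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)


def pack_signs(weights):
    """Keep only sign(w) as one bit (1 for w >= 0) and pack 8 bits per byte."""
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, count):
    """Recover +1/-1 values from the packed bits."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(count)]


weights = [((i * 37) % 11 - 5) / 7 for i in range(1024)]  # illustrative values only
full = float32_bytes(weights)
binary = pack_signs(weights)
print(len(full), len(binary), len(full) // len(binary))  # 4096 128 32
```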

aftaufik commented 5 years ago

Hi @AlexeyAB & @springshuai

> @springshuai
>
> > BUT, the network structure (http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006) I found here (https://github.com/KaimingHe/deep-residual-networks) does the subsampling in those 1x1 convolutional layers (e.g. layers 30 and 54).
>
> It doesn't matter. The current version of ResNet50 has good accuracy, 75.8% Top-1 / 92.9% Top-5: https://pjreddie.com/darknet/imagenet/
>
> So just copy the structure from: https://github.com/AlexeyAB/darknet/blob/master/cfg/resnet50.cfg
>
> > So my question is: how can I use the pretrained weights of ResNet50 for my ResNet50+YOLOv3 model?
>
>   • Download these weights: https://pjreddie.com/media/files/resnet50.weights
>   • Run this command (from [build/darknet/x64/partial.cmd, line 36](https://github.com/AlexeyAB/darknet/blob/21a4ec9390b61c0baa7ef72e72e59fa143daba4c/build/darknet/x64/partial.cmd#L36) at commit [21a4ec9](https://github.com/AlexeyAB/darknet/commit/21a4ec9390b61c0baa7ef72e72e59fa143daba4c)):
>
>         darknet.exe partial cfg/resnet50.cfg resnet50.weights resnet50.65 65
>
>     or on Linux:
>
>         ./darknet partial cfg/resnet50.cfg resnet50.weights resnet50.65 65
>
>   • And run training using the resnet50.65 pretrained file: ./darknet detector train data/obj.data **yolov3-resnet50.cfg** resnet50.65

For the ResNet50+YOLOv3 cfg file, should I just copy the structure of resnet50.cfg, remove the last 4 layers, and add the YOLOv3 layers? Or do I need to make other adjustments to resnet50.cfg?

Big thanks

springshuai commented 5 years ago

@aftaufik Hi, in my case I copied resnet50.cfg, replaced its classification head with the YOLOv3 detection head (make sure the route layers are correctly adapted), changed the learning rate, and left the rest of resnet50.cfg untouched.

aftaufik commented 5 years ago

Thank you for the reply @springshuai. I followed some of @AlexeyAB's instructions and got the resnet50.65 weights. I think the number 65 represents the last layer of the pretrained weights (correct me if I'm wrong)? The last layers in the printout are:

    62 conv 512 1 x 1 / 1 13 x 13 x2048 -> 13 x 13 x 512 0.354 BF
    63 conv 512 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x 512 0.797 BF
    64 conv 2048 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x2048 0.354 BF

which corresponds to:

    [convolutional]
    batch_normalize=1
    filters=2048
    size=1
    stride=1
    pad=1
    activation=linear

    [shortcut]
    from=-4
    activation=leaky
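Assuming Darknet's `partial` command keeps the weights of layers whose index is below the cutoff (here 65), the sketch below lists what survives for the resnet50.cfg backbone, using the simplified layout visible in the printout in this thread (conv, maxpool, then 16 bottleneck blocks of conv, conv, conv, [shortcut]). Under that assumption the kept layers end at index 64, the 2048-filter 1x1 conv shown above, and the [shortcut] at index 65, which has no weights, is the first layer left out. The helper names are hypothetical.

```python
def resnet50_backbone_types():
    """Layer types 0..65 of a simplified resnet50.cfg backbone (classification head omitted)."""
    types = ["conv", "max"]
    for blocks in (3, 4, 6, 3):  # bottleneck blocks per stage
        types += ["conv", "conv", "conv", "shortcut"] * blocks
    return types


def kept_by_partial(types, cutoff):
    """Indices kept when copying layers with index < cutoff (assumed `partial` behaviour)."""
    return list(range(min(cutoff, len(types))))


types = resnet50_backbone_types()
kept = kept_by_partial(types, 65)
print(kept[-1], types[kept[-1]])              # 64 conv
print(len(kept), types[65])                   # 65 shortcut
print(sum(types[i] == "conv" for i in kept))  # 49
```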

Then I rewrote the cfg file as resnet50_yolov3.cfg: I removed the [shortcut] layer and directly appended all of the YOLOv3 layers. The output looks like this (with the error):

    layer filters size input output
    0 conv 64 7 x 7 / 2 416 x 416 x 3 -> 208 x 208 x 64 0.814 BF
    1 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64 0.003 BF
    2 conv 64 1 x 1 / 1 104 x 104 x 64 -> 104 x 104 x 64 0.089 BF
    3 conv 64 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BF
    4 conv 256 1 x 1 / 1 104 x 104 x 64 -> 104 x 104 x 256 0.354 BF
    5 Shortcut Layer: 1
    6 conv 64 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 64 0.354 BF
    7 conv 64 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BF
    8 conv 256 1 x 1 / 1 104 x 104 x 64 -> 104 x 104 x 256 0.354 BF
    9 Shortcut Layer: 5
    10 conv 64 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 64 0.354 BF
    11 conv 64 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BF
    12 conv 256 1 x 1 / 1 104 x 104 x 64 -> 104 x 104 x 256 0.354 BF
    13 Shortcut Layer: 9
    14 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
    15 conv 128 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 128 0.797 BF
    16 conv 512 1 x 1 / 1 52 x 52 x 128 -> 52 x 52 x 512 0.354 BF
    17 Shortcut Layer: 13
    18 conv 128 1 x 1 / 1 52 x 52 x 512 -> 52 x 52 x 128 0.354 BF
    19 conv 128 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 128 0.797 BF
    20 conv 512 1 x 1 / 1 52 x 52 x 128 -> 52 x 52 x 512 0.354 BF
    21 Shortcut Layer: 17
    22 conv 128 1 x 1 / 1 52 x 52 x 512 -> 52 x 52 x 128 0.354 BF
    23 conv 128 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 128 0.797 BF
    24 conv 512 1 x 1 / 1 52 x 52 x 128 -> 52 x 52 x 512 0.354 BF
    25 Shortcut Layer: 21
    26 conv 128 1 x 1 / 1 52 x 52 x 512 -> 52 x 52 x 128 0.354 BF
    27 conv 128 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 128 0.797 BF
    28 conv 512 1 x 1 / 1 52 x 52 x 128 -> 52 x 52 x 512 0.354 BF
    29 Shortcut Layer: 25
    30 conv 256 1 x 1 / 1 52 x 52 x 512 -> 52 x 52 x 256 0.709 BF
    31 conv 256 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 256 0.797 BF
    32 conv 1024 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x1024 0.354 BF
    33 Shortcut Layer: 29
    34 conv 256 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 256 0.354 BF
    35 conv 256 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BF
    36 conv 1024 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x1024 0.354 BF
    37 Shortcut Layer: 33
    38 conv 256 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 256 0.354 BF
    39 conv 256 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BF
    40 conv 1024 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x1024 0.354 BF
    41 Shortcut Layer: 37
    42 conv 256 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 256 0.354 BF
    43 conv 256 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BF
    44 conv 1024 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x1024 0.354 BF
    45 Shortcut Layer: 41
    46 conv 256 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 256 0.354 BF
    47 conv 256 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BF
    48 conv 1024 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x1024 0.354 BF
    49 Shortcut Layer: 45
    50 conv 256 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 256 0.354 BF
    51 conv 256 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BF
    52 conv 1024 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x1024 0.354 BF
    53 Shortcut Layer: 49
    54 conv 512 1 x 1 / 1 26 x 26 x1024 -> 26 x 26 x 512 0.709 BF
    55 conv 512 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x 512 0.797 BF
    56 conv 2048 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x2048 0.354 BF
    57 Shortcut Layer: 53
    58 conv 512 1 x 1 / 1 13 x 13 x2048 -> 13 x 13 x 512 0.354 BF
    59 conv 512 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x 512 0.797 BF
    60 conv 2048 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x2048 0.354 BF
    61 Shortcut Layer: 57
    62 conv 512 1 x 1 / 1 13 x 13 x2048 -> 13 x 13 x 512 0.354 BF
    63 conv 512 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x 512 0.797 BF
    64 conv 2048 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x2048 0.354 BF
    65 conv 512 1 x 1 / 1 13 x 13 x2048 -> 13 x 13 x 512 0.354 BF
    66 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
    67 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
    68 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
    69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
    70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
    71 conv 24 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 24 0.008 BF
    72 yolo
    73 route 69
    74 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
    75 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
    76 route 75 61
    77 Layer before convolutional layer must output image.: Cannot allocate memory
    darknet: ./src/utils.c:277: error: Assertion `0' failed.
    Aborted (core dumped)

I have not adapted the route layers yet (I'm still trying to learn this stuff). Currently I'm researching the route layers so I can adapt them to this new configuration (ResNet50 backbone + YOLOv3 detection layers), because in this cfg file I put all the conv layers before the YOLO detection layer.

Thanks in advance, I really appreciate it. Sorry for my bad English :(
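A [route] with several inputs concatenates them along the channel axis, so the inputs need the same width and height. A minimal sketch of that check, using shapes read off the printout above for a 416x416 input: layer 75 (upsample) is 26x26x256, layer 61 ([shortcut]) is 13x13x2048, and layer 53 (the [shortcut] after the 26x26x1024 conv 52) is 26x26x1024. As far as I understand Darknet's parser, a route whose inputs differ in size ends up with an empty output, which would be consistent with the "Layer before convolutional layer must output image" assertion at layer 77. Routing to layer 53 instead, the 26x26 [shortcut] AlexeyAB pointed to earlier in this thread, gives a valid output. `route_output` is a hypothetical helper, not Darknet code.

```python
def route_output(shapes, inputs):
    """Return the (w, h, c) of a channel-wise concatenation, or None if w/h differ."""
    width, height, _ = shapes[inputs[0]]
    channels = 0
    for index in inputs:
        w, h, c = shapes[index]
        if (w, h) != (width, height):
            return None  # spatial sizes must match to concatenate along channels
        channels += c
    return (width, height, channels)


# (width, height, channels) taken from the printout above (416x416 input).
shapes = {53: (26, 26, 1024), 61: (13, 13, 2048), 75: (26, 26, 256)}
print(route_output(shapes, [75, 61]))  # None
print(route_output(shapes, [75, 53]))  # (26, 26, 1280)
```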