
Dropblock: A regularization method for convolutional networks +1.6 AP@0.5...0.95 (and +1.6 Top1) #4498

Open AlexeyAB opened 4 years ago

AlexeyAB commented 4 years ago

Dropblock: A regularization method for convolutional networks +1.6 AP@0.5...0.95 (and +1.6 Top1):

It is implemented; use: https://github.com/AlexeyAB/darknet/commit/1df3ddc7d6a3efe9401948d3f527f432f3001476 and https://github.com/AlexeyAB/darknet/commit/642c065c0e7c681b90f10394edce9ce315aa60d8

[dropout]
dropblock=1
dropblock_size_abs=7  # block size 7x7
probability=0.1       # this is drop probability = (1 - keep_probability)

An alternative is to use a relative block size (useful in a classifier or feature-extractor backbone):

[dropout]
dropblock=1
dropblock_size=0.6  # 60% of width and height
probability=0.1     # this is drop probability = (1 - keep_probability)
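
For reference, the original DropBlock paper (https://arxiv.org/abs/1810.12890) derives the per-position drop rate gamma from the drop probability and block size roughly as follows (darknet's exact implementation may differ in details):

gamma = probability / block_size^2 * feat_size^2 / (feat_size - block_size + 1)^2

Here feat_size is the spatial size of the layer's feature map; with dropblock_size_abs=7 the block size is fixed at 7x7, while with dropblock_size=0.6 it is computed as 60% of the layer's width and height.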

AlexeyAB commented 4 years ago

It is implemented; use: https://github.com/AlexeyAB/darknet/commit/1df3ddc7d6a3efe9401948d3f527f432f3001476

[dropout]
dropblock=1
dropblock_size=0.6  # 60% of width and height
probability=0.1     # this is drop probability = (1 - keep_probability)
tuteming commented 4 years ago

Please provide a corresponding cfg file.

AlexeyAB commented 4 years ago

@WongKinYiu

I accelerated DropBlock on GPU.


Also, if Intra-Batch Normalization (the IBN part of CBN) works well (https://github.com/AlexeyAB/darknet/issues/4386#issuecomment-587981103), then we can increase the mini_batch size by increasing batch= in the cfg and gain roughly +1-2% AP/Top1 from IBN, and we can additionally use DropBlock for another ~+1-2% AP/Top1, since with a big mini_batch DropBlock becomes necessary (BatchNorm itself regularizes only when the mini-batch is small, as noted below).
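
For context, in darknet the mini_batch is batch / subdivisions, so increasing batch= in the [net] section is what enlarges the mini_batch (values below are illustrative only):

[net]
# mini_batch = batch / subdivisions
batch=128        # e.g. doubled from 64
subdivisions=16  # unchanged, so mini_batch goes from 64/16 = 4 to 128/16 = 8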

https://medium.com/@ilango100/batch-normalization-speed-up-neural-network-training-245e39a62f85

Regularization by BatchNorm: In addition to speeding up the training of neural networks, BatchNorm also provides a weak form of regularization. How does it introduce regularization? Regularization may be caused by the introduction of noise into the data. Since normalization is performed not on the whole dataset but only on the mini-batch, the batch statistics act as noise. However, BatchNorm provides only weak regularization and must not be fully relied upon to avoid over-fitting; other regularization can simply be reduced accordingly. For example, if a dropout of 0.6 (drop rate) would otherwise be used, with BatchNorm you can reduce the drop rate to 0.4. BatchNorm provides regularization only when the batch size is small.

WongKinYiu commented 4 years ago

@AlexeyAB Thanks,

I have trained the model with DropBlock, but it does not improve the accuracy. I followed the same strategy as we did for EfficientNet: only add drop layers before the shortcut layer (see the sketch below). Could you help by providing a better cfg with DropBlock layers? Or, after I check the performance of CBN, I can modify my previous cfg with CBN if it works well.
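
For illustration, a hypothetical fragment of that placement, with the [dropout]/dropblock layer inserted right before the residual add (layer parameters and the from= index are illustrative, not taken from the actual cfg):

....
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

# DropBlock applied just before the shortcut (residual add)
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.1

[shortcut]
from=-4
activation=linear
....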

AlexeyAB commented 4 years ago

@WongKinYiu

I have trained the model with DropBlock, but it does not improve the accuracy.

Maybe because the mini_batch size was small.

DropBlock can increase accuracy only if it is used with Batch-norm and a large mini_batch (i.e. with IBN / CBN).

Could you help by providing a better cfg with DropBlock layers?

Attach your cfg-file with DropBlock.

Or, after I check the performance of CBN, I can modify my previous cfg with CBN if it works well.

I think we should check DropBlock+CBN after checking CBN.

WongKinYiu commented 4 years ago

@AlexeyAB

Our building is being disinfected today; I will share the cfg after tomorrow.

AlexeyAB commented 4 years ago

@WongKinYiu I fixed DropBlock.

Show the cfg-files that you used for training.

P.S. What is the result of ASFF?

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

For CBN, I just replaced every batch_normalize=1 with batch_normalize=2 in csresnext50-gamma.cfg.
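
That is, in every convolutional block the change is just the batch_normalize value (a sketch; the other parameters are illustrative):

[convolutional]
batch_normalize=2   # was batch_normalize=1; 2 selects Cross-Iteration BatchNorm (CBN) in this repo
filters=256
size=3
stride=1
pad=1
activation=leaky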

I will share the DropBlock cfg after I finish my breakfast.

ASFF cannot converge; the loss becomes higher and higher after 100k epochs. The same situation occurs with ASFF+RFB after 250k epochs.

AlexeyAB commented 4 years ago

@WongKinYiu

ASFF cannot converge; the loss becomes higher and higher after 100k epochs. The same situation occurs with ASFF+RFB after 250k epochs.

What is the value of the avg loss? Share the cfg-file.

This is strange, because @Kyuuki93 trained ASFF successfully: https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-561064425

WongKinYiu commented 4 years ago

If the epoch number is the same as in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-561064425, it does not have the NaN issue.

csresnext50-alpha.cfg.txt

AlexeyAB commented 4 years ago

@WongKinYiu

The same situation occurs with ASFF+RFB after 250k epochs.

https://github.com/AlexeyAB/darknet/issues/4406#issuecomment-583789600

80k: 9.2, 90k: 13.6, 100k: 20.3, 250k: NaN


Your DropBlock usage isn't the same as in the original paper: https://arxiv.org/abs/1810.12890v1

AlexeyAB commented 4 years ago

@WongKinYiu

could you help for providing the better cfg with dropblock layers?

Try training this cfg (CBN + DropBlock): csresnext50-gamma_dropblock_cbn.cfg.txt

Start training now; do not wait for the CBN model to finish training.


It uses:

[net]
batch=512
subdivisions=16
max_batches=300000

....
[convolutional]
batch_normalize=2
....
# for Group-3
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.025
....
# for Group-4
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.1
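
With these [net] settings the mini_batch is batch / subdivisions = 512 / 16 = 32 images; note that the Group-3 DropBlock uses a smaller drop probability (0.025) than the Group-4 one (0.1), i.e. weaker dropping earlier in the network.
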
WongKinYiu commented 4 years ago

OK

AlexeyAB commented 4 years ago

@WongKinYiu I fixed the gradient calculation for ASFF, so you can train with activation=normalize_channels_softmax_maxval using the new code.
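
For example, a hypothetical placement in the ASFF fusion layer that produces the per-level weights (filters= and the surrounding layers are illustrative, not taken from the actual ASFF cfg):

[convolutional]
filters=3
size=1
stride=1
pad=1
activation=normalize_channels_softmax_maxval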

WongKinYiu commented 4 years ago

@AlexeyAB

Thanks, my previously modified version has been training for about 110k epochs; if it still gets NaN, I'll use the new code to retrain.

AlexeyAB commented 4 years ago

@WongKinYiu Hi,

Have you restarted training the models: ASFF, BiFPN (csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt), and weighted-shortcut (csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt) after this commit https://github.com/AlexeyAB/darknet/commit/f6baa62c9b6151b9f615a1e56434d237553fd4af from Feb 24, 2020?


Also, it seems that iou_thresh=0.213 degrades accuracy: https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/coco/results.md#mscoco

Meanwhile, scale_x_y=1.05/1.20 decreases AP50 & AP75 but keeps AP@0.5...0.95 the same, so it seems that it increases AP95. Can you check AP95 for the baseline model and the scale_x_y model? Or, better, show the whole accuracy output from the evaluation server for both models.
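
For context, both are [yolo]-layer parameters set per detection head; a hypothetical fragment (placement and surrounding values are illustrative):

[yolo]
....
scale_x_y=1.05     # e.g. values in the 1.05...1.20 range discussed above
iou_thresh=0.213   # the value reported above as degrading accuracy
....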

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

Yes, I restarted training the BiFPN models, but I use leaky instead of the linear activation function. As I remember, csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt use the Feb 21, 2020 repo, and csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt use the Feb 18, 2020 repo.

Currently, I can only confirm that genetic, mosaic, and CIoU loss benefit AP on COCO. I'll train a model with genetic, mosaic, and CIoU loss after I get a free GPU.

The results are obtained on the test-dev set via CodaLab, so I cannot get AP95. I can check AP95 on the val set next Monday.

AlexeyAB commented 4 years ago

@WongKinYiu

As I remember, I use the Feb 21, 2020 repo.

If you use Feb 21, please update the code to Feb 24 and restart training; ASFF, BiFPN, and DropBlock were fixed there: https://github.com/AlexeyAB/darknet/issues/4662#issuecomment-590438709

Yes, I restart training bifpn models,

And weighted-shortcut.

I can check AP95 on the val set next Monday.

OK, please check AP50, AP75, AP95, and AP50...95 for both models.

WongKinYiu commented 4 years ago

@AlexeyAB

Do I need to restart from the first epoch, or can I continue training from the current epoch?

AlexeyAB commented 4 years ago

@WongKinYiu You need to restart from the first epoch:

  1. ASFF
  2. BiFPN: csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt
  3. weighted-shortcut: csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt
WongKinYiu commented 4 years ago

@AlexeyAB Thanks.

WongKinYiu commented 4 years ago

@AlexeyAB

  1. BiFPN: restart csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt
  2. weighted-shortcut: restart csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt and csdarknet53-ws.cfg.txt
  3. DropBlock: restart csresnext50-gamma_dropblock_cbn.cfg.txt
  4. ASFF: stop training csresnext50-asff-rfbn.cfg; wait for a free GPU.
AlexeyAB commented 4 years ago

@WongKinYiu Ok!

WongKinYiu commented 4 years ago

csresnext50-gamma_dropblock_cbn.cfg.txt: 47.3% top-1, 72.4% top-5.

AlexeyAB commented 4 years ago

@WongKinYiu Either CBN alone, or DropBlock together with CBN, seems to be working poorly.

arnaud-nt2i commented 3 years ago

@WongKinYiu @AlexeyAB I'm curious whether you managed to properly check the performance of CBN or CBN + DropBlock. Not having to worry about mini-batch size would be great!

akashAD98 commented 2 years ago

https://github.com/AlexeyAB/darknet/issues/8382 Is the dropout working fine? Where should I add it in our yolov4-mish.cfg? @AlexeyAB @WongKinYiu