bubbliiiing / yolov4-pytorch

This is a PyTorch implementation of YOLOv4 that can be used to train your own models.
MIT License

batch size and subdivision #190

Open QinghangHong1 opened 3 years ago

QinghangHong1 commented 3 years ago

Hello, thanks for your great work. I am trying to train YOLOv4 on COCO 2017 from scratch with your code.

  1. I found that there is no "subdivision" option in your code. Could you clarify why you do not use subdivision in this case, please?
  2. I commented out your code that loads the pretrained weights and turned off the block that freezes the backbone. I am using a Tesla P100 with 16 GB of memory, and with an input size of 608, any batch size greater than 16 causes OOM. Is that normal? Thanks a lot for your help!
bubbliiiing commented 3 years ago
  1. I am not very familiar with "subdivision". Can you explain it?
  2. I don't have a graphics card larger than 16 GB, but with a batch size of 16 and a 608 input, OOM is to be expected.
QinghangHong1 commented 3 years ago

Thanks for your reply. Subdivision is a mechanism that allows a larger effective batch size on a small GPU. For example, with batch_size=128 and subdivision=4, the GPU processes 32 (= batch_size / subdivision) images per forward/backward pass and only updates the weights once every 4 passes, which is essentially equivalent to using a batch size of 128 without needing such a big GPU.

I found this post explaining it. https://github.com/pjreddie/darknet/issues/224. Hope it is clear. Thanks.
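As a rough PyTorch sketch of the same idea (the model, optimizer, and data below are toy placeholders, not code from this repo): gradients are accumulated over `subdivision` micro-batches and the optimizer steps once per effective batch.

```python
import torch
import torch.nn as nn

# Hypothetical numbers for illustration; not taken from this repo's config.
batch_size  = 128                        # effective batch size we want to emulate
subdivision = 4                          # number of micro-batches per weight update
micro_batch = batch_size // subdivision  # 32 images actually resident on the GPU

model = nn.Sequential(                   # tiny stand-in for YOLOv4
    nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

optimizer.zero_grad()
for step in range(8):                                 # stand-in for the dataloader loop
    images  = torch.randn(micro_batch, 3, 64, 64)     # one 32-image micro-batch
    targets = torch.randint(0, 10, (micro_batch,))
    loss = criterion(model(images), targets)
    (loss / subdivision).backward()   # scale so the accumulated gradient matches one 128-image batch
    if (step + 1) % subdivision == 0:
        optimizer.step()              # one weight update per `batch_size` images
        optimizer.zero_grad()
```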

bubbliiiing commented 3 years ago

I think this is a form of gradient accumulation. Gradient accumulation still requires a fair amount of memory, and it cannot replace the role of a larger batch in batch normalization. What it does is let more images contribute to each gradient descent step, so the descent direction is more accurate. It's a good idea, but I don't have time to add it.
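To illustrate the batch-normalization point with a small snippet (shapes are illustrative, not taken from this repo): BatchNorm computes its statistics over whatever tensor it sees in a single forward pass, so under subdivision the statistics come from the 32-image micro-batch only, and accumulating gradients over several micro-batches never merges them into the view a real 128-image batch would give.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)                    # BN layer with 8 channels, in training mode by default
micro_batch = torch.randn(32, 8, 19, 19)  # one 32-image sub-batch; 19x19 is a YOLO head size at 608 input
_ = bn(micro_batch)                       # batch statistics and running stats come from these 32 images only
print(bn.running_mean[:3])                # more accumulation steps never widen this per-pass view
```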