Open QinghangHong1 opened 3 years ago
Thanks for your reply. Subdivision is a mechanism that allows a larger effective batch size on a small GPU. For example, with batch size = 128 and subdivision = 4, the GPU processes 32 images (batch_size / subdivision) per step, but only updates the weights every 4 steps (one per subdivision), after the gradients from all 4 sub-batches have been accumulated. This is basically equivalent to using a batch size of 128, but you don't need a GPU big enough to hold all 128 images at once.
I found this post explaining it: https://github.com/pjreddie/darknet/issues/224 Hope it is clear. Thanks.
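To make the arithmetic concrete, here is a minimal sketch in plain Python, not taken from this repo or from Darknet. It assumes a toy linear model with a squared-error loss, and the function names (`sum_gradient`, `full_batch_gradient`, `accumulated_gradient`) are made up for illustration. It checks that accumulating gradients over 4 sub-batches of 32 gives the same averaged gradient as one batch of 128.

```python
import random


def sum_gradient(w, b, xs, ys):
    """Summed (not averaged) gradient of squared error for y ~ w*x + b."""
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    return gw, gb


def full_batch_gradient(w, b, xs, ys):
    """Average gradient computed over the whole batch in one pass."""
    gw, gb = sum_gradient(w, b, xs, ys)
    n = len(xs)
    return gw / n, gb / n


def accumulated_gradient(w, b, xs, ys, subdivisions):
    """Average gradient built up from `subdivisions` smaller passes.

    Assumes len(xs) is divisible by `subdivisions`.
    """
    n = len(xs)
    mini = n // subdivisions  # images processed per step
    acc_w = acc_b = 0.0
    for i in range(subdivisions):
        chunk = slice(i * mini, (i + 1) * mini)
        gw, gb = sum_gradient(w, b, xs[chunk], ys[chunk])
        # Divide by the full batch size so the accumulated total is the
        # average over all 128 samples, not over one sub-batch.
        acc_w += gw / n
        acc_b += gb / n
    return acc_w, acc_b  # the weight update would happen only now


random.seed(0)
batch_size, subdivisions = 128, 4
xs = [random.uniform(-1.0, 1.0) for _ in range(batch_size)]
ys = [3.0 * x + 0.5 + random.gauss(0.0, 0.1) for x in xs]

full = full_batch_gradient(0.0, 0.0, xs, ys)
accum = accumulated_gradient(0.0, 0.0, xs, ys, subdivisions)

print(batch_size // subdivisions)  # images per step: 32
print(max(abs(f - a) for f, a in zip(full, accum)))  # difference is rounding noise only
```

The match holds here because the toy model has no batch-dependent layers, so each sample's gradient is independent of which other samples share its sub-batch.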
I think this is a form of gradient accumulation. Gradient accumulation still needs memory for the accumulated gradients, and it cannot replace the role of a larger batch in batch normalization, because the normalization statistics are still computed over each smaller sub-batch. What it does is let more images contribute to each gradient descent step, so the descent direction is more accurate. It's a good idea, but I don't have time to add it.
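The batch normalization point can be illustrated with another small sketch in plain Python. This is a deliberately exaggerated toy, not code from this repo: the "activations" are just the numbers 0 to 127 in sorted order, and `batch_stats` is a hypothetical helper. It shows that the mean and variance seen by each sub-batch of 32 differ from those of the full batch of 128, so the normalized values differ too, even though accumulated gradients would still be summed over all 128 samples.

```python
import math


def batch_stats(values):
    """Mean and (population) variance, as batch norm computes per batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def normalize(value, mean, var, eps=1e-5):
    """Batch-norm style normalization without the learned scale and shift."""
    return (value - mean) / math.sqrt(var + eps)


batch_size, subdivisions = 128, 4
mini = batch_size // subdivisions

# Toy activations for one channel: sorted on purpose to exaggerate the effect.
activations = [float(i) for i in range(batch_size)]

full_mean, full_var = batch_stats(activations)
print(full_mean, full_var)  # 63.5 1365.25

chunk_stats = [
    batch_stats(activations[i * mini:(i + 1) * mini])
    for i in range(subdivisions)
]
print([m for m, _ in chunk_stats])  # [15.5, 47.5, 79.5, 111.5]
print([v for _, v in chunk_stats])  # [85.25, 85.25, 85.25, 85.25]

# The same activation is normalized differently depending on which
# statistics are used: full batch versus its own sub-batch.
first = activations[0]
print(normalize(first, full_mean, full_var))
print(normalize(first, *chunk_stats[0]))
```

With shuffled real data the gap is much smaller than in this sorted toy, but the sub-batch statistics are still noisier estimates than those from the full batch.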
Hello, thanks for your great work. I am trying to train YOLOv4 on COCO 2017 from scratch with your code. 1. I found that there is no "subdivision" option in your code. Could you please clarify why you do not use subdivision?