dvlab-research / PanopticFCN

Fully Convolutional Networks for Panoptic Segmentation (CVPR2021 Oral)
Apache License 2.0

about batch size #18

Closed emjay73 closed 3 years ago

emjay73 commented 3 years ago

I noticed that the provided config files set IMGS_PER_BATCH to 16. Did you use IMGS_PER_BATCH=16 when training with 8 GPUs, i.e., 2 images per GPU? Or did you multiply the batch size and LR by 8 and divide the iterations by 8? Are there any other hyperparameters I need to change when changing the batch size or number of GPUs in order to reproduce the results?

yanwei-li commented 3 years ago

Hi, IMGS_PER_BATCH=16 with 8 GPUs denotes 2 images/GPU. If you'd like to change the batch size, consider changing the LR and iterations linearly. For example, if IMGS_PER_BATCH is increased to 32, consider increasing LR to 2*LR while decreasing MAX_ITER to 1/2 simultaneously. This keeps the total number of training epochs identical.
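The linear scaling rule above can be sketched as a small helper. This is a minimal illustration, not code from the repo: the function name `scale_schedule` is hypothetical, and the numeric values are illustrative rather than the repo's actual config values.

```python
def scale_schedule(base_batch, base_lr, base_iters, new_batch):
    """Scale LR up and iterations down linearly with the batch size,
    so the total number of images seen (epochs) stays the same."""
    factor = new_batch / base_batch
    return base_lr * factor, int(round(base_iters / factor))

# Illustrative baseline: batch 16, LR 0.01, 90k iterations.
# Doubling the batch to 32 doubles the LR and halves the iterations.
lr, max_iter = scale_schedule(16, 0.01, 90000, 32)
print(lr, max_iter)  # 0.02 45000
```

The same rule works in reverse: halving the batch to 8 would halve the LR and double MAX_ITER.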

emjay73 commented 3 years ago

May I ask why? Since you are using V100s, I believe you have plenty of room for a bigger batch size, especially in a case like the R-50-400 setting.

yanwei-li commented 3 years ago

Hi, because we want people who do not have a V100 to still be able to achieve similar results on a 1080 Ti or 2080 Ti. So we kept the most common setting, which is identical to other works.

emjay73 commented 3 years ago

Oooh, that's really sweet of you! Thank you for your consideration. Actually, I'm using a 2080 Ti, and your comment somehow cheers me up. Don't you think there's a possibility that a small batch size hurts network performance because of unstable batch statistics?

yanwei-li commented 3 years ago

Yes, actually it could. But our goal is not to set the best performance with a larger batch size; we'd like to find a better method for panoptic representation. So we compare with previous works under the same setting to prove it. Of course, if you'd like to achieve better performance, other techniques, like a larger batch size and more data augmentation, are recommended. BTW, for your reference, SyncBN for the backbone didn't bring much improvement in our experiments.
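For reference, in a detectron2-based codebase like this one, trying SyncBN for the backbone is usually a single config change. This is a hedged sketch using detectron2's standard config keys; whether PanopticFCN's configs expose exactly these keys is an assumption.

```python
# Sketch: enabling SyncBN for a ResNet backbone in a detectron2-style
# config. Requires multi-GPU training; key names follow detectron2's
# default config schema.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.MODEL.RESNETS.NORM = "SyncBN"  # default is usually "FrozenBN"
```

As noted above, this did not bring much improvement in the authors' experiments, so it is an optional experiment rather than a recommended change.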

emjay73 commented 3 years ago

That's interesting. Thank you for sharing your experience with SyncBN! To sum up, you used the same configs uploaded on GitHub to obtain the experimental results in the paper, not only for your method but for the others as well. Am I right?

yanwei-li commented 3 years ago

Yes, you are right. It's a common setting.

emjay73 commented 3 years ago

OK. Thank you. Have a good day!