Question about batch size

Hi, I am impressed by and interested in your outstanding work. However, I ran into a question about the batch size while experimenting with the code. When I increase the batch size to 128 or 256, the test accuracy decreases. I am unsure why this algorithm is affected by a larger batch size. In theory, the co-guess, co-refine, and MixMatch operations should not depend on the batch size. I have two possible explanations:

1. The learning rate ought to be increased proportionally with the batch size.
2. I traced back to the MixMatch implementation code and found that it differs slightly from the MixMatch step in DivideMix. Between lines 253 and 262, the original code performs an interleave operation, which is not implemented in DivideMix. The author of MixMatch also explains the purpose of this operation in this issue; it seems to be related to BatchNorm.
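To make the two hypotheses concrete, here is a minimal sketch in plain Python. It is not the DivideMix or MixMatch code. `scaled_lr` applies the common linear scaling rule, and `interleave` is my own simplified version of the interleave idea: it swaps chunks between the labeled batch and each unlabeled batch so that every forward pass sees a mix of both, which I assume is what matters for the BatchNorm statistics. All function names and the example values are hypothetical, and `interleave` assumes all batches have the same length.

```python
def scaled_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling rule: grow the learning rate in proportion to batch size."""
    return base_lr * batch_size / base_batch_size


def interleave_offsets(batch_size, num_groups):
    """Split batch_size into num_groups nearly equal chunks; return chunk boundaries."""
    sizes = [batch_size // num_groups] * num_groups
    # Spread the remainder over the last chunks.
    for i in range(batch_size - sum(sizes)):
        sizes[-i - 1] += 1
    offsets = [0]
    for size in sizes:
        offsets.append(offsets[-1] + size)
    return offsets


def interleave(batches):
    """Swap chunk i of batches[0] with chunk i of batches[i], for i >= 1.

    batches[0] is assumed to be the labeled batch and the rest unlabeled
    batches of the same length. Applying the function twice restores the
    original order, so it can also undo the mixing after the forward pass.
    """
    num_groups = len(batches)
    offsets = interleave_offsets(len(batches[0]), num_groups)
    chunks = [
        [batch[offsets[p]:offsets[p + 1]] for p in range(num_groups)]
        for batch in batches
    ]
    for i in range(1, num_groups):
        chunks[0][i], chunks[i][i] = chunks[i][i], chunks[0][i]
    # Flatten each row of chunks back into a single batch.
    return [[x for chunk in row for x in chunk] for row in chunks]


if __name__ == "__main__":
    # Illustrative values only, not taken from any experiment.
    print(scaled_lr(base_lr=0.02, base_batch_size=64, batch_size=256))
    labeled = ["L0", "L1", "L2"]
    unlabeled_a = ["U0", "U1", "U2"]
    unlabeled_b = ["V0", "V1", "V2"]
    print(interleave([labeled, unlabeled_a, unlabeled_b]))
```

In a real training loop the mixed batches would be passed through the network one by one and the logits interleaved back with the same function before computing the losses. Whether this accounts for the accuracy drop at larger batch sizes is exactly what I am unsure about.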

Could you please share some explanation or insight on this question? I will also run some additional experiments on this topic and share the results soon. Thank you!

LiJunnan1992/DivideMix, issue #53