Closed. atinghosh closed this issue 4 years ago.
Why do we need this method to get the correct BatchNorm statistics? I'm confused too. Any insights would be very helpful.
@berzentine I should have closed this issue long ago. I retract what I mentioned earlier: without the interleave you will not get high accuracy (keeping everything else in this implementation unchanged).
Interleave makes sure the BatchNorm statistics are computed correctly despite the very small labelled dataset. Each time a forward pass is done, BatchNorm (BN) tracks the mean and standard deviation of the mini-batch activations, and at prediction/inference time the stored statistics (probably an exponentially moving averaged version of them) are used. If one keeps forwarding the small labelled dataset through the network again and again, these BN running statistics become biased towards the labelled data. The interleave method avoids that by making sure each batch that actually passes through the network mixes labelled and unlabelled samples. See the sketch below.
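For anyone else confused: here is a minimal sketch of the interleaving idea, not the exact code from this repo. It assumes PyTorch tensors, one labelled batch and two unlabelled batches of the same size, and a batch size divisible by three; all names (`interleave`, `x`, `u1`, `u2`, `model`) are illustrative.

```python
import torch

def interleave(batches, batch_size):
    """Swap equal-sized chunks between the labelled batch and the unlabelled
    batches so that every batch forwarded through the network contains a mix
    of labelled and unlabelled samples. Applying the same swap a second time
    restores the original ordering, so it can be used on the outputs as well."""
    n = len(batches)
    chunk = batch_size // n  # assumes batch_size is divisible by n
    chunks = [list(torch.split(b, chunk)) for b in batches]
    for i in range(1, n):
        # swap the i-th chunk of the labelled batch with the i-th chunk of batch i
        chunks[0][i], chunks[i][i] = chunks[i][i], chunks[0][i]
    return [torch.cat(c, dim=0) for c in chunks]

# Usage sketch: x is the labelled batch, u1 and u2 are unlabelled batches.
# x, u1, u2 = interleave([x, u1, u2], batch_size)
# logits_x  = model(x)    # BN now sees mixed batches in every forward pass
# logits_u1 = model(u1)
# logits_u2 = model(u2)
# logits_x, logits_u1, logits_u2 = interleave(
#     [logits_x, logits_u1, logits_u2], batch_size)  # undo the swap
```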
One can definitely avoid this by passing the labelled and unlabelled data together (after concatenating them) in a single forward pass, but then, because of the larger batch size, the learning rate needs to be adjusted.
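For completeness, a sketch of that alternative (again with illustrative names; `B` is the labelled batch size, and all three batches are assumed to have size `B`):

```python
import torch

# x: labelled batch, u1/u2: unlabelled batches, model: the network.
inputs = torch.cat([x, u1, u2], dim=0)  # one large batch; BN statistics are
logits = model(inputs)                  # computed over labelled + unlabelled data
logits_x = logits[:B]                   # labelled predictions (supervised loss)
logits_u = logits[B:]                   # unlabelled predictions (consistency loss)
# The effective batch size is now 3x larger, so the learning rate
# (and possibly other hyperparameters) may need to be re-tuned.
```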
I am closing this issue.
Thanks for your code. I was trying to implement MixMatch independently but have been unable to achieve the accuracy mentioned in the paper.
I omitted that part and I noticed no difference.
Can you please elaborate on what this method does? Thanks a lot.