Closed · YangJae96 closed this issue 2 years ago
Hi, I have a question about the implementation.
Why do you shuffle the mini-batch order before feeding it into the momentum encoder, and then undo the shuffle afterwards, as in the code above?
This is the "shuffling BN" technique introduced on page 4 of Kaiming He's MoCo paper. When training on multiple GPUs, batch normalization statistics are computed per GPU. If the query encoder and the momentum (key) encoder see the batch in the same order, a query and its positive key share the same BN statistics, and the model can exploit that intra-batch information leakage to cheat on the contrastive task. Shuffling the batch order for the momentum encoder (and unshuffling its outputs so keys still align with their queries) makes the two encoders compute BN statistics over different sub-batches, which prevents this leakage.
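For reference, here is a minimal single-process sketch of the shuffle/unshuffle pattern. The function names `batch_shuffle` and `batch_unshuffle` are illustrative, not this repo's actual API, and note that the trick only has an effect when BN statistics are computed per device; MoCo's official implementation shuffles across GPUs with `torch.distributed`.

```python
import torch

@torch.no_grad()
def batch_shuffle(x):
    """Randomly permute the mini-batch and return the inverse permutation.

    The momentum (key) encoder sees the shuffled batch, so under per-GPU
    BN each key is normalized with statistics from a different sub-batch
    than its corresponding query.
    """
    idx_shuffle = torch.randperm(x.size(0), device=x.device)
    idx_unshuffle = torch.argsort(idx_shuffle)  # inverse permutation
    return x[idx_shuffle], idx_unshuffle

@torch.no_grad()
def batch_unshuffle(k, idx_unshuffle):
    """Restore the original ordering so keys align with their queries."""
    return k[idx_unshuffle]

# Hypothetical usage inside a training step:
# im_k_shuffled, idx_unshuffle = batch_shuffle(im_k)
# k = momentum_encoder(im_k_shuffled)
# k = batch_unshuffle(k, idx_unshuffle)
```

On a single GPU with ordinary BN this permutation does not change the computed statistics, since they are pooled over the whole batch; the benefit appears once each GPU normalizes only its local slice of the batch.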