KaimingHe opened 4 years ago
Hi,
Thanks for your comment! Which specific result are you referring to? Or are you suggesting that an EMA of Z could potentially improve all InsDis, MoCo and CMC with NCE loss?
You reported a low number for MoCo with the NCE loss. This is because your implementation of NCE is problematic; correcting it should give a more reasonable MoCo w/ NCE number.
@KaimingHe, yeah, the current NCE implementation is probably less suitable for MoCo, and I am happy to rectify it. What momentum multiplier for updating Z would you suggest?
0.99 for updating Z works well. In ImageNet-1K, MoCo with NCE is ~2% worse than MoCo with InfoNCE, similar to the case of the memory bank counterpart.
Thanks for your input! I have temporarily removed the NCE numbers from the README to avoid any confusion, and will leave them blank until I get a chance to look into it.
Is it necessary to fix or EMA-update `Z`? Maybe it is unstable if we recompute `Z = out.mean() * self.outputSize` on every batch? Also, I couldn't find any statement about this approximation of `Z` in the paper, or maybe I missed it. Could you point me to a reference for this?
Later I found the statement in InsDis: "Empirically, we find the approximation derived from initial batches sufficient to work well in practice."
https://github.com/HobbitLong/CMC/blob/0f72b18a99e35bf2c2f0001656c2b33365b50cf6/NCE/NCEAverage.py#L189
This one-time estimation is problematic, especially if the dictionary is not random noise. Computing Z as a moving average of the per-batch estimate should give a more reasonable result.
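Putting the thread's suggestion together, here is a minimal sketch in plain Python (not the repo's `NCEAverage`; the function and argument names are illustrative) of replacing the one-time estimate of Z with a momentum-0.99 moving average:

```python
def update_Z(Z, out_mean, output_size, momentum=0.99):
    """Running estimate of the NCE normalization constant Z.

    Z           -- current estimate, or None before the first batch
    out_mean    -- mean of the exp(similarity) scores for this batch
    output_size -- number of instances (the n in the NCE approximation)
    """
    z_batch = out_mean * output_size   # per-batch estimate, as in the linked code
    if Z is None:
        # original behavior: the first batch alone fixes Z for all of training
        return z_batch
    # suggested fix: blend in every batch's estimate with momentum 0.99
    return momentum * Z + (1.0 - momentum) * z_batch


# usage: Z starts uninitialized and is refreshed on each batch
Z = None
for out_mean in (1.0, 2.0):            # toy batch statistics
    Z = update_Z(Z, out_mean, output_size=10)
```

In a PyTorch module, Z would typically live in a non-parameter buffer so it is checkpointed but not trained.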