facebookresearch / moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
MIT License

Negative samples #62

Open zhangdan8962 opened 4 years ago

zhangdan8962 commented 4 years ago

It is known that for a specific sample (query), its negative samples all come from the dictionary (queue). I am wondering whether it would also make sense to use the other keys in the same batch as negative samples?

Thank you in advance for the help!

bigheiniu commented 3 years ago

Do you mean methods like supervised contrastive learning, which use label information to select negative samples?

zhangdan8962 commented 3 years ago

No. What I meant is: in addition to the 65536 negative samples in the queue, should we also treat the other keys in the same batch (batch_size - 1 of them) as negative samples when we calculate the loss for a specific query?

But I guess it won't make a big difference, because the batch size is much smaller than the queue size.
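For reference, this is roughly what the queue-only InfoNCE loss I'm referring to looks like (a paraphrased, self-contained sketch; the actual code lives in `moco/builder.py`, and the variable names here are mine):

```python
import torch
import torch.nn.functional as F

def moco_queue_loss(q, k, queue, T=0.07):
    """Queue-only InfoNCE loss, roughly as in moco/builder.py.

    q:     NxC normalized query features
    k:     NxC normalized key features; k[i] is the positive for q[i]
    queue: CxK memory queue of past (normalized) key features
    """
    # positive logits: Nx1 (each query against its own key)
    l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)
    # negative logits: NxK (each query against every queue entry)
    l_neg = torch.einsum('nc,ck->nk', [q, queue.clone().detach()])
    # logits: Nx(1+K), temperature-scaled; the positive is always at index 0
    logits = torch.cat([l_pos, l_neg], dim=1) / T
    labels = torch.zeros(q.shape[0], dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```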

howard-mahe commented 3 years ago

Following SimCLR, MoCo v3 indeed uses the keys that naturally co-exist in the same batch and hence abandons the memory queue, which the authors find gives diminishing gains once the batch is sufficiently large (e.g., 4096).

According to EqCo, for MoCo v2 there is only a marginal improvement from using K=65536 (67.5, +0.5) compared to K=256 (67.0) or K=1024 (67.1, +0.1); top-1 accuracy (%) is reported in parentheses.

[Screenshot: EqCo results table, 2021-04-27]

Let's assume you use a batch size of 256 and K=256: if you add the 255 other keys of the current batch as negative samples, you can expect an improvement of roughly +0.0 to +0.1 on linear classification on ImageNet... If you still want to try it, I believe you would need to be careful with the random batch shuffling (shuffling BN) used for encoder_k.
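A minimal sketch of what that modification could look like, assuming `q`, `k` are the normalized query/key features and `queue` is the CxK memory buffer (the function name and signature are illustrative, not code from this repo):

```python
import torch
import torch.nn.functional as F

def moco_loss_with_inbatch_negatives(q, k, queue, T=0.07):
    """InfoNCE loss that uses the queue *and* the other keys in the
    current batch as negatives (a sketch, not the repo's implementation).

    q:     NxC normalized query features
    k:     NxC normalized key features; k[i] is the positive for q[i]
           (must already be un-shuffled back into q's order)
    queue: CxK memory queue of past (normalized) key features
    """
    N = q.shape[0]
    # positive logits: Nx1
    l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)
    # queue negatives: NxK
    l_queue = torch.einsum('nc,ck->nk', [q, queue.clone().detach()])
    # in-batch similarities: NxN; the diagonal is the positive pair,
    # already counted in l_pos, so mask it out of the negatives
    l_batch = torch.einsum('nc,mc->nm', [q, k.detach()])
    eye = torch.eye(N, dtype=torch.bool, device=q.device)
    l_batch = l_batch.masked_fill(eye, float('-inf'))
    # logits: Nx(1+K+N), temperature-scaled; the positive stays at index 0
    logits = torch.cat([l_pos, l_queue, l_batch], dim=1) / T
    labels = torch.zeros(N, dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

The caveat about batch shuffling is that MoCo shuffles the batch across GPUs before encoder_k and un-shuffles the keys afterwards; as long as `k` has been un-shuffled back into the same order as `q` before computing the NxN in-batch term, the diagonal mask above stays correct.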