zhangdan8962 opened 4 years ago

It is known that for a specific sample (query), its negative samples all come from the dictionary (queue). I am wondering whether it would also make sense to add the other keys in the same batch as negative samples?

Thank you in advance for the help!
Did you mean methods like supervised contrastive learning, which utilize the label information as the indicator to select negative samples?
No. What I meant is: in addition to the 65536 negative samples in the queue, should we also treat the other keys in the same batch (batch_size - 1 of them) as negative samples when computing the loss for a specific query?
But I guess it won't make a big difference, because the batch size is much smaller than the queue size.
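For concreteness, here is a minimal sketch (not code from this repo) of what I have in mind. It assumes `q` and `k` are the L2-normalized query and key features of the current batch (with `k[i]` the positive for `q[i]`, as in MoCo), `queue` is the (C, K) memory queue, and `T` is the temperature; the function name and signature are hypothetical:

```python
import torch
import torch.nn.functional as F

def moco_loss_with_in_batch_negatives(q, k, queue, T=0.07):
    """InfoNCE loss where, besides the queue, the other keys in the
    current batch also serve as negatives for each query.

    q:     (N, C) normalized query features
    k:     (N, C) normalized key features, assumed to come from the
           momentum encoder under torch.no_grad() (k[i] is the positive for q[i])
    queue: (C, K) normalized keys stored in the memory queue
    """
    N = q.size(0)

    # similarities between each query and every key in the batch: (N, N);
    # the diagonal entries are the positive pairs, the off-diagonal
    # entries are the batch_size - 1 extra negatives per query
    batch_logits = q @ k.t()

    # similarities against the queue: (N, K) -- all negatives
    queue_logits = q @ queue.clone().detach()

    # column i of the concatenated logits is the positive for row i
    logits = torch.cat([batch_logits, queue_logits], dim=1) / T

    labels = torch.arange(N, device=q.device)
    return F.cross_entropy(logits, labels)
```

Cross-entropy with `labels = arange(N)` treats only the diagonal of the batch block as the positive, so the other in-batch keys and the queue entries automatically act as negatives.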
Following SimCLR, MoCo v3 indeed uses the keys that naturally co-exist in the same batch and hence abandons the memory queue, which the authors find gives diminishing gains once the batch is sufficiently large (e.g., 4096).
According to EqCo, for MoCo v2 there is only a marginal improvement from using K=65536 (67.5, +0.5) compared to K=256 (67.0) or K=1024 (67.1, +0.1); top-1 accuracy (%) is reported in parentheses.
Let's assume you use a batch size of 256 and K=256: if you use the 255 additional keys of the current batch as negative samples, you can expect an improvement of roughly +0.0 to +0.1 on ImageNet linear classification... If you still want to implement it, I believe you would need to be careful with the random batch shuffling (shuffling BN) used for encoder_k.
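To illustrate that caveat: in the reference implementation the key batch is shuffled across GPUs before going through `encoder_k` (to avoid BN cheating) and unshuffled afterwards, so the keys must be back in query order before they are reused as in-batch negatives; otherwise the "positive" column in the sketch above would point at the wrong key. Roughly where this would sit in the forward pass, reusing the repo's own helpers (`_momentum_update_key_encoder`, `_batch_shuffle_ddp`, `_batch_unshuffle_ddp`) and the hypothetical loss function sketched earlier:

```python
# inside MoCo.forward(self, im_q, im_k), sketch only
q = F.normalize(self.encoder_q(im_q), dim=1)

with torch.no_grad():
    self._momentum_update_key_encoder()
    # shuffle across GPUs so BN statistics do not leak,
    # encode, then restore the original (query) order
    im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
    k = F.normalize(self.encoder_k(im_k), dim=1)
    k = self._batch_unshuffle_ddp(k, idx_unshuffle)  # k[i] matches q[i] again

# only after unshuffling is it safe to reuse k both as the positives
# and as the in-batch negatives
loss = moco_loss_with_in_batch_negatives(q, k, self.queue, T=self.T)
```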