HobbitLong / CMC

[arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis
BSD 2-Clause "Simplified" License

Question about NCECriterion.py #43

Open · Abyssaledge opened this issue 4 years ago

Abyssaledge commented 4 years ago

https://github.com/HobbitLong/CMC/blob/58d06e9a82f7fea2e4af0a251726e9c6bf67c7c9/NCE/NCECriterion.py#L30

I think the purpose of using NCE is to avoid the expensive summation over the entire vector in the softmax. But in your implementation there is still a summation over the entire log_D0, which confuses me. I'd appreciate it if you could explain this. I'm new to this field, so please point out my misunderstanding if there is one.
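
To illustrate the two kinds of summation I mean, here is a toy sketch (my own illustrative code, not the repo's; `pn` and the shapes are made up):

```python
import torch
import torch.nn.functional as F

batch, m = 4, 8
scores = torch.randn(batch, 1 + m)   # column 0 = positive, rest = negatives

# Softmax cross-entropy: the sum over candidates sits INSIDE the log
# (the normalizing denominator), so it has to cover every candidate.
ce = -F.log_softmax(scores, dim=1)[:, 0].mean()

# NCE-style loss: each of the m noise terms gets its own log, and the
# sum over those logs sits OUTSIDE any normalization.
pn = 1.0 / 100.0                      # uniform noise prob, made-up value
p = scores.exp()                      # stand-in for unnormalized probs
log_d1 = (p[:, 0] / (p[:, 0] + m * pn)).log()
log_d0 = (m * pn / (p[:, 1:] + m * pn)).log()
nce = -(log_d1.sum() + log_d0.sum()) / batch
```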

HobbitLong commented 4 years ago

@Abyssaledge, first, the summation here is outside the log; I guess you mean that for softmax cross-entropy the summation is inside the log?

For the summation, see the m in Eq. 11 of the paper.
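
Paraphrasing the objective (my paraphrase; see the paper for the exact statement of Eq. 11):

```latex
% NCE posterior for "came from the data rather than the noise p_n",
% with p_n uniform over the N items (p_n = 1/N) and m noise samples:
h(v) = \frac{p(v)}{p(v) + m\,p_n(v)}

% The loss: one positive term plus m sampled noise terms, with the sums
% of logs sitting outside any normalization over all N items:
\mathcal{L}_{\mathrm{NCE}}
  = -\,\mathbb{E}_{v \sim p_d}\!\left[\log h(v)\right]
    - m\,\mathbb{E}_{v' \sim p_n}\!\left[\log\left(1 - h(v')\right)\right]
```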

Abyssaledge commented 4 years ago

@HobbitLong Thanks for your reply. I didn't make my point clear. What confuses me is why your implementation is more efficient than the InfoNCE in CPC. I believe NCE tries to avoid summing over all the negative samples by using Monte Carlo estimation, because summation over so many negatives is expensive. But your code still sums over all the negative samples: log_D0.view(-1, 1).sum(0), where log_D0 contains the "similarity" with all the negative samples. My understanding is that NCE reduces the computational cost by avoiding the sum over all negatives, but the log_D0 tensor has as many elements as there are negatives, so doesn't summing over it defeat that purpose? Or is my understanding of NCE, or of where the cost lies, wrong? I'm new to CV, so pardon the naive question...
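
To make the shapes concrete, here is my simplified reading of the criterion (n_data, m, and the tensors below are illustrative, not the repo's exact code):

```python
import torch

eps = 1e-7
n_data = 1_000_000        # dataset / memory-bank size (illustrative)
batch, m = 4, 4096        # m = number of noise samples actually drawn

# x: (batch, 1 + m) unnormalized probs, column 0 is the positive pair
x = torch.rand(batch, 1 + m)
pn = 1.0 / n_data         # uniform noise distribution

p_pos = x[:, 0]
log_d1 = (p_pos / (p_pos + m * pn + eps)).log()

p_neg = x[:, 1:]          # only the m sampled negatives, NOT all n_data
log_d0 = (m * pn / (p_neg + m * pn + eps)).log()

# The sum covers batch * m entries, i.e. O(m) per sample, not O(n_data).
loss = -(log_d1.sum(0) + log_d0.view(-1, 1).sum(0)) / batch
```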

Abyssaledge commented 4 years ago

@HobbitLong I think I've figured out why I was confused. Do the k and N in Eq. 1 and Eq. 9, respectively, of your paper mean the size of the memory bank? And does the m in Eq. 11 mean the number of noise samples drawn from the memory bank? I might have mixed them up.
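
If that reading is right, then the sum that worried me is cheap after all; sketching my understanding (the numbers are my own illustration, not from the paper):

```latex
% N = dataset size = memory-bank size (e.g. 1{,}281{,}167 for ImageNet)
% m = noise samples drawn per positive in Eq. 11 (e.g. 4096)
m \ll N
  \quad\Longrightarrow\quad
  \text{summing the } m \text{ terms in } \texttt{log\_D0}
  \text{ costs } O(m), \text{ not } O(N)
```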