Closed HaoKun-Li closed 4 years ago
Hi, @HaoKun-Li ,
The short answer is that these two can be viewed as constants that scale up the dynamic range of the score function.
Typically, NCE can deal with unnormalized distributions and will automatically adjust the score range.
In this specific case, the score is produced by the inner product of two l2-normalized vectors, which means its range is [-1, 1]. This range might not be enough for NCE's score adjustment. So here Z_v1 and Z_v2 are very simple Monte Carlo estimates of the partition function of the full softmax (see sec 2.4 in this paper and sec 3.4 in this paper), used to help adjust the score range a bit.
Thanks for your reply!
@HaoKun-Li , you are welcome. I just closed it, but feel free to reopen it if you would like to discuss more.
Thanks for your great work and great code!
When I read the code of class "ContrastMemory" in "memory.py", I could not find any explanation of "Z_v1" and "Z_v2" in your arXiv preprint. I want to know why "out_v1" should be divided by "Z_v1". If "outputSize" is big, then "out_v1" may become very small. Since "outputSize" differs a lot between datasets, will it influence the value of "out_v1" too much, and even affect the performance of the student network?
Looking forward to your reply. @HobbitLong