Hi,
This was done intentionally, as it is much easier to implement this loss than the exact MIL-NCE loss. The gradient of the implemented loss matches that of the theoretical MIL-NCE loss, so it should not cause any problem during training. In fact, the loss implemented is: -log(pos / (2 * pos + neg)) = -log(pos / (pos + neg / 2)) + log(2). The gradient of this loss is exactly equal to the gradient of the MIL-NCE loss with half as many negatives, so that's fine :).
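To make this concrete, here is a minimal sketch (assuming PyTorch; not code from this repo, and the score setup is hypothetical) that checks numerically that the two losses differ only by the constant log(2), so their gradients coincide:

```python
# Minimal sketch, assuming PyTorch: the implemented loss and the MIL-NCE loss
# with half the negatives differ only by the constant log(2), so autograd
# returns identical gradients for both.
import torch

scores = torch.randn(5, requires_grad=True)  # hypothetical similarity scores
pos = scores[0].exp()                        # one positive pair
neg = scores[1:].exp().sum()                 # remaining pairs as negatives

implemented = -torch.log(pos / (2 * pos + neg))   # loss as implemented
milnce_half = -torch.log(pos / (pos + neg / 2))   # MIL-NCE, half the negatives

g_impl = torch.autograd.grad(implemented, scores, retain_graph=True)[0]
g_half = torch.autograd.grad(milnce_half, scores)[0]

print(torch.allclose(g_impl, g_half))        # True: gradients are identical
print((implemented - milnce_half).item())    # ~0.6931, i.e. log(2)
```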
Yes, that makes sense; this implementation is easier than the original one. Thanks for your explanation :)
Hi there,
I'm interested in your paper and the proposed MIL-NCE loss, so I read the code for this loss. However, I think I found a mistake in your loss function.
The original design in your paper should be

loss = -log(pos / (pos + neg))

but in the code implementation you seem to have mistakenly written it as

loss = -log(pos / (pos + pos + neg))

which makes the minimum of this loss 0.6931 (i.e. log(2)) instead of 0. Luckily, this should have little impact when the batch size is large. You can check the mistake with a code snippet along these lines:
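Here is a minimal sketch of such a check (plain Python, hypothetical scores): even when the positive score completely dominates, the loss as coded bottoms out at log(2) ≈ 0.6931 rather than 0.

```python
# Minimal sketch with hypothetical scores: the paper's loss goes to 0 when the
# positive dominates, while the loss as coded bottoms out at log(2) ~= 0.6931.
import math

pos = math.exp(100.0)  # positive score made overwhelmingly large
neg = math.exp(0.0)    # a single weak negative

paper_loss = -math.log(pos / (pos + neg))        # -> 0.0
code_loss = -math.log(pos / (pos + pos + neg))   # -> log(2) ~= 0.6931

print(paper_loss)  # ~0.0
print(code_loss)   # ~0.6931
```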