antoine77340 / MIL-NCE_HowTo100M

PyTorch GPU distributed training code for MIL-NCE HowTo100M
Apache License 2.0

Minor mistake in MILNCELoss #4

Closed ycxioooong closed 4 years ago

ycxioooong commented 4 years ago

Hi there,

I'm interested in your paper and the proposed MIL-NCE loss, so I read the code for this loss. However, I found a mistake in your loss function.

The original design in your paper should be loss = -log(pos/(pos+neg)). But in the code implementation, you seem to have mistakenly implemented it as loss = -log(pos/(pos+pos+neg)), which makes the minimum of this loss log(2) ≈ 0.6931 instead of 0. Luckily, this should have little impact when the batch size is large.

You can check the mistake with this code snippet:

import torch
from loss import MILNCELoss  # loss.py from this repo

loss_fn = MILNCELoss()
video = torch.Tensor([[100, 0, 0, 0, 0, 0]]).cuda()
# tensor([[100.,   0.,   0.,   0.,   0.,   0.]], device='cuda:0')
text = torch.Tensor([[100, 0, 0, 0, 0, 0],
                     [100, 0, 0, 0, 0, 0],
                     [100, 0, 0, 0, 0, 0]]).cuda()
# tensor([[100.,   0.,   0.,   0.,   0.,   0.],
#         [100.,   0.,   0.,   0.,   0.,   0.],
#         [100.,   0.,   0.,   0.,   0.,   0.]], device='cuda:0')
loss_fn(video, text)
# tensor(0.6934, device='cuda:0')
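
For context, the doubled positive seems to come from the denominator pooling both the video→text and the text→video similarity scores. A minimal sketch of that construction (my own paraphrase, not the exact loss.py of this repo, assuming one caption per clip):

import torch

def milnce_loss_sketch(video_embd, text_embd):
    # Pairwise similarities between every clip and every caption: (B, B).
    x = torch.matmul(video_embd, text_embd.t())
    # Positive scores live on the diagonal (matching clip/caption pairs).
    pos = torch.diag(x)
    # The denominator pools video->text and text->video scores, so each
    # positive score is counted twice: log(2*pos + neg) rather than log(pos + neg).
    denominator = torch.logsumexp(torch.cat((x, x.t()), dim=1), dim=1)
    return (denominator - pos).mean()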
antoine77340 commented 4 years ago

Hi,

This was done intentionally, as it is much easier to implement this loss than the real MIL-NCE loss. The gradient of the implemented loss is similar to that of the theoretical MIL-NCE loss, so it should not cause any problem during training. In fact, the loss implemented is -log(pos/(2 * pos + neg)) = -log(pos/(pos + neg/2)) + log(2). The gradient of this loss is exactly equal to the gradient of the MIL-NCE loss but with half as many negatives, so that's fine :).
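
As a quick sanity check of that identity, here is a small standalone snippet (not code from this repo) comparing the two forms and their gradients on random logits:

import torch

log2 = torch.log(torch.tensor(2.0))

# One positive logit plus a pool of negative logits, with gradients enabled.
scores = torch.randn(8, requires_grad=True)
s_pos = scores[0]
s_neg = torch.logsumexp(scores[1:], dim=0)  # log of the total negative mass

# Implemented form: -log(pos / (2*pos + neg)), computed in log-space.
loss_impl = torch.logsumexp(torch.stack([s_pos, s_pos, s_neg]), dim=0) - s_pos
# MIL-NCE with half the negative mass: -log(pos / (pos + neg/2)).
loss_half = torch.logsumexp(torch.stack([s_pos, s_neg - log2]), dim=0) - s_pos

grad_impl = torch.autograd.grad(loss_impl, scores, retain_graph=True)[0]
grad_half = torch.autograd.grad(loss_half, scores)[0]

print(torch.allclose(loss_impl - loss_half, log2))  # True: constant offset of log(2)
print(torch.allclose(grad_impl, grad_half))         # True: identical gradients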

ycxioooong commented 4 years ago

Yes, that makes sense; this implementation is easier than the original one. Thanks for your explanation :)