antoine77340 / MIL-NCE_HowTo100M

PyTorch GPU distributed training code for MIL-NCE HowTo100M
Apache License 2.0

About the MILNCELoss #3

Closed tqvinhcs closed 4 years ago

tqvinhcs commented 4 years ago

Hi,

1) What is the input to the MILNCELoss function? Is it video_embd: batch x D and text_embd: batch x D, where D = 512?

2) Why do you concatenate x and its transpose here?
denominator = th.cat((x, x.permute(1,0,2)), dim=1).view(x.shape[0], -1)
Doesn't th.logsumexp(x, dim=1) already compute the log-sum for the denominator?

Thanks

antoine77340 commented 4 years ago
  1. The input shapes are: video_embd: batch x D and text_embd: (batch * number_of_positive_candidates) x D, where D = 512 and number_of_positive_candidates = 5.

  2. The transpose provides negatives for both the video and the text. If you do not concatenate the transpose, you only get negatives for one of them: either the video or the text.
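
To make the two points above concrete, here is a minimal loop-based sketch in plain Python (not the repository's tensor code). It assumes the K candidate captions of each clip are stored contiguously in text_embd, and that the numerator is the log-sum-exp over those K positives. The helper names logsumexp, dot, and mil_nce_loss are illustrative only. Row i of the similarity tensor holds the negatives for video i. Column i, which is what x.permute(1,0,2) contributes, holds the negatives for the captions of clip i.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def mil_nce_loss(video_embd, text_embd, num_candidates):
    """Loop-based illustration of the loss discussed in this thread.

    video_embd: B vectors of size D.
    text_embd:  B * K vectors of size D; captions j*K .. j*K+K-1 are
                assumed to belong to clip j (an assumption of this sketch).
    """
    B, K = len(video_embd), num_candidates
    assert len(text_embd) == B * K, "expected batch * num_candidates captions"

    # sim[i][j][k]: similarity between video i and candidate k of clip j.
    # This mirrors viewing the (B, B*K) similarity matrix as (B, B, K).
    sim = [[[dot(video_embd[i], text_embd[j * K + k]) for k in range(K)]
            for j in range(B)]
           for i in range(B)]

    losses = []
    for i in range(B):
        # Positives: the K captions paired with video i.
        numerator = logsumexp(sim[i][i])
        # Row i: video i against every caption (negatives for the video).
        row = [sim[i][j][k] for j in range(B) for k in range(K)]
        # Column i: every video against the captions of clip i
        # (negatives for the text); this is the transposed part.
        col = [sim[j][i][k] for j in range(B) for k in range(K)]
        # Concatenating both mirrors th.cat((x, x.permute(1,0,2)), dim=1).
        denominator = logsumexp(row + col)
        losses.append(denominator - numerator)
    return sum(losses) / B


# Toy example: B = 2 clips, K = 2 candidate captions each, D = 2.
video = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(mil_nce_loss(video, text, num_candidates=2))
```

With only row in the denominator, the captions of clip i would never be contrasted against the other videos. Note that the positives sim[i][i] appear in both row and col, just as the diagonal appears twice in the quoted th.cat line.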

tqvinhcs commented 4 years ago

I see. Thanks a lot for the clarification.