bpiyush / TestOfTime

Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time
MIT License

Thank you for your work, could you teach me some code problems? #5

Closed qyr0403 closed 10 months ago

qyr0403 commented 10 months ago

In https://github.com/bpiyush/TestOfTime/blob/main/package/losses/weighted_contrastive.py, lines 38 and 39, I can't understand why it performs `logits[batch_size // 2:, :batch_size // 2] += self.alpha`:

```python
def __call__(self, pooled_video, pooled_text, **kargs):
    batch_size = pooled_video.size(0)

    # change device of the weight
    self.loss.weight = self.loss.weight.to(pooled_video.device)
    self.alpha = self.alpha.to(pooled_video.device)

    logits = torch.mm(pooled_text, pooled_video.transpose(1, 0))
    logits[batch_size // 2:, :batch_size // 2] += self.alpha
    logits[:batch_size // 2, batch_size // 2:] += self.alpha
    targets = torch.arange(
        batch_size,
        dtype=torch.long,
        device=pooled_video.device)
    return self.loss(logits, targets)
```

I know `self.alpha` is the parameter matrix for TNCE, but why is it added rather than multiplied? `logits` is the cosine-similarity matrix between video and text, so its entries should range from -1 to 1.

bpiyush commented 10 months ago

This is just an implementation trick. If you expand the loss term equation, you want something like this:

[image: the expanded loss term, with the $\alpha$ multiplier applied to the exponentiated similarities inside the softmax]

So you want the $\alpha$ multiplier to act on the exponentiated term. But the CrossEntropy loss module applies the softmax internally, so you can only modify the logits it receives. The trick is as follows: for any scalar $x$,

$$\alpha\exp(x) = \exp(\log(\alpha)) \cdot \exp(x) = \exp(\log(\alpha) + x)$$

So essentially, you need to add $\log(\alpha)$ to your logits $x$.
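Here is a small sketch that checks the equivalence numerically (the block structure and the value of `alpha` are illustrative, not taken from the repo's config): multiplying the exponentiated off-diagonal-block terms by `alpha` before normalizing gives the same probabilities as adding `log(alpha)` to those logits and taking a plain softmax, which is exactly what happens inside CrossEntropy.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size = 4
# hypothetical text-video similarity logits (rows: text, cols: video)
logits = torch.randn(batch_size, batch_size)
alpha = 0.3  # example weight for the off-diagonal blocks

# Path 1: multiply the chosen exponentiated terms by alpha directly.
weights = torch.ones(batch_size, batch_size)
weights[batch_size // 2:, :batch_size // 2] = alpha
weights[:batch_size // 2, batch_size // 2:] = alpha
num = weights * torch.exp(logits)
probs_direct = num / num.sum(dim=1, keepdim=True)

# Path 2: add log(alpha) to the same logits, then take a plain softmax
# (this is what the CrossEntropy module does internally).
log_alpha = torch.log(torch.tensor(alpha))
shifted = logits.clone()
shifted[batch_size // 2:, :batch_size // 2] += log_alpha
shifted[:batch_size // 2, batch_size // 2:] += log_alpha
probs_shifted = F.softmax(shifted, dim=1)

print(torch.allclose(probs_direct, probs_shifted, atol=1e-6))  # True
```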

Hope this helps.

qyr0403 commented 10 months ago

Thank you very much, I had missed `self.alpha = np.log(alpha + 1e-8)`.