RElbers / info-nce-pytorch

PyTorch implementation of the InfoNCE loss for self-supervised learning.

I have one question about this part of the code #2

Closed mmmmmmrluo closed 3 years ago

mmmmmmrluo commented 3 years ago

    if negative_keys is not None:
        # Explicit negative keys

        # Cosine between positive pairs
        positive_logit = torch.sum(query * positive_key, dim=1, keepdim=True)

        # Cosine between all query-negative combinations
        negative_logits = query @ transpose(negative_keys)

        # First index in last dimension are the positive samples
        logits = torch.cat([positive_logit, negative_logits], dim=1)
        labels = torch.zeros(len(logits), dtype=torch.long, device=query.device)

1) Why are the labels all zero? Shouldn't the positive sample pairs be labeled 1? 2) Is this cosine similarity? Isn't it just the inner product?

RElbers commented 3 years ago

Hi.

1) The positive samples are in the 0th index of the logits, so labels is just a list of all 0s. 2) The vectors are first normalized and then the dot product is taken, which gives the cosine of the angle between them.
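(Not part of the repo — a minimal, self-contained sketch to make this concrete. It uses toy random tensors and plain `.T` instead of the repo's `transpose` helper; column 0 of the logits holds the positive pair, so the cross-entropy target is 0 for every row.)

    import torch
    import torch.nn.functional as F

    # Toy batch: 4 queries, one positive key each, 6 shared negative keys, dim 8.
    query = F.normalize(torch.randn(4, 8), dim=1)
    positive_key = F.normalize(torch.randn(4, 8), dim=1)
    negative_keys = F.normalize(torch.randn(6, 8), dim=1)

    # Dot products of unit vectors are cosine similarities.
    positive_logit = torch.sum(query * positive_key, dim=1, keepdim=True)  # (4, 1)
    negative_logits = query @ negative_keys.T                              # (4, 6)

    # Column 0 is the positive pair, so the "correct class" is index 0 for every row.
    logits = torch.cat([positive_logit, negative_logits], dim=1)           # (4, 7)
    labels = torch.zeros(len(logits), dtype=torch.long)

    loss = F.cross_entropy(logits, labels)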

mmmmmmrluo commented 3 years ago

Hi. Thanks for your answer, I've got it now. I'm wondering whether it would be feasible to replace cosine similarity with other distance or similarity measures. Looking forward to your reply.


RElbers commented 3 years ago

In theory that should be possible. You just need a measure which gives low values for positive pairs and high values for negative pairs.
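(For illustration only, not the repo's API: a sketch of what swapping in a different measure could look like, here negative squared Euclidean distance, so that more similar pairs still receive the larger logit — note the correction a few comments below about which direction the values should go.)

    import torch
    import torch.nn.functional as F

    def neg_sq_dist(a, b):
        # Hypothetical helper: negative squared Euclidean distance.
        # Closer pairs get values nearer to 0, i.e. larger logits.
        return -torch.cdist(a, b, p=2) ** 2

    query = torch.randn(4, 8)
    positive_key = torch.randn(4, 8)
    negative_keys = torch.randn(6, 8)

    # The diagonal pairs each query with its own positive key.
    positive_logit = torch.diagonal(neg_sq_dist(query, positive_key)).unsqueeze(1)  # (4, 1)
    negative_logits = neg_sq_dist(query, negative_keys)                             # (4, 6)

    logits = torch.cat([positive_logit, negative_logits], dim=1)
    labels = torch.zeros(len(logits), dtype=torch.long)
    loss = F.cross_entropy(logits, labels)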

mmmmmmrluo commented 3 years ago

Hi. The InfoNCE formula is logSoftmax + NLLLoss, i.e. nn.CrossEntropyLoss(), and it is a positive value. To minimize the loss, shouldn't we maximize the softmax output? That is, shouldn't the softmax numerator, namely the positive sample pairs, be made larger and the negative sample pairs smaller? Isn't that contrary to our intention? I don't understand this point and hope you can give me some advice.


mmmmmmrluo commented 3 years ago

Is it because we want to maximize the mutual information between the positive sample pairs, so we need to maximize the density ratio, and the dot product in the numerator is proportional to the density ratio, so we need to maximize that dot product?
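(For reference — this is the density-ratio formulation from the CPC paper, van den Oord et al. 2018, where InfoNCE was introduced and which the question above is paraphrasing:)

    \mathcal{L}_N = -\mathbb{E}_X\left[\log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)}\right],
    \qquad f_k(x_{t+k}, c_t) \propto \frac{p(x_{t+k} \mid c_t)}{p(x_{t+k})}

Minimizing this loss maximizes a lower bound on the mutual information between x_{t+k} and c_t, which is why the numerator (the positive-pair score) gets pushed up.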


RElbers commented 3 years ago

Sorry, what I said in my previous comment was wrong. We want high values (similarity) for positive pairs and low values for negative pairs. And to optimize this, we can simply use categorical cross entropy.
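(A minimal sketch, not from the repo, of why cross entropy with all-zero targets does the right thing: it is exactly the negative log-softmax of column 0, so minimizing it pushes the positive-pair logit up relative to the negatives.)

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[0.9, 0.1, -0.3],   # column 0 = positive-pair similarity
                           [0.2, 0.8, -0.5]])
    labels = torch.zeros(len(logits), dtype=torch.long)

    # Cross entropy with target 0 is -log softmax(logits)[:, 0], averaged over the batch.
    ce = F.cross_entropy(logits, labels)
    manual = -F.log_softmax(logits, dim=1)[:, 0].mean()
    print(torch.allclose(ce, manual))  # True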

mmmmmmrluo commented 3 years ago

Thank you very much, I understand now. In this case, the normalized inner product of the vectors is the cosine similarity, and the larger the inner product, the higher the similarity, which makes logical sense.
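(A quick check, not from the repo, that the normalized dot product matches PyTorch's built-in cosine similarity:)

    import torch
    import torch.nn.functional as F

    a = torch.randn(4, 8)
    b = torch.randn(4, 8)

    # Normalize to unit length, then take the row-wise dot product ...
    dot = torch.sum(F.normalize(a, dim=1) * F.normalize(b, dim=1), dim=1)

    # ... which matches the built-in cosine similarity.
    print(torch.allclose(dot, F.cosine_similarity(a, b, dim=1)))  # True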
