RElbers / info-nce-pytorch

PyTorch implementation of the InfoNCE loss for self-supervised learning.
MIT License

I wonder if the way of InfoNCE I used was wrong( ´•̥̥̥ω•̥̥̥` ) #12

Closed · evelynlee999 closed this issue 1 year ago

evelynlee999 commented 1 year ago

Hi,

I'm trying to optimize my model using a joint InfoNCE loss and AAM loss. The InfoNCE code, which is based on your implementation, is as follows:

def contrastive(self, embeddings_z: t.Tensor, embeddings: t.Tensor, logits: t.Tensor):
    # Classifier logits are kept only for the accuracy computation below.
    logits1 = logits

    # Pick a single random sample from the batch as the query.
    high = embeddings.shape[0]
    idx = random.randint(0, high - 1)

    query = embeddings[idx:idx + 1]
    positive_key = embeddings_z[idx:idx + 1]
    # All other samples in the batch serve as negatives.
    # negative_keys = embeddings_z
    negative_keys = t.cat((embeddings_z[:idx], embeddings_z[idx + 1:]))

    query = F.normalize(query, dim=-1)
    positive_key = F.normalize(positive_key, dim=-1)
    negative_keys = F.normalize(negative_keys, dim=-1)

    # Cosine similarity between the positive pair.
    positive_logit = t.sum(query * positive_key, dim=1, keepdim=True)

    # Cosine similarities between the query and every negative key.
    negative_logits = query @ self.transpose(negative_keys)

    # The positive logit sits in column 0, so the target label is 0.
    logits = t.cat([positive_logit, negative_logits], dim=1)
    labels = t.zeros(len(logits), dtype=t.long, device=query.device)

    loss = F.cross_entropy(logits / self.temperature, labels, reduction=self.reduction)

    with t.no_grad():
        # Put predictions into [0, 1] range for later calculation of accuracy.
        prediction = F.softmax(logits1, dim=1).detach()

    return loss, prediction

The joint AAM and InfoNCE loss is as follows:

self.c_contrastive = nn.Parameter(torch.rand(1))
loss = self.c_aam * aam_loss + self.c_contrastive * contrastive_loss

A smaller loss should mean better performance, but when I ran the code, c_contrastive always became negative, which would mean that a bigger loss gives better performance. So I wonder if my InfoNCE code is wrong.

I've been stuck on this for a long time. Soooo looking forward to your reply :)
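
For reference, the contrastive method above scores only one randomly chosen query per call. The InfoNCE loss in this repository is typically computed over the whole batch at once, with every sample acting as a query and the other samples' keys serving as in-batch negatives. A minimal plain-PyTorch sketch of that batched form (an illustration, not this repo's exact API or the class above):

import torch
import torch.nn.functional as F

def info_nce_batched(embeddings: torch.Tensor,
                     embeddings_z: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    # Normalize both views so dot products become cosine similarities.
    queries = F.normalize(embeddings, dim=-1)   # (N, D)
    keys = F.normalize(embeddings_z, dim=-1)    # (N, D)

    # (N, N) similarity matrix: row i scores query i against every key;
    # keys j != i act as in-batch negatives.
    logits = queries @ keys.t()

    # The matching (positive) key for query i sits on the diagonal.
    labels = torch.arange(len(queries), device=queries.device)
    return F.cross_entropy(logits / temperature, labels)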

RElbers commented 1 year ago

Are you sure that you want self.c_contrastive to be a Parameter? It will also get the gradient of the loss, unless you set requires_grad=False in the constructor. Minimizing the loss will minimize self.c_contrastive, causing it to become negative.
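
A minimal sketch of the two alternatives implied here, with 0.5 as a placeholder value to be tuned:

import torch
import torch.nn as nn

# Option 1: a plain hyperparameter, fixed during training and tuned by hand
# (or on validation data).
c_contrastive = 0.5  # placeholder value

# Option 2: keep it as a tensor on the module, but exclude it from the
# gradient so that minimizing the loss cannot drive it negative.
c_contrastive_fixed = nn.Parameter(torch.rand(1), requires_grad=False)

# Either way, the joint loss keeps the same form as before:
# loss = self.c_aam * aam_loss + c_contrastive * contrastive_loss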

evelynlee999 commented 1 year ago

Yeah, I wanted it to be a learnable weight for the InfoNCE term so it can participate in training and give the best result. I'll give it a try. Thanks a lot.

Yuntian9708 commented 1 year ago

For a joint loss, I set the weight of the InfoNCE loss (or whatever other loss I use) as a hyperparameter, rather than as a parameter that gets optimized. In my opinion, using a hyperparameter is a good approach.

evelynlee999 commented 1 year ago

Yeah, I've found some code that uses it as a hyperparameter too. Thank you sooo much 🥰

evelynlee999 commented 1 year ago

Hey, judging from your name it looks like you're Chinese too, haha. So that part should be set as a hyperparameter. About the InfoNCE loss I use later, could I treat it as a regularization term instead, so its coefficient acts as a regularization coefficient rather than a weight? From the code I've been reading these past few days, the regularization coefficient also seems to be a hyperparameter.

Yuntian9708 commented 1 year ago

That idea sounds feasible; you can try it and see how it performs. I think the loss design still has to be tied closely to the task, i.e., what role the InfoNCE loss actually plays in your problem. For example, mine is a few-shot classification problem where I want contrastive learning to give richer latent representations, so my loss is two InfoNCE terms plus a classification cross-entropy: total_loss = ce_loss + α·infonce1 + β·infonce2, where α and β are both hyperparameters that you just tune. In my experience, when jointly optimizing multiple losses, the choice of the weight hyperparameters has a large effect on model performance.
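
A minimal sketch of the joint loss described here, with α and β as fixed hyperparameters (the values below are placeholders, not tuned settings):

import torch.nn.functional as F

# Hyperparameters: chosen by tuning, not learned during optimization.
alpha = 1.0  # placeholder weight for the first InfoNCE term
beta = 0.5   # placeholder weight for the second InfoNCE term

def total_loss(class_logits, targets, infonce1, infonce2):
    # Classification cross-entropy plus two weighted contrastive terms.
    ce_loss = F.cross_entropy(class_logits, targets)
    return ce_loss + alpha * infonce1 + beta * infonce2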

evelynlee999 commented 1 year ago

OK, got it. Thank you!
