Junya-Chen / FlatCLR

FlatNCE: A Novel Contrastive Representation Learning Objective
https://arxiv.org/pdf/2107.01152.pdf

Cannot understand the loss in flatclr.py #2

Open yikeqingli opened 3 years ago

yikeqingli commented 3 years ago

Hi, below line 114 of flatclr.py, no matter what the value of v is, loss_vec will always be a vector of all ones, so in line 119 loss_vec.mean() - 1 must be 0. What is the significance of this term? Also, detach is applied to the cross_entropy term that is added to the loss; as I understand it, that means this term will not propagate any gradient. So how does this loss optimize the network?

Junya-Chen commented 3 years ago

loss_vec.mean() - 1 is equal to zero, but its gradient is not. Thus, by minimizing this loss with gradient descent, we can still learn a contrastive representation. cross_entropy.detach() is only used to show the progress of learning; it is not included in the actual loss function.
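A tiny standalone check (not from the repository; the example logits are made up) illustrates this: exp(v - v.detach()) always evaluates to 1, so the loss is 0, yet its gradient with respect to the logits equals softmax(logits), which is never zero.

import torch

logits = torch.tensor([[2.0, -1.0, 0.5]], requires_grad=True)   # made-up similarity logits
v = torch.logsumexp(logits, dim=1, keepdim=True)
loss = torch.exp(v - v.detach()).mean() - 1.0                    # value is exactly 0

loss.backward()
print(loss.item())    # 0.0
print(logits.grad)    # ~[[0.786, 0.039, 0.175]] == softmax(logits), non-zero

In other words, exp(v - v.detach()) always has value 1, but on the backward pass it passes through the gradient of v, so the displayed loss stays at 0 while the network still receives a useful gradient.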

ZifengLiu98 commented 1 year ago

Hi! Thanks for your great work, I have learned a lot from it. However, I am still confused about the loss. I want to apply your FlatCLR loss in my supervised contrastive learning project, but since the value of the loss is always 0, how can gradient descent make any progress? I added the cross-entropy loss and the FlatCLR loss together, and the FlatCLR term seems to contribute nothing to the total loss. Could you please help me understand FlatCLR? In my project it is always equal to 0.

lxysl commented 1 year ago

I am impressed by this brilliant work, but I have the same confusion about the code implementation. It seems the implementation does not completely match the paper, and I wonder if I am missing something.

In the paper, the pseudocode of FlatNCE is as follows:

[screenshot: FlatNCE pseudocode from the paper]

In this repository, the corresponding implementation is as follows:

_, features = self.model(images)
logits, labels = self.flat_loss(features)
v = torch.logsumexp(logits, dim=1, keepdim=True)   # shape (512, 1)
loss_vec = torch.exp(v - v.detach())               # always evaluates to 1, but carries the gradient of v

assert loss_vec.shape == (len(logits), 1)
dummy_logits = torch.cat([torch.zeros(logits.size(0), 1).to(self.args.device), logits], 1)
loss = loss_vec.mean() - 1 + self.criterion(dummy_logits, labels).detach()   # detached term only tracks progress

What confuses me is: isn't loss_vec here just the $l_{FlatNCE}$ in the pseudocode? What are the next few lines doing? Could you help me with some explanation, please?

Besides, I am working on integrating your FlatNCE loss into Supervised Contrastive Learning. It is somewhat difficult because SupCon uses a mask over multiple positive samples per anchor instead of a single positive, and there is no explicit InfoNCE function to swap out. If you could also give me some guidance on that, I would be very grateful!
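For reference, here is the rough adaptation I have been experimenting with (purely my own guess, not from the paper or this repository; flat_supcon_loss and its mask construction are hypothetical). It builds one FlatNCE term per (anchor, positive) pair:

import torch

def flat_supcon_loss(features, labels, temperature=0.1):
    # features: (N, D) L2-normalised embeddings, labels: (N,) integer class labels
    n = features.size(0)
    sim = features @ features.t() / temperature              # (N, N) similarity logits

    same_class = torch.eq(labels.view(-1, 1), labels.view(1, -1)).float()
    pos_mask = same_class.clone()
    pos_mask.fill_diagonal_(0)                                # positives: same class, excluding self
    neg_mask = 1.0 - same_class                               # negatives: different class

    losses = []
    for i in range(n):
        pos_idx = pos_mask[i].nonzero(as_tuple=True)[0]
        neg_idx = neg_mask[i].nonzero(as_tuple=True)[0]
        if len(pos_idx) == 0 or len(neg_idx) == 0:
            continue                                          # skip anchors without positives or negatives
        for p in pos_idx:
            diff = sim[i, neg_idx] - sim[i, p]                # (negative - positive) logit differences
            v = torch.logsumexp(diff, dim=0)
            losses.append(torch.exp(v - v.detach()))          # value 1, gradient of v
    return torch.stack(losses).mean() - 1.0                   # value 0, non-zero gradient

feats = torch.nn.functional.normalize(torch.randn(8, 16), dim=1).requires_grad_()
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
flat_supcon_loss(feats, labels).backward()                    # loss is 0.0, but feats.grad is non-zero

Whether one FlatNCE term per positive pair (as above) or pooling all positives inside a single logsumexp is closer to the spirit of the paper, I am honestly not sure, so please correct me if this is the wrong direction.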

ZifengLiu98 commented 1 year ago

I used FlatCLR in a facial expression recognition task, but it performed worse than cross-entropy loss, and only slightly better than the contrastive loss provided by [Supervised Contrastive Learning]. I am not sure whether I am using it incorrectly; FlatCLR was not better even with a larger batch size.

LanXiaoPang613 commented 6 months ago

In my understanding, the first term of the FlatNCE loss (loss_vec.mean() - 1) always stays at zero in value, but it still generates a different, non-zero gradient at each iteration, while the second, detached term contributes no gradient at all.
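Concretely (sg denotes stop-gradient, i.e. .detach(); this is just my reading of the code above, not taken from the paper):

$$\ell = e^{\,v-\mathrm{sg}(v)} - 1, \qquad v = \log\sum_j e^{z_j}$$

$$\nabla_{z}\,\ell = e^{\,v-\mathrm{sg}(v)}\,\nabla_{z} v = 1\cdot\mathrm{softmax}(z)$$

So the value is always $1-1=0$, while the gradient is $\mathrm{softmax}(z)$, which changes whenever the logits $z$ change between iterations.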