Zood123 / COMET


Loss Function #2

Open mvandenhi opened 2 weeks ago

mvandenhi commented 2 weeks ago

Hi @Zood123, I'm trying to replicate the results from your work, and as far as I can tell, the loss function implemented in your repository differs from the one stated in the paper. Specifically, in train.py, line 186 (which is the line that computes the loss of your method, right?), the second summand differs from the version in the paper.

  1. The loss is thresholded, which I don't see specified in the paper (e.g. see Algorithm 1).
  2. As "net.predictor.requiresgrad(False)" does not stop the backpropagation to go through this part of the model, see here, the loss with respect to the selector will be as follows Assuming the threshold is not reached: (training_opt["m_ploss"]+training_opt["m_closs"])loss1 - training_opt["m_closs"]loss2 + l1_loss. Thus, as I see it, this loss is different to the one specified in your publication, except if m_ploss = -4 & m_closs=5? I might be misunderstanding something, so would you mind clarifying the outlined discrepancies?

Additionally, would you agree that

```python
loss = (loss1 - self.weight_a * loss2 + l1_loss)
optimizers[0].zero_grad()
optimizers[1].zero_grad()
loss.backward()
optimizers[0].step()
optimizers[1].step()
```

would be a valid loss computation?

mvandenhi commented 2 weeks ago

(Also, as a side note: the code contains a lot of hardcoded paths, which makes reproduction harder.)

Zood123 commented 2 weeks ago

Thank you for your interest in our work and for pointing out the need for clarification regarding the loss function.

You are right about the threshold; I forgot to clarify this part. I simplified the loss in Algorithm 1, omitting the threshold to keep the method easy to understand. In practice, the threshold can help make the training process more stable: when the generated mask is near-ideal (loss2 is much larger than loss1), setting the second loss term to zero can avoid artifacts.
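Roughly, the thresholding works like this sketch (variable names such as `threshold` and `weight_a` are illustrative here; the exact condition in train.py may differ):

```python
# Illustrative only -- not the exact code from train.py.
# Idea: when the mask is near-ideal and loss2 dominates loss1,
# drop the second term so it stops pushing the selector toward artifacts.
if loss2.item() > threshold:
    selector_loss = loss1 + l1_loss                      # second term set to zero
else:
    selector_loss = loss1 - weight_a * loss2 + l1_loss   # full objective
```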

For this loss:

```python
loss = (loss1 - self.weight_a * loss2 + l1_loss)
optimizers[0].zero_grad()
optimizers[1].zero_grad()
loss.backward()
optimizers[0].step()
optimizers[1].step()
```

  1. Though I did not remove the threshold in my experiments, I believe `loss = (loss1 - self.weight_a * loss2 + l1_loss)` will produce similar results.
  2. I usually compute the gradients of the predictor first:

     ```python
     optimizers[0].zero_grad()
     loss1.backward(retain_graph=True)
     ```

     And then I calculate the gradients of the selector:

     ```python
     net.predictor.requires_grad_(False)
     optimizers[1].zero_grad()
     loss.backward()
     net.predictor.requires_grad_(True)
     optimizers[0].step()
     optimizers[1].step()
     ```

Do you calculate the gradients for both the predictor and the selector directly with a single `loss.backward()`? I think I encountered some errors when I tried that during my experiments. Does that work correctly for you?

For the ViT experiments: in the file models/COMET_net.py you can find this commented-out block:

```python
self.predictor = timm.create_model('vit_small_patch16_224', pretrained=pretrained)
self.predictor.head = torch.nn.Linear(self.predictor.head.in_features, num_classes)

self.completement_pred = timm.create_model('vit_small_patch16_224', pretrained=pretrained)
self.completement_pred.head = torch.nn.Linear(self.completement_pred.head.in_features, num_classes)
```

Note: I used ViT only for the predictor and the feature detector, while I continued using LRASPP as the feature selector. I'm guessing you used ViT for mask generation, which might explain the pixelation in the mask.
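For anyone reproducing this setup, here is a rough sketch of that composition (class and attribute names are illustrative, not the repository's actual API): a timm ViT-Small predictor combined with a torchvision LRASPP selector that generates the mask.

```python
import timm
import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

class CometNetSketch(torch.nn.Module):
    """Illustrative sketch: ViT predictor + LRASPP selector (assumes 224x224 inputs)."""
    def __init__(self, num_classes, pretrained=True):
        super().__init__()
        # Predictor: ViT-Small with the classification head resized to num_classes.
        self.predictor = timm.create_model('vit_small_patch16_224', pretrained=pretrained)
        self.predictor.head = torch.nn.Linear(self.predictor.head.in_features, num_classes)
        # Selector: LRASPP segmentation model producing a single-channel mask.
        self.selector = lraspp_mobilenet_v3_large(num_classes=1)

    def forward(self, x):
        mask = torch.sigmoid(self.selector(x)['out'])  # soft mask in [0, 1], shape (N, 1, H, W)
        logits = self.predictor(x * mask)              # classify the masked input
        return logits, mask
```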

Apologies for any confusion. Please feel free to email me if you have further questions.