Cogito2012 / DEAR

[ICCV 2021 Oral] Deep Evidential Action Recognition

About joint training of CED #12

Closed · YeziKung closed 1 year ago

YeziKung commented 1 year ago

Hello, I am very interested in your debiasing research, and thank you very much for generously open-sourcing the code. I noticed the paper mentions: "In practice, we also implemented a joint training strategy which aims to optimize the objective of (4) and (5) jointly and we empirically found it can achieve a better performance", and I found the setting alternative=False in the corresponding code. Is this the "joint training" you mentioned?

Also, if alternative=False, then

loss_hsic_f += self.hsic_factor * self.hsic_loss(feat_unbias, feat_bias1.detach(), unbiased=True)
loss_hsic_g += -self.hsic_factor * self.hsic_loss(feat_unbias.detach(), feat_bias1, unbiased=True)

and their corresponding formulas (4) and (5) are never used.

In addition, I would also like to ask whether it is necessary to use the simplified evidence_loss from DebiasHead on the closed set if we do not do the open-set recognition task. I would like to use NLLLoss instead (you also mentioned in the oral presentation that the two are similar). (And I found that edl_loss doesn't take effect if alternative=False, haha.) Of course, I have not run experiments to verify this. Looking forward to your reply!

Cogito2012 commented 1 year ago

@YeziKung Hi, thanks for your interest! Regarding your questions: yes, the default setting alternative=False indicates the joint training method. With the joint training, we only need two classification losses and an independence regularizer which consists of two HSIC losses. Thus, formulas (4) and (5) are in practice computed by the following two HSIC losses:

loss_hsic1 = -1.0 * self.hsic_loss(alpha_unbias, alpha_bias1) # at the line 200
loss_hsic2 = -1.0 * self.hsic_loss(alpha_unbias, alpha_bias2)  # at the line 222
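
For readers unfamiliar with HSIC, a minimal sketch of a (biased) HSIC estimator with RBF kernels is given below. This is a generic illustration of the quantity being regularized, not necessarily the exact hsic_loss (or its unbiased variant) implemented in debias_head.py:

```python
import torch

def rbf_kernel(x, sigma=1.0):
    # Pairwise RBF kernel matrix over the rows of x: shape (n, n)
    sq_dist = torch.cdist(x, x) ** 2
    return torch.exp(-sq_dist / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimator between two batches of features, each of shape (n, d)."""
    n = x.size(0)
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = torch.eye(n, device=x.device) - torch.ones(n, n, device=x.device) / n  # centering
    # HSIC(K, L) = trace(K H L H) / (n - 1)^2
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```

The -1.0 factors in loss_hsic1 and loss_hsic2 above simply flip the sign of this estimate, so the sign convention decides whether the statistical dependence between the two sets of Dirichlet parameters is rewarded or penalized during joint training.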

For the choice of EDL loss: yes, you can definitely try the simplest form of evidence_loss, e.g., without the calibration terms. In that case it essentially reduces to the vanilla NLLLoss. Alternatively, you may also check the EDL paper (NeurIPS'18) and see whether the other two forms of the EDL loss work.
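
For reference, here is a minimal sketch of such a simplified EDL loss, written in the negative-log-marginal-likelihood form from the NeurIPS'18 paper. It is a generic illustration under the alpha = evidence + 1 convention, not the exact evidence_loss in this repo:

```python
import torch
import torch.nn.functional as F

def edl_log_loss(alpha, target):
    """Vanilla EDL loss (log form from Sensoy et al., NeurIPS'18), no calibration terms.

    alpha:  Dirichlet parameters, shape (N, C), with alpha = evidence + 1
    target: integer class labels, shape (N,)
    """
    y = F.one_hot(target, num_classes=alpha.size(1)).float()
    strength = alpha.sum(dim=1, keepdim=True)  # Dirichlet strength S = sum_j alpha_j
    # L_i = sum_j y_ij * (log S_i - log alpha_ij)
    loss = (y * (torch.log(strength) - torch.log(alpha))).sum(dim=1)
    return loss.mean()
```

Since -log(alpha_gt / S) plays the role of a negative log-probability, with alpha = exp(logits) + 1 this objective behaves much like a standard NLL/cross-entropy loss, which is the sense in which the two are similar.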

Hope this can help you. Thank you!

YeziKung commented 1 year ago

> With the joint training, we only need two classification losses and an independence regularizer which consists of two HSIC losses.

@Cogito2012 Thank you for your timely reply, which seems to confirm that my understanding of this part of the paper and code is correct.

1. From your explanation and the open-sourced supplementary material, I think CED_Loss = loss_hsic1 + loss_hsic2. Then, if the EUC module is not considered, the overall loss of the model is vanilla AR_Loss + Loss_factor * CED_Loss (where Loss_factor is 0.1 in DEAR). Is that correct?
2. As for the "two classification losses" mentioned in your reply, from the code of debias_head.py I think loss_cls1, loss_cls2 and loss_cls3 are not needed in joint learning; all we need is alpha_bias = self.exp_evidence(x) + 1 and alpha_unbias = self.exp_evidence(x) + 1. So are the "two classification losses" you mentioned used elsewhere, or do they just refer to the alpha_xx terms?
3. Since you have run many experiments to verify the model and its modules, the code is deeply nested, and unfortunately I have not found where the overall loss, i.e., formula (8), is computed. Could you please point it out if it is convenient?

Thank you, senior Bao!

Cogito2012 commented 1 year ago

@YeziKung For your questions,

  1. Well, it could be correct, if vanilla AR_Loss refers to the sum of the three vanilla EDL losses (on the one debiased and two biased branches); see the sketch after this list.
  2. Why are the two cls losses of the biased branches (loss_cls1, loss_cls2) still needed in joint training? My intuition is that, without these two cls losses as strong supervision, the predicted alpha_bias1 and alpha_bias2 would mean nothing, so that during training/optimization the two branches could easily generate meaningless alpha_bias{1,2} that are independent of alpha_unbias. In that case, we cannot say alpha_unbias is unbiased :) Note that when we say some features are biased/unbiased, the premise is that they are capable of doing well on recognition, but grounded on spurious/intrinsic visual cues, respectively.
  3. I totally understand that the codebase is rather complicated 🤣. In the spirit of the mmaction2 codebase, the overall loss is automatically computed here: mmaction/models/losses/base.py#L41. You may also double-check this by step-by-step debugging~
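
To make points 1 and 3 concrete, here is a hypothetical sketch of how the individual terms discussed in this thread would combine into the overall objective (8). The variable names mirror the thread rather than the exact code, and the 0.1 weight is the Loss_factor mentioned above; note that in an mmaction2-style codebase there is no single line computing (8), since the runner sums every entry of the returned loss dict whose key contains 'loss':

```python
# Hypothetical assembly of the overall objective (8), using names from this thread.
ar_loss = loss_cls + loss_cls1 + loss_cls2  # three vanilla EDL classification losses:
                                            # debiased branch + two biased branches
ced_loss = loss_hsic1 + loss_hsic2          # the two HSIC terms shown earlier
total_loss = ar_loss + 0.1 * ced_loss       # Loss_factor = 0.1 in DEAR (per the thread)
```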