Closed milliema closed 3 years ago
Besides, I found that the method description in the paper and the code implementation are inconsistent. The computation of the threshold Tu, the weighted FPR, and the loss in the paper does not correspond well to the code. For example, according to loss formula (10), the FPR penalty should be applied to the non-target logits. However, the code changes the target logit via "costheta.scatter(1, label.view(-1, 1).long(), target_cos_theta_m)". I am totally confused by the code. Could you please explain the discrepancy? I would appreciate more detailed comments in the code, if any are available.
Q1: In the released code, we set "ru+" to 1/(28000-1), which is very similar to the value 1e-4 in the paper. That is, the numbers of false-positive cases mined in the two ways differ by only 1 for each instance. In this way, we avoid having to choose this value and still achieve similar results.
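As a rough arithmetic check of the reply above, the sketch below compares the two settings of ru+. It assumes that 28000 is the number of classes, so each instance has 28000 - 1 negative logits, and that the number of mined false positives is the floor of ru+ times that count. Both are assumptions made for illustration, not details confirmed by the released code, and `mined_false_positives` is a hypothetical helper.

```python
from fractions import Fraction
import math

NUM_CLASSES = 28000                # assumed: number of classes in the classifier
NUM_NEGATIVES = NUM_CLASSES - 1    # negative logits per instance


def mined_false_positives(rate: Fraction) -> int:
    """Hypothetical helper: negatives kept per instance at a given rate (floored)."""
    return math.floor(rate * NUM_NEGATIVES)


r_code = Fraction(1, NUM_NEGATIVES)   # value described for the released code
r_paper = Fraction(1, 10000)          # 1e-4, the value reported in the paper

k_code = mined_false_positives(r_code)
k_paper = mined_false_positives(r_paper)

print(float(r_code), k_code)
print(float(r_paper), k_paper)
```

Exact fractions are used so that the floor is not affected by floating-point rounding. Under these assumptions the two settings select counts that differ by one per instance, which matches the reply.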
Q2: Our loss function can easily be used in conjunction with previous margin-based loss functions, such as ArcFace, CosFace, etc. We adopt CosFace for a fair comparison in our paper. As described in Eq.8 and Eq.10, the margin for the positive logit is introduced by CosFace, while an extra false positive penalty term is introduced by our method for the negative logits. The code "costheta.scatter(1, label.view(-1, 1).long(), target_cos_theta_m)" is used to ensure that the margin is applied to the target logit.
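To make the effect of that scatter call concrete, here is a minimal pure-Python sketch of what a scatter along dimension 1 does with one label per row: it overwrites only the entry at the label column and leaves every negative logit as it was. `scatter_target` is a hypothetical stand-in written for this illustration, the margin value is illustrative, and only the CosFace margin is applied here (the additional penalty discussed in this thread is left out).

```python
def scatter_target(cos_theta, labels, target_values):
    """Pure-Python stand-in for cos_theta.scatter(1, label.view(-1, 1), src).

    Returns a copy in which row i, column labels[i] is replaced by
    target_values[i]; all other entries are left untouched.
    """
    out = [row[:] for row in cos_theta]
    for i, (label, value) in enumerate(zip(labels, target_values)):
        out[i][label] = value
    return out


MARGIN = 0.4  # illustrative CosFace margin, not taken from the repository

# Two instances, three classes; values are made up.
cos_theta = [[0.9, 0.2, -0.1],
             [0.1, 0.7, 0.3]]
labels = [0, 1]

# CosFace subtracts the margin from the target cosine only.
target_cos_theta_m = [cos_theta[i][y] - MARGIN for i, y in enumerate(labels)]
logits = scatter_target(cos_theta, labels, target_cos_theta_m)

for row in logits:
    print(row)
```

Only the label columns differ between `cos_theta` and `logits`, which is the behaviour the question above is about.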
@xkx0430 Thanks for the reply.
I can understand the setting for ru+.
Yet for the FPR penalty, no modification is made to the negative logits. If you compare the output with the original cos_theta, the only difference is the positive logit (caused by costheta.scatter(1, label.view(-1, 1).long(), target_cos_theta_m), as you explained). And the computation of target_cos_theta_m includes many operations, not only the cosine margin. Are you saying that modifying the positive logit is identical to applying the penalty term to the negative logits?
I'd also like to know the reason for using costheta: it is equal to cos_theta except for the gradient flow, and it doubles the computation in the FC layer. Looking forward to hearing from you soon.
In Line 71, "(1 + target_costheta) * cos_theta_neg_topk" is the false positive penalty for each instance, and its calculation is described in Lines 60-70. To reduce the number of scatter operations, it is merged directly into the positive logit. "costheta" is used to calculate the penalty term without updating the classifier weights.
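@xkx0430 Thanks for the earlier reply. There are still a few points I don't understand; to avoid opening a new issue, I am posting the questions below:
Hello, do you understand how alpha is set? My understanding is that alpha is set to (1 + target_costheta) / times, but I am not sure whether this is correct. @milliema @xkx0430 @hzlzh @HuangYG123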
Hi @milliema
Did you figure out why the penalty is only applied to the positive logit? I have the same question about the implementation.
cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m)
They are actually identical. Divide both the numerator and the denominator by the term, and you'll get:
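A quick numeric check of this kind of equivalence can be sketched as follows. It assumes the penalty enters the softmax loss as a single per-instance quantity shared by all negative terms, which is a simplification and not the paper's exact Eq. 10. The logits and the penalty value are made up, and `softmax_ce` is a helper written for this example.

```python
import math


def softmax_ce(logits, target):
    """Cross-entropy of one instance: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


logits = [3.2, 1.5, -0.7, 2.9]  # made-up scaled logits; index 0 is the target
target = 0
penalty = 0.8                   # hypothetical penalty shared by all negatives

# Variant A: add the penalty to every negative logit.
shift_negatives = [z if j == target else z + penalty
                   for j, z in enumerate(logits)]
# Variant B: subtract the same penalty from the positive logit only.
shift_positive = [z - penalty if j == target else z
                  for j, z in enumerate(logits)]

loss_a = softmax_ce(shift_negatives, target)
loss_b = softmax_ce(shift_positive, target)
print(loss_a, loss_b)
```

The two variants differ by the same constant on every logit, and softmax is unchanged by such a shift, so the two losses agree up to floating-point error. Under this assumption, a single write to the target column has the same effect on the loss as modifying all negative columns.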
Hi @milliema, now I get it. Thank you so much!
Thanks for the CIFP work at CVPR 2021; it's very impressive! I'd like to check the hyper-parameter setting. As mentioned in the paper, ru+ is set to 1e-4 according to the experiments. However, I could not find any definition of it in the code of cifp.py. Could you please offer some guidance on this?