Threshold parameter ru+ in CIFP work

milliema commented 3 years ago

Thanks for the CIFP work at CVPR 2021, it's very impressive! I'd like to check the hyper-parameter setting. As mentioned in the paper, ru+ is set to 1e-4 according to the experiments. However, I didn't find any definition of it in cifp.py. Could you please give some guidance on this?

milliema commented 3 years ago

Besides, I found that the method as described in the paper and the code implementation are inconsistent. The computations of the threshold Tu, the weighted FPR, and the loss do not correspond well to the code. For example, according to loss formula (10), the FPR penalty should be applied to the non-target logits. However, the code changes the target logit via "cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m)". I am totally confused by the code. Could you please explain the discrepancy? I would appreciate more detailed comments in the code, if any are available.

xkx0430 commented 3 years ago

Q1: In the released code, we set "ru+" to 1/(28000-1), which is very similar to the value 1e-4 in the paper. That is, the numbers of false-positive cases mined in the two ways differ by only 1 for each instance. This way, we avoid having to choose this value and achieve similar results.

Q2: Our loss function can easily be used in conjunction with previous margin-based loss functions such as ArcFace and CosFace. We adopt CosFace for a fair comparison in our paper. As described in Eq. 8 and Eq. 10, the margin on the positive logit is introduced by CosFace, while an extra false-positive penalty term is introduced by our method for the negative logits. The code "cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m)" ensures that the margin is applied to the target logit.

milliema commented 3 years ago

@xkx0430 Thanks for the reply.

I can understand the setting for ru+.
Yet for the FPR penalty, no modification is made to the negative logits. If you compare the output with the original cos_theta, the only difference is the positive logit (caused by cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m), as you explained). And the computation of target_cos_theta_m includes many operations, not only the cosine margin. Are you saying that modifying the positive logit is identical to applying the penalty term to the negative logits?
I'd also like to know the reason for using cos_theta_: it's equal to cos_theta except for the gradient flow, and it doubles the computation in the FC layer. Looking forward to hearing from you soon.

xkx0430 commented 3 years ago

In Line 71, "(1 + target_cos_theta_) * cos_theta_neg_topk" is the false-positive penalty for each instance, and its calculation is described in Lines 60-70. To reduce the number of scatter operations, it is merged directly into the positive logit. "cos_theta_" is used to calculate the penalty term without updating the classifier weights.
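A minimal numeric sketch of why the penalty can be merged into the positive logit, assuming the loss is a softmax cross-entropy and the penalty is a single per-instance value shared by all negative logits. The function names, logits, and penalty value below are hypothetical and are not taken from cifp.py.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def penalize_negatives(logits, target, penalty):
    """Add the same penalty to every non-target logit."""
    return [z if j == target else z + penalty for j, z in enumerate(logits)]

def penalize_target(logits, target, penalty):
    """Subtract the penalty from the target logit only."""
    return [z - penalty if j == target else z for j, z in enumerate(logits)]

# Hypothetical scaled logits for a 4-class toy problem; class 0 is the target.
logits = [12.8, 3.2, -6.4, 9.6]
target = 0
penalty = 1.7  # hypothetical per-instance penalty, already in logit units

loss_plain = cross_entropy(logits, target)
loss_neg = cross_entropy(penalize_negatives(logits, target, penalty), target)
loss_pos = cross_entropy(penalize_target(logits, target, penalty), target)

# The two penalized variants differ only by a constant shift of all logits,
# which softmax ignores, so their losses agree up to rounding error.
print(loss_plain, loss_neg, loss_pos)
```

Under these assumptions the two penalized losses coincide, so a single write to the target column has the same effect on the loss as one write per negative column.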

milliema commented 3 years ago

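@xkx0430 Thanks for the earlier replies. There are still a few points I don't understand; to avoid opening a new issue, I'm posting them below:

1) cos_theta_neg_th, computed on line 58 of the code, should be Tu from the formula in the paper, right? As I understand it, Tu is the far_rank-th largest value among all non-target logits. But why does the code select it from the remaining non-target logits that are smaller than the predicted probability of the target?
2) Line 70 of the code should be the part of formula (11) that computes ri+/ru+. Since ru+ is set to 1/(n-1), the denominator in the formula cancels out. But why does the code use the mean rather than the sum that appears in the numerator of the formula?
3) Line 71 of the code adds the FP penalty to the target logit. Deriving from formula (10) in the paper gives the modification of the target logit shown in the figure below: alpha*ri+/ru+ (i.e. cos_theta_neg_topk) is subtracted from the original value. But the code subtracts (1 + target_cos_theta_) * cos_theta_neg_topk. Is (1 + target_cos_theta_) the alpha here?
4) After the penalty is added, the target logit becomes very small, almost always negative. Is this normal?
5) I am still somewhat confused about the gradient backpropagation and the use of cos_theta_. If the penalty were added directly to the non-target logits as in the paper, could this operation be avoided?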

zhangxiaopang88 commented 2 years ago

Hello, do you understand how alpha is set? My understanding is that alpha is set to (1 + target_cos_theta_) / times, but I'm not sure whether that is right. @milliema @xkx0430 @hzlzh @HuangYG123

fuenwang commented 2 years ago

Hi @milliema

Did you figure out why the penalty is only applied to the positive logit? I have the same question about the implementation.

cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m)

milliema commented 2 years ago

They are actually identical. Divide out the penalty term in both the numerator and the denominator, and you'll get:
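A sketch of that step, assuming (as the earlier comments suggest) that formula (10) adds one per-instance penalty to every non-target logit inside a softmax cross-entropy. The symbols here are illustrative rather than the paper's exact notation: $z_y$ is the target logit after the margin, $z_j$ are the non-target logits, and $p$ is the per-instance penalty. Multiplying the numerator and the denominator by $e^{-p}$ gives:

$$
L = -\log\frac{e^{z_y}}{e^{z_y} + \sum_{j \neq y} e^{z_j + p}}
  = -\log\frac{e^{z_y - p}}{e^{z_y - p} + \sum_{j \neq y} e^{z_j}}
$$

So adding $p$ to all non-target logits yields the same loss as subtracting $p$ from the target logit. The identity relies on $p$ being the same for every $j \neq y$; it would not hold for a penalty that varies across the non-target logits.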

fuenwang commented 2 years ago

Hi @milliema, now I get it. Thank you so much!
