Tencent / TFace

A trusty face analysis research platform developed by Tencent Youtu Lab
Apache License 2.0

Threshold parameter ru+ in CIFP work #7

Closed · milliema closed this issue 3 years ago

milliema commented 3 years ago

Thanks for the CIFP work at CVPR 2021, it's very impressive! I'd like to ask about the hyper-parameter setting. As mentioned in the paper, ru+ is set to 1e-4 based on experiments. However, in the code of cifp.py, I couldn't find any definition of it. Could you please offer some guidance on this?

milliema commented 3 years ago

Besides, I found that the method described in the paper and the code implementation are inconsistent. The computations of the threshold Tu, the weighted FPR, and the loss do not correspond well in the code. For example, according to loss formula (10), the FPR penalty should be applied to the non-target logits. However, the code changes the target logit via "costheta.scatter(1, label.view(-1, 1).long(), target_cos_theta_m)". I am quite confused by the code. Could you please explain the discrepancy? More detailed annotation of the code would be appreciated.

xkx0430 commented 3 years ago

Q1: In the released code, we set "ru+" to 1/(28000-1), which is very close to the value 1e-4 used in the paper. The numbers of false-positive cases mined under the two settings differ by only 1 per instance. In this way, we avoid having to tune this value while achieving similar results.
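For later readers, here is a minimal plain-Python sketch of how a ratio like ru+ might translate into a per-instance count of mined false positives. The function name, the rounding rule (ceiling), and the class count are my assumptions for illustration, not the repository's actual code:

```python
import math

def num_mined_fp(ratio, num_negatives):
    # Assumed rule: treat the top-k hardest negative logits as false
    # positives, with k = ceil(ratio * num_negatives), at least 1.
    return max(1, math.ceil(ratio * num_negatives))

num_classes = 28000          # example value matching the 1/(28000-1) setting
negatives = num_classes - 1

k_paper = num_mined_fp(1e-4, negatives)            # ru+ = 1e-4 (paper)
k_code = num_mined_fp(1.0 / negatives, negatives)  # ru+ = 1/(28000-1) (code)
```

With the ceiling rule assumed here, the paper's 1e-4 mines 3 negatives per instance and the code's 1/(28000-1) mines 1; the exact rounding in the released code may differ.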

Q2: Our loss function can easily be used in conjunction with previous margin-based loss functions, such as ArcFace, CosFace, etc. We adopt CosFace for a fair comparison in our paper. As described in Eq. 8 and Eq. 10, the margin on the positive logit comes from CosFace, while the extra false-positive penalty term on the negative logits is introduced by our method. The code "costheta.scatter(1, label.view(-1, 1).long(), target_cos_theta_m)" is used to add the margin to the target logit.
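To make the scatter step concrete, here is a plain-Python stand-in for what that in-place write does in the CosFace part of the loss (the scale s and margin m values below are illustrative defaults, not necessarily the repo's):

```python
def cosface_logits(cos_theta, label, s=64.0, m=0.35):
    # CosFace: subtract the margin m from the target-class cosine only,
    # then scale all logits by s. The torch equivalent writes the
    # margin-adjusted value back into the target column via scatter_.
    out = list(cos_theta)
    out[label] = cos_theta[label] - m
    return [s * z for z in out]

logits = cosface_logits([0.8, 0.1, -0.3], label=0)
```

Only the target column changes; the non-target logits are just scaled, which is why the penalty term in Eq. 10 has to be handled separately.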

milliema commented 3 years ago

@xkx0430 Thanks for the reply.

xkx0430 commented 3 years ago

In Line 71, "(1 + target_costheta) * cos_theta_neg_topk" is the false-positive penalty for each instance, and its calculation is described in Lines 60-70. To reduce the number of scatter operations, the penalty is merged directly into the positive logit. "costheta" is used to calculate the penalty term without updating the classifier weights.
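My reading of that merging, as a plain-Python sketch (function name and the choice of k are hypothetical, and the released code may additionally normalize the penalty, e.g. by a `times` factor, which is omitted here):

```python
def merged_target_logit(cos_theta, label, m, k):
    # Penalty per instance, as described above:
    # (1 + target cosine) * sum of the top-k hardest negative cosines.
    # In the repo this uses a detached copy of cos_theta so the penalty
    # does not backprop into the classifier weights.
    target = cos_theta[label]
    negatives = sorted((c for j, c in enumerate(cos_theta) if j != label),
                       reverse=True)
    penalty = (1 + target) * sum(negatives[:k])
    # Merged form: margin and penalty are both folded into the target logit.
    return target - m - penalty

val = merged_target_logit([0.7, 0.4, 0.2, -0.1], label=0, m=0.35, k=2)
```

Folding the penalty into the single target column means one scatter write instead of one per negative column.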

milliema commented 3 years ago

@xkx0430 Thanks for the earlier reply. There are still a few points I don't understand; to avoid opening a new issue, I'm posting the questions below:

zhangxiaopang88 commented 2 years ago

Hi, do you understand how alpha is set? My understanding is that alpha is set to (1 + target_costheta) / times. Is that correct? @milliema @xkx0430 @hzlzh @HuangYG123

fuenwang commented 2 years ago

Hi @milliema

Did you figure out why the penalty is only applied to the positive logit? I have the same question about the implementation.

cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m)
milliema commented 2 years ago

> Hi @milliema
>
> Did you figure out why the penalty is only applied to the positive logit? I have the same question about the implementation.
>
> cos_theta.scatter_(1, label.view(-1, 1).long(), target_cos_theta_m)

They are actually identical. Divide both the numerator and the denominator of the softmax by the exponential of the penalty term: adding the penalty to every non-target logit then becomes subtracting it from the target logit, which is what the code does.
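The equivalence is easy to verify numerically; this small check (illustrative values only) compares the paper's form, where the penalty p is added to every non-target logit, against the code's form, where p is subtracted from the target logit:

```python
import math

def softmax_prob(logits, target):
    # Probability assigned to the target class by a plain softmax.
    exps = [math.exp(z) for z in logits]
    return exps[target] / sum(exps)

logits = [0.9, 0.3, -0.2, 0.5]
target, p = 0, 0.4

# Form A (paper, Eq. 10): add p to every non-target logit.
form_a = softmax_prob(
    [z if j == target else z + p for j, z in enumerate(logits)], target)
# Form B (code): subtract the same p from the target logit only.
form_b = softmax_prob(
    [z - p if j == target else z for j, z in enumerate(logits)], target)

assert abs(form_a - form_b) < 1e-9  # identical up to floating point
```

Both forms divide through by e^p, so the softmax output, and hence the cross-entropy loss and its gradients, are unchanged.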

fuenwang commented 2 years ago

Hi @milliema, now I get it. Thank you so much!