Thanks for your work, but I do not understand why you added the extra loss:
1. The extra loss makes the L2 norm small. Is this intended for hard sample mining?
2. In the code, the extra loss is used only when the loss is softmax or SphereFace. Why? Can the other losses not use it?