TreB1eN / InsightFace_Pytorch

PyTorch 0.4.1 code for InsightFace
MIT License

Maybe a bug #6

Closed: JingyuanHu closed this issue 5 years ago

JingyuanHu commented 6 years ago

Hi @TreB1eN, after reading the code, I found that the following lines may be a bug in your Arcface loss:

```python
cond_mask = cond_v > 0
keep_val = (cos_theta - self.mm)  # when theta not in [0,pi], use cosface instead
cos_theta_m[cond_mask] = keep_val[cond_mask]
```

I think it should be `cond_mask = cond_v <= 0`. As written, the code actually implements a special AM-Softmax with margin mm, instead of the ArcFace loss.

TreB1eN commented 5 years ago

emm.... It has been some time since I wrote this code. After checking it again, here is my conclusion:

  1. I checked the original ArcFace code in MXNet, lines 345-366 of https://github.com/deepinsight/insightface/blob/master/src/train.py:

```python
if args.output_c2c==0:
  cos_m = math.cos(m)
  sin_m = math.sin(m)
  mm = math.sin(math.pi-m)*m
  #threshold = 0.0
  threshold = math.cos(math.pi-m)
  if args.easy_margin:
    cond = mx.symbol.Activation(data=cos_t, act_type='relu')
  else:
    cond_v = cos_t - threshold
    cond = mx.symbol.Activation(data=cond_v, act_type='relu')
  body = cos_t*cos_t
  body = 1.0-body
  sin_t = mx.sym.sqrt(body)
  new_zy = cos_t*cos_m
  b = sin_t*sin_m
  new_zy = new_zy - b
  new_zy = new_zy*s
  if args.easy_margin:
    zy_keep = zy
  else:
    zy_keep = zy - s*mm
  new_zy = mx.sym.where(cond, new_zy, zy_keep)
```

The cond comes from a ReLU here: if cond > 0, use new_zy; if cond <= 0, use zy_keep instead. So I think this was indeed a mistake. I have already changed it in the code, but I need some time to test the results with this update. Have you tried fixing it? You can also print cond's values during training. Thanks again, mate
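For anyone reading along, here is a minimal PyTorch sketch of the fixed condition. The function name `arcface_logits` and the standalone signature are illustrative (this is not the repo's `Arcface` module verbatim), and it assumes `cos_theta` is the (batch, num_classes) cosine matrix computed from L2-normalized embeddings and weights:

```python
import math
import torch

def arcface_logits(cos_theta, label, s=64.0, m=0.5):
    # precompute the margin constants, as in the MXNet snippet above
    cos_m, sin_m = math.cos(m), math.sin(m)
    threshold = math.cos(math.pi - m)
    mm = math.sin(math.pi - m) * m  # equals sin(m) * m

    sin_theta = torch.sqrt((1.0 - cos_theta ** 2).clamp(min=0.0))
    cos_theta_m = cos_theta * cos_m - sin_theta * sin_m  # cos(theta + m)

    # the fixed condition: where theta + m would fall outside [0, pi],
    # fall back to the CosFace-style value cos(theta) - mm
    cond_mask = (cos_theta - threshold) <= 0
    cos_theta_m = torch.where(cond_mask, cos_theta - mm, cos_theta_m)

    # apply the margin only to the target class, then scale
    output = cos_theta.clone()
    idx = torch.arange(cos_theta.size(0))
    output[idx, label] = cos_theta_m[idx, label]
    return output * s
```

Training would then look like `loss = F.cross_entropy(arcface_logits(cos_theta, labels), labels)`; the key line is the `torch.where` fallback, which mirrors the MXNet `mx.sym.where(cond, new_zy, zy_keep)` above.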

JingyuanHu commented 5 years ago

Here are my results with the changed code: agedb_30 gets 96.30%, cfp_fp gets 94.63%, lfw gets 99.55%.

[screenshots: age, cfp, lfw]

TreB1eN commented 5 years ago

Which model? IRSE-50?

JingyuanHu commented 5 years ago

Yes, IRSE-50 @TreB1eN. There was a mistake in my lfw result (99.55%, not 95.55%). I've updated it.

TreB1eN commented 5 years ago

I also tested MobileFaceNet, and it worked as well, so I think we should make the change. Btw, I have changed the training method in the latest push; it should give better performance. From the learning curve you presented, there is no evidence of benefit from the lr schedule.

JingyuanHu commented 5 years ago

Yes, I didn't use any learning rate decay strategy. And I noticed that you've fixed another problem with idx_. Now I think everything looks great.

TreB1eN commented 5 years ago

Please update to the latest push, since I made a mistake when fixing the idx_ problem :). It should be fine now.

JingyuanHu commented 5 years ago

No more questions. This issue can be closed.

yxchng commented 5 years ago

@JingyuanHu @TreB1eN Why don't you follow the insightface code and use `mm = math.sin(math.pi-m)*m`? Your code uses `self.mm = self.sin_m * m  # issue 1`?

pikerbright commented 5 years ago

@TreB1eN, can you please check @yxchng's question? I have doubts about this too.

yxchng commented 5 years ago

@pikerbright They are actually the same! The sine function is symmetric about pi/2, so math.sin(math.pi - m) = math.sin(m).
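A quick numerical check of the identity (a throwaway snippet; m = 0.5 is just an example margin):

```python
import math

m = 0.5
# identical up to floating-point error: sin(pi - m) == sin(m)
print(math.sin(math.pi - m), math.sin(m))  # both print ~0.479425538604203
```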

haolinlyou commented 4 years ago

@TreB1eN @JingyuanHu When I train ir_se50 on ms1m, the loss stays near 20 and does not decay, batch_acc is 0, and cond_mask (with `cond_mask = cond_v <= 0`) is all True. Please tell me what I can do.

JingyuanHu commented 4 years ago

I'm not very sure; maybe you need a larger batch size, or just wait for some time. @lk123400

Androsimus commented 3 years ago

@lk123400 Not exactly a timely reply, but I'll offer my suggestion anyway. I had almost the same issue when training on a different dataset: the loss was stuck at values of 15-20. The cause was the learning rate; in my case, a lower learning rate fixed the problem.
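If anyone else hits this plateau, a minimal sketch of the kind of change that helped me (the optimizer type and all numbers are illustrative, not the repo's defaults):

```python
import torch
import torch.optim as optim

model = torch.nn.Linear(512, 85742)  # stand-in for the real backbone + head

# a loss stuck around 15-20 can mean the lr is too high for the setup;
# dropping it, e.g. from 1e-1 to 1e-2, unstuck training in my case
optimizer = optim.SGD(model.parameters(), lr=1e-2,
                      momentum=0.9, weight_decay=5e-4)
```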