Closed JingyuanHu closed 5 years ago
emm... It has been some time since I wrote this code. After checking it again, here is the conclusion:
```python
if args.output_c2c == 0:
    cos_m = math.cos(m)
    sin_m = math.sin(m)
    mm = math.sin(math.pi - m) * m
    threshold = math.cos(math.pi - m)
    if args.easy_margin:
        cond = mx.symbol.Activation(data=cos_t, act_type='relu')
    else:
        cond_v = cos_t - threshold
        cond = mx.symbol.Activation(data=cond_v, act_type='relu')
    body = cos_t * cos_t
    body = 1.0 - body
    sin_t = mx.sym.sqrt(body)
    new_zy = cos_t * cos_m
    b = sin_t * sin_m
    new_zy = new_zy - b
    new_zy = new_zy * s
    if args.easy_margin:
        zy_keep = zy
    else:
        zy_keep = zy - s * mm
    new_zy = mx.sym.where(cond, new_zy, zy_keep)
```
Here cond comes from a ReLU: if cond > 0, new_zy is used; if cond <= 0, zy_keep is used instead. So I think this is a mistake. I have already changed it in my code, but I need some time to test the result of this update. Have you tried fixing it? You can also print cond's value during training. Thanks again, mate.
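To see when each branch fires, here is a minimal plain-Python sketch of the condition above (m = 0.5 is an assumed example margin, not a value from the thread): cond_v = cos(theta) - threshold is positive exactly while theta < pi - m, i.e. while theta + m still lies inside [0, pi].

```python
import math

m = 0.5
threshold = math.cos(math.pi - m)   # = -cos(m), roughly -0.878 for m = 0.5

# cond_v > 0 selects new_zy (the true arc margin); cond_v <= 0 selects zy_keep.
for theta in (0.2, 1.0, 2.0, 2.8):
    cond_v = math.cos(theta) - threshold
    branch = "new_zy" if cond_v > 0 else "zy_keep"
    print(f"theta={theta:.1f}  cond_v={cond_v:+.3f}  -> {branch}")
```

Only the last example (theta = 2.8 > pi - m) falls into the zy_keep branch, which is the fallback case the discussion is about.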
Here are my results with the changed code: agedb_30 gets 96.30%, cfp_fp gets 94.63%, lfw gets 99.55%.
Which model? IRSE-50?
Yes, IRSE-50 @TreB1eN. There was a mistake in my lfw result (99.55%, not 95.55%); I've updated it.
I also tested the mobilefacenet, and it worked as well, so I think we should make the change. By the way, I changed the training method in the latest push; it should give better performance. From the learning curve you presented, there is no evidence of any benefit from the lr schedule.
Yes, I didn't use any learning-rate decay strategy. And I noticed that you've fixed another problem with idx_. Now everything looks great.
Please update to the latest push, since I made a mistake when fixing the idx_ problem :) — now it should be fine.
No more questions. This issue can be closed.
@JingyuanHu @TreB1eN Why don't you follow the insightface code and use `mm = math.sin(math.pi-m)*m`? Your code uses `self.mm = self.sin_m * m  # issue 1` instead.
@TreB1eN, can you please check @yxchng's issue? I have the same doubt.
@pikerbright They are actually the same! The sine function is symmetric about pi/2, so `math.sin(math.pi - m) == math.sin(m)`.
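A quick check in plain Python confirms the identity (the margin values below are arbitrary examples):

```python
import math

# sin is symmetric about pi/2, so sin(pi - m) == sin(m) for any m,
# which makes the two ways of computing mm equivalent.
for m in (0.3, 0.5, 0.7):
    assert math.isclose(math.sin(math.pi - m), math.sin(m))
```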
@TreB1eN @JingyuanHu When I train ir_se50 on ms1m, the loss stays near 20 and does not decrease, batch_acc is 0, and with `cond_mask = cond_v <= 0` the cond_mask is all True. Please tell me what I can do.
I'm not very sure; maybe you need a larger batch size, or just wait for some time. @lk123400
@lk123400 It may be a bit late for a suggestion, but I'll offer it anyway. I had almost the same issue when training on a different dataset: the loss got stuck at values of 15-20. The cause was the learning rate; in my case, a lower learning rate fixed the problem.
Hi @TreB1eN, after reading the code, I found that the following lines may be a bug in your ArcLoss:

```python
cond_mask = cond_v > 0
keep_val = (cos_theta - self.mm)  # when theta not in [0, pi], use cosface instead
cos_theta_m[cond_mask] = keep_val[cond_mask]
```
I think it should be `cond_mask = cond_v <= 0`. As written, you are in fact implementing a special AM-Softmax with margin `mm`, instead of ArcLoss.
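For clarity, here is a minimal plain-Python sketch of the corrected selection (names follow the snippet above; m and s are assumed example hyperparameters, and the full loss layer is not reproduced):

```python
import math

def apply_margin(cos_theta, m=0.5, s=64.0):
    """Corrected per-sample margin for the target class.

    Uses cos(theta + m) when theta + m <= pi (i.e. cond_v > 0); otherwise
    falls back to the CosFace-style penalty cos(theta) - mm, as suggested
    in this thread (cond_v <= 0 selects the fallback, not the other way).
    """
    sin_m, cos_m = math.sin(m), math.cos(m)
    mm = sin_m * m                        # equivalently math.sin(math.pi - m) * m
    threshold = math.cos(math.pi - m)
    out = []
    for c in cos_theta:
        sin_t = math.sqrt(1.0 - c * c)
        cos_theta_m = c * cos_m - sin_t * sin_m   # cos(theta + m)
        cond_v = c - threshold
        if cond_v <= 0:                   # corrected mask: fallback branch
            cos_theta_m = c - mm
        out.append(s * cos_theta_m)
    return out
```

With the mask flipped this way, the easy case keeps the true arc margin and only out-of-range angles (theta + m > pi) receive the additive fallback, matching the MXNet `mx.sym.where(cond, new_zy, zy_keep)` behavior.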