TreB1eN / InsightFace_Pytorch

PyTorch 0.4.1 codes for InsightFace
MIT License

training paras for mobilefacenet #12

Closed cvtower closed 5 years ago

cvtower commented 5 years ago

Hi @TreB1eN ,

I found this line in config.py: `conf.milestones = [3,4,5] # mobilefacenet`, but the milestones are not used during training, which I guess means the learning rate never decays. Around line 225 of Learner.py, it seems `self.schedule_lr()` should be called during training for mobilefacenet, according to the paper.

BTW, would you please share the training parameters for mobilefacenet to reproduce your accuracy (batch size, initial learning rate)?

Thanks for your help!

TreB1eN commented 5 years ago

On my 1080Ti, the ir_se50 model uses batch size 100 and mobilefacenet uses 200. You can first find the best batch size, then use find_lr to locate the best lr on your machine. The default lr I set is 1e-3.
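The `find_lr` routine mentioned above is the classic learning-rate range test (after Smith's "Cyclical Learning Rates" paper): sweep the lr exponentially over a short run, record the loss at each step, and pick an lr just before the loss diverges. The sketch below is a generic standalone illustration of that idea, not the repo's exact API; the function name and signature here are illustrative assumptions.

```python
import torch
from torch import nn, optim

def lr_range_test(model, criterion, batches, lr_min=1e-6, lr_max=1.0):
    """Exponentially sweep lr from lr_min to lr_max, recording loss per step."""
    optimizer = optim.SGD(model.parameters(), lr=lr_min)
    gamma = (lr_max / lr_min) ** (1.0 / max(len(batches) - 1, 1))
    lrs, losses = [], []
    lr = lr_min
    for x, y in batches:
        for group in optimizer.param_groups:
            group['lr'] = lr            # set the lr for this step
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        lrs.append(lr)
        losses.append(loss.item())
        lr *= gamma                     # exponential schedule
    return lrs, losses

# Tiny synthetic demo: 5 batches of random data through a linear classifier.
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(5)]
lrs, losses = lr_range_test(nn.Linear(4, 2), nn.CrossEntropyLoss(), batches)
```

A good working lr is typically an order of magnitude below the point where the recorded loss starts to climb.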

TreB1eN commented 5 years ago

conf.milestones was the approach I tried first; in the final version I just go with PyTorch's default lr scheduler. You can find the following line in learner.py: `self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(self.optimizer, patience=40, verbose=True)`. This means that when the validation score has not improved for 40 intervals, the lr is decayed to 1/10.
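A minimal standalone sketch of that scheduler, with a dummy model. Passing `mode='max'` is an assumption here (appropriate when the monitored value is an accuracy rather than a loss); the line quoted above relies on the default arguments otherwise.

```python
import torch
from torch import nn, optim

model = nn.Linear(512, 10)                     # stand-in for the real network
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# Once the monitored metric fails to improve for `patience` consecutive
# step() calls, the lr is multiplied by `factor` (0.1, i.e. decayed to 1/10).
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=40)

# After each evaluation interval, pass the validation score to step().
val_score = 0.95
scheduler.step(val_score)
```

Note that `patience` counts calls to `scheduler.step()`, so what it means in wall-clock terms depends on how often the evaluation runs.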

Actually, the model is really quite easy to train; if you give it a long enough training run, the performance should be good enough.

cvtower commented 5 years ago

Hi @TreB1eN ,

Nice work!

I directly used mobilefacenet as the baseline for my work (a novel efficient CNN architecture) and got a better result. I will try to reproduce the original mobilefacenet results of this repo.

Thanks very much for your reply!

cvtower commented 5 years ago

Hi @TreB1eN,

Just some feedback here:

  1. Using the default training parameters, the accuracy had still not converged after 8 epochs. Personally, I guess the small lr of 1e-3 (compared to 1e-1) slows down the training process; a larger initial learning rate with lr decay from the start could speed up training.
  2. Regarding `optim.lr_scheduler.ReduceLROnPlateau(self.optimizer, patience=40, verbose=True)`: according to the official PyTorch docs, `patience=40` here counts training epochs rather than iterations, so will this ever trigger within the default 8-epoch training?
TreB1eN commented 5 years ago

I have changed the training method in the latest commit; maybe lr decay at some milestone points is still better (I gave up `optim.lr_scheduler.ReduceLROnPlateau`). You can have a look, and if you achieve better performance using this repo, please share your training parameters here.

cvtower commented 5 years ago

@TreB1eN ,

Got that. Thanks very much for your contribution!

cvtower commented 5 years ago

@TreB1eN ,

My results for mobilefacenet: agedb_30: 95.67, cfp_fp: 90.50, lfw: 99.45, batch_size: 256

Thanks very much for your help!

puppet101 commented 5 years ago

@cvtower , could you please share your training parameters?

cvtower commented 5 years ago

> @cvtower , could you please share your training parameters?

@puppet101 ,

Training set: faces_emore. Modifications: training epochs = 8, conf.batch_size = 256, conf.lr = 1e-1, conf.milestones = [4,6,7]. Those are the parameters I used. Judging from the available repos and issues, I guess the performance might be slightly better with a larger number of training epochs on faces_emore.
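That schedule (lr 1e-1 divided by 10 at the end of epochs 4, 6 and 7) maps directly onto PyTorch's `MultiStepLR`. The repo itself decays the lr manually in `Learner.schedule_lr()`; the sketch below is an equivalent standalone illustration, with a dummy model standing in for MobileFaceNet and the SGD momentum/weight-decay values being assumptions.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(512, 10)   # stand-in for MobileFaceNet
optimizer = optim.SGD(model.parameters(), lr=1e-1,
                      momentum=0.9, weight_decay=5e-4)

# lr: 1e-1 for epochs 0-3, 1e-2 for epochs 4-5, 1e-3 for epoch 6, 1e-4 for epoch 7
scheduler = MultiStepLR(optimizer, milestones=[4, 6, 7], gamma=0.1)

for epoch in range(8):
    # ... one training epoch over faces_emore would go here ...
    scheduler.step()
```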

puppet101 commented 5 years ago

@cvtower , Thanks a lot!

cvtower commented 5 years ago

UPDATE: accuracy for mobilefacenet with epochs = 16, milestones = [8,12,14]:

agedb_30 accuracy: 95.90, cfp_fp accuracy: 92.10, lfw accuracy: 99.43, batch_size: 256

AnthonyF333 commented 5 years ago

@cvtower Do you train from scratch or use a pretrained model?

cvtower commented 5 years ago

> @cvtower Do you train from scratch or use a pretrained model?

Hello, from scratch.

AnthonyF333 commented 5 years ago

> @cvtower Do you train from scratch or use a pretrained model?
>
> Hello, from scratch.

Thanks for your reply! I trained the network using the parameters you mentioned, but the accuracy on LFW always stays at 0.5. It's so strange.

cvtower commented 5 years ago

> @cvtower Do you train from scratch or use a pretrained model?
>
> Hello, from scratch.
>
> Thanks for your reply! I trained the network using the parameters you mentioned, but the accuracy on LFW always stays at 0.5. It's so strange.

Hello,

Please try Beyond Compare to diff your code against the repo, and make sure you are using TensorBoard correctly. According to my local log, mobilefacenet reaches 97.8%+ accuracy on LFW after about 20k steps.

AnthonyF333 commented 5 years ago

> @cvtower Do you train from scratch or use a pretrained model?
>
> Hello, from scratch.
>
> Thanks for your reply! I trained the network using the parameters you mentioned, but the accuracy on LFW always stays at 0.5. It's so strange.
>
> Hello,
>
> Please try Beyond Compare to diff your code against the repo, and make sure you are using TensorBoard correctly. According to my local log, mobilefacenet reaches 97.8%+ accuracy on LFW after about 20k steps.

Thanks! I have checked my code and found the reason. Because I have to convert the model to ncnn, and ncnn doesn't support l2_norm, I separated the l2_norm from the network, but I forgot to re-apply the l2_norm during inference, so the test results looked strange.
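For anyone hitting the same pitfall: when the L2 normalization is stripped from the exported network (e.g. for ncnn), it must be re-applied to the raw embeddings at inference time, otherwise cosine/verification scores are meaningless and LFW accuracy sits at chance (0.5). A minimal sketch of the re-normalization step, using `torch.nn.functional.normalize` (the repo's own `l2_norm` helper is equivalent):

```python
import torch
import torch.nn.functional as F

def l2_norm(embeddings: torch.Tensor, axis: int = 1) -> torch.Tensor:
    """Scale each embedding to unit L2 length along `axis`."""
    return F.normalize(embeddings, p=2, dim=axis)

raw = torch.randn(4, 512)   # raw network output: (batch, embedding_dim)
emb = l2_norm(raw)          # unit-length embeddings
# Every row of `emb` now has norm 1, so dot products between rows
# are cosine similarities, as the verification code expects.
```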

AnthonyF333 commented 5 years ago

@cvtower By the way, do you get a better result using other training parameters?

cvtower commented 5 years ago

> @cvtower By the way, do you get a better result using other training parameters?

No, I just use this repo as a baseline to verify my designed network on the face recognition task, and I guess the paper will be published later.

AnthonyF333 commented 5 years ago

> @cvtower By the way, do you get a better result using other training parameters?
>
> No, I just use this repo as a baseline to verify my designed network on the face recognition task, and I guess the paper will be published later.

OK, thank you!

marigoold commented 3 years ago

@cvtower Hi cvtower, thanks for your shared params!

> UPDATE: accuracy for mobilefacenet with epochs = 16, milestones = [8,12,14]:
>
> agedb_30 accuracy: 95.90, cfp_fp accuracy: 92.10, lfw accuracy: 99.43, batch_size: 256

I used the same parameters as yours but got 99.31 on LFW and 90.5 on CFP_FP. Could you please share your trained model with me? Thanks!

youthM commented 3 years ago

> @cvtower Hi cvtower, thanks for your shared params!
>
> > UPDATE: accuracy for mobilefacenet with epochs = 16, milestones = [8,12,14]: agedb_30 accuracy: 95.90, cfp_fp accuracy: 92.10, lfw accuracy: 99.43, batch_size: 256
>
> I used the same parameters as yours but got 99.31 on LFW and 90.5 on CFP_FP. Could you please share your trained model with me? Thanks!

Hi, I also used the same parameters, and I only got 92.xx% with two GPUs. How many GPUs did you use? Would you mind sharing your trained model? Thanks a lot!

cvtower commented 3 years ago

> @cvtower Hi cvtower, thanks for your shared params!
>
> > UPDATE: accuracy for mobilefacenet with epochs = 16, milestones = [8,12,14]: agedb_30 accuracy: 95.90, cfp_fp accuracy: 92.10, lfw accuracy: 99.43, batch_size: 256
>
> I used the same parameters as yours but got 99.31 on LFW and 90.5 on CFP_FP. Could you please share your trained model with me? Thanks!

I have uploaded my mobilefacenet pretrained model here: https://github.com/cvtower/seesawfacenet_pytorch

cvtower commented 3 years ago

> @cvtower Hi cvtower, thanks for your shared params!
>
> > UPDATE: accuracy for mobilefacenet with epochs = 16, milestones = [8,12,14]: agedb_30 accuracy: 95.90, cfp_fp accuracy: 92.10, lfw accuracy: 99.43, batch_size: 256
>
> I used the same parameters as yours but got 99.31 on LFW and 90.5 on CFP_FP. Could you please share your trained model with me? Thanks!
>
> Hi, I also used the same parameters, and I only got 92.xx% with two GPUs. How many GPUs did you use? Would you mind sharing your trained model? Thanks a lot!

I have uploaded my mobilefacenet pretrained model here: https://github.com/cvtower/seesawfacenet_pytorch

youthM commented 3 years ago

@cvtower Thanks a lot, but why is the accuracy of model_2019-05-19-16-47_accuracy_0.9158571428571429_step_712992_final.pth only 0.5 when I run the evaluation code?