TreB1eN / InsightFace_Pytorch

Pytorch0.4.1 codes for InsightFace
MIT License

I get 56% acc after 5 epochs, which makes me saaaaaaaaaaad #25

Closed ruiming46zrm closed 5 years ago

ruiming46zrm commented 5 years ago

With epoch = 4, batch_size = 256, num_workers = 3, and the other parameters the same as the author's, I trained on ms1m with 4 GPUs on my server. The final accuracy is 56%, which is a very sad result. I don't know where I went wrong.
I don't believe the cause is such a short training run ~~~



TreB1eN commented 5 years ago

Maybe it's the short epoch count, why not train longer?

ruiming46zrm commented 5 years ago

> Maybe it's the short epoch count, why not train longer?

Thanks for the reply. However, it is now running into the 7th epoch and the results are below; the accuracy has declined. I don't think we can get to 99% in 20 epochs like the author did. Might there be other problems? Thank you.

  Recent results:
  model_2018-12-10-06-42_accuracy:0.5388571428571429_step:77333_None.pth
  model_2018-12-10-07-24_accuracy:0.5381428571428571_step:81882_None.pth
  model_2018-12-10-08-05_accuracy:0.557_step:86431_None.pth
  model_2018-12-10-08-46_accuracy:0.5299999999999999_step:90980_None.pth
  model_2018-12-10-09-27_accuracy:0.514_step:95529_None.pth
  model_2018-12-10-10-08_accuracy:0.5281428571428571_step:100078_None.pth
  model_2018-12-10-10-50_accuracy:0.5449999999999999_step:104627_None.pth
  model_2018-12-10-11-31_accuracy:0.5121428571428571_step:109176_None.pth
  model_2018-12-10-12-12_accuracy:0.6101428571428572_step:113725_None.pth
  model_2018-12-10-12-53_accuracy:0.5367142857142857_step:118274_None.pth
  model_2018-12-10-13-34_accuracy:0.5_step:122823_None.pth
  model_2018-12-10-14-14_accuracy:0.5_step:127372_None.pth
  model_2018-12-10-14-54_accuracy:0.5_step:131921_None.pth
  model_2018-12-10-15-34_accuracy:0.5_step:136470_None.pth
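These numbers look stuck around chance level: if the logged accuracy is pair-verification accuracy (as in the usual InsightFace evaluation), 0.5 on a balanced pair set means the embeddings carry no signal. A minimal sketch for pulling the trend out of the checkpoint filenames above, assuming they all follow the `model_<timestamp>_accuracy:<acc>_step:<step>_None.pth` pattern shown:

    import re

    # Filenames follow: model_<timestamp>_accuracy:<acc>_step:<step>_None.pth
    PATTERN = re.compile(r"accuracy:([0-9.]+)_step:(\d+)")

    def parse_checkpoints(names):
        """Yield (step, accuracy) pairs parsed from checkpoint filenames."""
        for name in names:
            m = PATTERN.search(name)
            if m:
                yield int(m.group(2)), float(m.group(1))

    names = [
        "model_2018-12-10-06-42_accuracy:0.5388571428571429_step:77333_None.pth",
        "model_2018-12-10-15-34_accuracy:0.5_step:136470_None.pth",
    ]
    for step, acc in sorted(parse_checkpoints(names)):
        print(f"step {step:>6}: acc {acc:.4f}")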
TreB1eN commented 5 years ago

What's your lr?

ruiming46zrm commented 5 years ago

> What's your lr?

The default, 1e-3; I didn't set a new one. An interesting thing: the results above are from the server, but on my local computer, using about 5% of the ms1m data, batch_size = 12, single GPU, the accuracy slowly increased to 66% by epoch 10.

TreB1eN commented 5 years ago

Have you changed any code? Because my code doesn't support multi-card training.

ruiming46zrm commented 5 years ago

Yes, in 'Learner.py':

    def train(self, conf, epochs):
        self.model.train()

        conf.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # added
        self.model = nn.DataParallel(self.model, device_ids=[0, 1, 2, 3])           # added: replicate across 4 GPUs
        self.model.to(conf.device)                                                  # added: move parameters to cuda:0

        running_loss = 0.
        for e in range(epochs):
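One side effect of this change worth flagging: nn.DataParallel prefixes every parameter name in state_dict() with "module.", so checkpoints saved from the wrapped model will not load back into the bare model. A minimal sketch of the usual workaround (strip_module_prefix is a hypothetical helper, not part of the repo):

    import torch

    def strip_module_prefix(state_dict):
        """Drop the 'module.' prefix that nn.DataParallel adds to parameter names."""
        return {
            (k[len("module."):] if k.startswith("module.") else k): v
            for k, v in state_dict.items()
        }

    # hypothetical usage with one of the checkpoints listed above:
    # state = torch.load("model_2018-12-10-06-42_accuracy:0.5388571428571429_step:77333_None.pth")
    # model.load_state_dict(strip_module_prefix(state))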
TreB1eN commented 5 years ago

If you enlarge the batch_size to 4× the original, normally you should also increase the lr linearly.
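For concreteness, this is the linear scaling rule (popularized by Goyal et al., "Accurate, Large Minibatch SGD"): keep lr / batch_size roughly constant. A minimal sketch, where the reference batch size of 64 is an assumption since the thread doesn't state the single-card default:

    def scaled_lr(batch_size, base_lr=1e-3, base_batch_size=64):
        """Linear scaling rule: grow the learning rate with the batch size.

        base_batch_size=64 is an assumed single-card reference, not a repo value.
        """
        return base_lr * batch_size / base_batch_size

    # e.g. going from an assumed batch of 64 to 256 (4 GPUs) suggests lr ~ 4e-3
    print(scaled_lr(256))  # 0.004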