Closed · ruiming46zrm closed this issue 5 years ago
Maybe it's just too few epochs; why not train longer?
> maybe It's the short epoch, why not train longer?

Thanks for the reply. However, it is now on the 7th epoch and the results are as follows; accuracy has declined. I don't think we can reach 99% in 20 epochs like the author did. Could there be some other problem? Thank you.
Recent results:
model_2018-12-10-06-42_accuracy:0.5388571428571429_step:77333_None.pth
model_2018-12-10-07-24_accuracy:0.5381428571428571_step:81882_None.pth
model_2018-12-10-08-05_accuracy:0.557_step:86431_None.pth
model_2018-12-10-08-46_accuracy:0.5299999999999999_step:90980_None.pth
model_2018-12-10-09-27_accuracy:0.514_step:95529_None.pth
model_2018-12-10-10-08_accuracy:0.5281428571428571_step:100078_None.pth
model_2018-12-10-10-50_accuracy:0.5449999999999999_step:104627_None.pth
model_2018-12-10-11-31_accuracy:0.5121428571428571_step:109176_None.pth
model_2018-12-10-12-12_accuracy:0.6101428571428572_step:113725_None.pth
model_2018-12-10-12-53_accuracy:0.5367142857142857_step:118274_None.pth
model_2018-12-10-13-34_accuracy:0.5_step:122823_None.pth
model_2018-12-10-14-14_accuracy:0.5_step:127372_None.pth
model_2018-12-10-14-54_accuracy:0.5_step:131921_None.pth
model_2018-12-10-15-34_accuracy:0.5_step:136470_None.pth
What's your lr?
> What's your lr?
The default, 1e-3; I didn't set a new one. An interesting thing: the results above are from the server, but on my local machine, using about 5% of the ms1m data with batch_size 12 on a single GPU, accuracy slowly increased to 66% by epoch 10.
Have you changed any code? Because my code doesn't support multi-card training.
Yes, in `Learner.py`:
```python
def train(self, conf, epochs):
    self.model.train()
    conf.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # added
    self.model = nn.DataParallel(self.model, device_ids=[0, 1, 2, 3])  # added
    self.model.to(conf.device)  # added
    running_loss = 0.
    for e in range(epochs):
```
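For reference, a minimal self-contained sketch of the `DataParallel` pattern added above, with a toy linear layer standing in for the actual face-recognition backbone (the model, batch, and dimensions here are placeholders for illustration, not the repo's real training code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(512, 10)  # toy stand-in for the backbone network
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the visible GPUs,
    # runs the replicas in parallel, and gathers outputs on GPU 0.
    model = nn.DataParallel(model)
model = model.to(device)

x = torch.randn(8, 512).to(device)
out = model(x)
print(out.shape)  # torch.Size([8, 10])
```

Note that wrapping in `DataParallel` prefixes the saved `state_dict` keys with `module.`, so checkpoints written from a wrapped model may need that prefix stripped before loading into an unwrapped one.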
If you enlarge the batch_size to 4 times the default, you should normally also increase the lr linearly.
With epochs = 4, batch_size = 256, num_workers = 3, the other parameters the same as the author's, and 4 GPUs training on ms1m on my server, I finally get 56% accuracy, which is a very sad result. I don't know where I went wrong.
I don't believe the cause is just too few epochs.
464312346@qq.com