balancap / tf-imagenet

TensorFlow ImageNet - Training and SOTA checkpoints
Apache License 2.0
51 stars · 16 forks

Explanation for achieving better performance than original paper #2

Open AIROBOTAI opened 6 years ago

AIROBOTAI commented 6 years ago

Thanks for sharing your great work!

The MobileNet-v1 you trained achieved 72.9 top-1 acc. which surpasses the reported number (70.6) in original paper by a large margin. Could you please explain the reasons? Thanks!

balancap commented 6 years ago

Thanks!

That's a good question, and honestly I am not sure I completely know why! From the original MobileNets paper, it seems they use hyperparameters from the Inception papers, whereas I tried the recent NASNet paper hyperparameters (a much larger learning rate of ~0.2), which seem to give much better accuracy. I got the same good results with the MobileNets v2 models (and there, the reported numbers in the papers are pretty close as well).
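For reference, the kind of schedule I mean is a large initial rate with exponential decay. A minimal sketch below; the decay factor and interval are illustrative assumptions borrowed from common TF-Slim ImageNet configs, not necessarily what this repo uses:

```python
# Hedged sketch: NASNet-style learning-rate schedule with a large
# initial rate (~0.2) and staircase exponential decay.
# decay_rate=0.94 and decay_steps=12500 are assumptions for illustration.

def exp_decay_lr(step, init_lr=0.2, decay_rate=0.94, decay_steps=12500):
    """Staircase exponential decay, mirroring tf.train.exponential_decay."""
    return init_lr * decay_rate ** (step // decay_steps)

if __name__ == "__main__":
    # Learning rate at the start, after one decay interval, and after ten.
    for step in (0, 12500, 125000):
        print(step, round(exp_decay_lr(step), 5))
```

The point is only the order of magnitude: starting around 0.2 rather than the much smaller Inception-era rates.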

AIROBOTAI commented 6 years ago

Wow, that's a big discovery! This is strong evidence of how important hyperparameters are in DL :-D Thanks for your explanation!

AIROBOTAI commented 6 years ago

Hi @balancap, I'd like to run your source code for training MobileNet-v1/v2. I guess the training command should be python tf_cnn_benchmarks.py followed by hyperparameter settings. Could you please show me the list of hyperparameters you use? Or do you just follow NASNet? Thanks!

AIROBOTAI commented 6 years ago

Hi @balancap, could you please share more details of hyperparameters? Thanks a lot!

haoxi911 commented 6 years ago

@balancap Do you mean that you trained MobileNet v1 with a learning rate of ~0.2, which achieved better accuracy than the original paper?

My learning rate was set to ~0.05; I tried 6000 to 10000 steps and only got 68% top-1 accuracy. Could the small learning rate be the problem?
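For scale, it may be worth checking the step budget too: with a batch size of 256 (an assumption; substitute your own), one pass over ImageNet's ~1.28M training images already takes ~5,000 steps, so 6,000–10,000 steps is only one to two epochs, while these models are typically trained for many dozens of epochs. A quick sanity check:

```python
# Rough epoch arithmetic for ImageNet training.
# batch_size=256 is an assumed value for illustration.
IMAGENET_TRAIN_IMAGES = 1_281_167
batch_size = 256

steps_per_epoch = IMAGENET_TRAIN_IMAGES // batch_size
epochs_at_10k_steps = 10_000 / steps_per_epoch

print(steps_per_epoch)                 # ~5004 steps per epoch
print(round(epochs_at_10k_steps, 2))   # ~2.0 epochs at 10k steps
```

So low accuracy at that point may reflect an unfinished run as much as the learning-rate choice.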