dasguptar / bcnn.pytorch

Bilinear CNNs in PyTorch
MIT License

Can't reproduce the result of 84% accuracy #3

Open hcygeorge opened 4 years ago

hcygeorge commented 4 years ago

I referred to your code and Hao Mood's and trained the BCNN model, fine-tuning all layers. The best test accuracy I could reach was ~73% with pretrained VGG16 and ~61% without. Is it easy to reach the 84% accuracy you report?

I used almost the same hyperparameter settings, except for the batch size: due to memory constraints, I can only use a batch size of 12. I wonder whether the small batch size hurts training, but I have no evidence. The VGG16 I used does not include BN layers, and people generally say that a small batch size adds noise during training, which helps prevent poor generalization.

Because a small batch size increases the variance of the gradient, I also tried tuning the learning rate to compensate, but that still did not improve the result.

Could you give me some advice on how to reach 84% accuracy, or confirm that it is not possible to reach 84% when the batch size is 12?

dasguptar commented 4 years ago

Hi @hcygeorge. In my experience, BCNNs have been tricky to train with different hyperparameters, including batch size. Long ago I tried to replicate the results using LuaTorch but, like you, had to reduce the batch size. I could get close (~1-2% gap) to the official results by tweaking the learning rate and the momentum according to the changed batch size. My suggestion would be to keep tweaking the LR and momentum, or to try a larger batch size.
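One common starting point for adjusting the learning rate to a changed batch size is the linear scaling heuristic, which keeps the ratio of learning rate to batch size constant. The sketch below is only an illustration of that heuristic, not the procedure used in this repository: the function name `scale_lr` and the reference values are hypothetical, and the thread gives no rule for the momentum, so that would still have to be tuned by hand.

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size constant.

    This is a rule of thumb for picking a first guess, not a guarantee
    that the original accuracy will be recovered.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical reference setting (NOT taken from this repository):
# a learning rate of 1.0 that was tuned for a batch size of 64.
reference_lr = 1.0
reference_batch_size = 64

# First guess for the learning rate when memory limits the batch size to 12.
lr_for_batch_12 = scale_lr(reference_lr, reference_batch_size, 12)
print(lr_for_batch_12)
```

The scaled value is best treated as the centre of a small search range rather than a final setting, since the reply above reports that further manual tweaking was still needed.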

hcygeorge commented 4 years ago

Thank you for your suggestion.

Yesterday I decided to downsize the images to 224x224 so that I could increase the batch size to 64. With the pretrained VGG16 model, the test accuracy of BCNN reached 71%, while the training accuracy reached ~100%. So I think this is the best result we can get using BCNN on this downsized dataset.

hcygeorge commented 4 years ago

Could you please tell me whether it is common to use cross-validation to tune hyperparameters in fine-grained classification?