facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

Resnet50 accuracy is too high #2215

Open renganxu opened 6 years ago

renganxu commented 6 years ago

I trained ResNet-50 from scratch using the example provided in Caffe2. The training ran on one server with 4 V100 GPUs. This is the command I used:

python -m caffe2.python.examples.resnet50_trainer \
--train_data /cm/shared/scratch/database/caffe/ilsvrc12_train_lmdb/ \
--test_data /cm/shared/scratch/database/caffe/ilsvrc12_val_lmdb/ \
--epoch_size 1280000 \
--gpus 0,1,2,3 \
--batch_size 256 \
--num_epochs 100 \
--base_learning_rate 0.1

After 100 epochs the accuracy is as high as 95%. However, according to the Accuracy operator definition at https://caffe2.ai/docs/operators-catalogue.html#accuracy, the default is top-1 accuracy, which should be below 80% for the ImageNet dataset. I also couldn't find any code that changes the default accuracy level. Is there something wrong with the ResNet-50 implementation, or did I do something wrong?

The following is the last few lines in the training log.

INFO:resnet50_trainer:Finished iteration 4997/5000 of epoch 99 (1055.92 images/sec)
INFO:resnet50_trainer:Training loss: 0.211377546191, accuracy: 0.921875
INFO:resnet50_trainer:Finished iteration 4998/5000 of epoch 99 (996.31 images/sec)
INFO:resnet50_trainer:Training loss: 0.179322883487, accuracy: **0.96875**
INFO:resnet50_trainer:Finished iteration 4999/5000 of epoch 99 (1036.78 images/sec)
INFO:resnet50_trainer:Training loss: 0.254537671804, accuracy: **0.953125**
INFO:resnet50_trainer:Finished iteration 5000/5000 of epoch 99 (1002.47 images/sec)
INFO:resnet50_trainer:Training loss: 0.167547017336, accuracy: **0.953125**
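
As a sanity check on the numbers in the command and log above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only. The assumption that `--batch_size 256` is split evenly across the 4 GPUs (64 images each) is a guess, although the logged accuracies are all exact multiples of 1/64, which is at least consistent with each value being measured on a 64-image slice of training data rather than on a full validation pass.

```python
from fractions import Fraction

# Values taken from the command line in the report.
epoch_size = 1280000
batch_size = 256
num_gpus = 4  # assumption: the total batch is split evenly across GPUs

iters_per_epoch = epoch_size // batch_size
per_gpu_batch = batch_size // num_gpus

# Accuracies copied from the last lines of the training log.
logged = [0.921875, 0.96875, 0.953125]

# These floats are exact binary fractions, so Fraction recovers them exactly.
as_fractions = [Fraction(a) for a in logged]

# Number of correct predictions if each value was measured on 64 images.
correct_of_64 = [a * per_gpu_batch for a in logged]

print(iters_per_epoch)  # 5000, matching "iteration 5000/5000" in the log
print(per_gpu_batch)    # 64
print(as_fractions)     # [Fraction(59, 64), Fraction(31, 32), Fraction(61, 64)]
print(correct_of_64)    # [59.0, 62.0, 61.0]
```

With so few images behind each logged value, a single line can swing by several percentage points, so these numbers are noisy estimates of training accuracy at best.
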
0wu commented 6 years ago

I think you are looking at the training accuracy, which can reach 100% if the model overfits: https://github.com/caffe2/caffe2/blob/master/caffe2/python/examples/resnet50_trainer.py#L201

The sub-80% top-1 figure usually quoted for ImageNet refers to test (validation) accuracy: https://github.com/caffe2/caffe2/blob/master/caffe2/python/examples/resnet50_trainer.py#L234
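
To make the gap between training and test accuracy concrete, here is a deliberately extreme toy in plain Python. It is not the ResNet-50 trainer and uses no Caffe2 calls. The "model" is a hypothetical lookup table that memorizes its training labels, so every name below (`top1_accuracy`, `predict`, `memory`) is an assumption made up for illustration.

```python
import random

def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the true label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

random.seed(0)
num_classes = 1000

# Synthetic (sample_id, label) pairs with random labels: there is nothing
# to generalize from, so any held-out success would be pure chance.
train = [(i, random.randrange(num_classes)) for i in range(2000)]
held_out = [(i, random.randrange(num_classes)) for i in range(2000, 4000)]

# A "model" that overfits completely: it memorizes every training label
# and falls back to class 0 for inputs it has never seen.
memory = dict(train)

def predict(sample_id):
    return memory.get(sample_id, 0)

train_acc = top1_accuracy([predict(x) for x, _ in train],
                          [y for _, y in train])
test_acc = top1_accuracy([predict(x) for x, _ in held_out],
                         [y for _, y in held_out])

print("training accuracy:", train_acc)
print("held-out accuracy:", test_acc)
```

The memorizing model scores perfectly on the data it was fit to and near chance on unseen data. A real network sits somewhere between these extremes, which is why a high training accuracy alone says little about the top-1 number reported on the validation set.
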

renganxu commented 6 years ago

@0wu Thanks for the clarification. But since my test data is not None, why doesn't the log output the test accuracy? How can I make it output the test accuracy as well? I haven't tried Python gdb yet; I will use it to step through this program.