antingshen / resnet-protofiles

Caffe Protofiles for MSRA ResNet: train prototxt

Training from scratch #1

Open baiyancheng20 opened 8 years ago

baiyancheng20 commented 8 years ago

Have you ever trained a model using your code? I tried to train a new model, but did not achieve the reported accuracy.

antingshen commented 8 years ago

Yes, I've trained the ResNet-50 from scratch, and it does not achieve MSRA's accuracy due to a few differences. This was noted in the README. If you figure out changes to reproduce MSRA's accuracy, please let me know and we can fix it :)

baiyancheng20 commented 8 years ago

@antingshen Hi, I have used this code to train ResNet-18 from scratch, and it did not achieve a good result either. I found that the training accuracy becomes higher than the test accuracy after about 6-10 epochs, which is abnormal. I also trained a model without BN layers, which achieves 65% top-1 accuracy, better than the models using BN layers. I changed your code to train models on CIFAR-10 and achieved 88.76% top-1 accuracy vs. 90.0% in He's paper, which seems correct. So I am very confused. Could you tell me what your training process is and what accuracy you get?

antingshen commented 8 years ago

Here's my ResNet-50 top-1 validation error with respect to epochs:

[plot: ResNet-50 top-1 validation error vs. epochs]

As you can see, this includes BN and reaches ~68% accuracy.

I'm not quite sure what's wrong either at the moment, besides my version not having random reshape & crop. If you figure it out please let me know :)

baiyancheng20 commented 8 years ago

[image: He's training and testing curves]

These are He's training and testing curves. We can see that the training errors are higher than the testing errors before epoch 60. As you say, there is no real-time data augmentation (random reshape, color jittering, etc.) in Caffe; that should be one reason. I have implemented a few data augmentation methods. If you are interested, we can cooperate and try to train the ResNet together.

There may also be another reason. Kaiming He said,

In our BN layers, the provided mean and variance are strictly computed using average (not moving average) on a sufficiently large training batch after the training procedure. The numerical results are very stable (variation of val error < 0.1%). Using moving average might lead to different results.

Do you know how to compute the mean/variance as He said?
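For reference, a minimal pycaffe sketch of what He describes: after training, forward a large number of training batches and overwrite each BatchNorm layer's stored statistics with the plain (not moving) average of its input. The prototxt/caffemodel names and the batch count are placeholders, and the sketch assumes the train-phase net runs BN with use_global_stats: false and *not* in-place (with in-place BN, the bottom blob holds the normalized output after forward):

```python
import caffe

# Placeholder file names; substitute your own train prototxt and weights.
net = caffe.Net('train_val.prototxt', 'resnet.caffemodel', caffe.TRAIN)

bn_names = [n for n, l in zip(net._layer_names, net.layers) if l.type == 'BatchNorm']
sums = {n: 0.0 for n in bn_names}
sq_sums = {n: 0.0 for n in bn_names}
counts = {n: 0 for n in bn_names}

num_batches = 500  # "sufficiently large" training batch; tune as needed
for _ in range(num_batches):
    net.forward()
    for n in bn_names:
        x = net.blobs[net.bottom_names[n][0]].data  # BN input, NCHW
        sums[n] += x.sum(axis=(0, 2, 3))
        sq_sums[n] += (x ** 2).sum(axis=(0, 2, 3))
        counts[n] += x.shape[0] * x.shape[2] * x.shape[3]

# Write the plain averages back into the BN blobs:
# blobs[0] = mean, blobs[1] = variance, blobs[2] = moving-average scale factor.
layer_idx = {n: i for i, n in enumerate(net._layer_names)}
for n in bn_names:
    mean = sums[n] / counts[n]
    blobs = net.layers[layer_idx[n]].blobs
    blobs[0].data[...] = mean
    blobs[1].data[...] = sq_sums[n] / counts[n] - mean ** 2
    blobs[2].data[...] = 1.0  # scale of 1 => stats are used as-is at test time

net.save('resnet_bn_recomputed.caffemodel')
```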

antingshen commented 8 years ago

Maybe. I think we need a bit more detail or experimentation to pin down the exact BN implementation.

I'm happy to cooperate. Let me know if you have any ideas.

baiyancheng20 commented 8 years ago

Could you use a modified data_reader.cpp (https://github.com/lim0606/caffe-googlenet-bn) to shuffle the data during training? I found this can improve the accuracy for GoogLeNet, and I wonder whether it could improve ResNet. PS: Do you use any instant messaging software? I think you are Chinese; do you use QQ?

antingshen commented 8 years ago

The link is broken, but I think we want shuffling + random resize + random crop, all on the fly during training, or at least that's what the MSRA paper seems to describe. I'd say modifying data_reader.cpp is the right idea; the augmentation itself would look something like the sketch below.
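Purely as an illustration (not code from this repo), here's a minimal per-image Python sketch of that pipeline, following the scale augmentation in He et al.: resize the shorter side to a random length in [256, 480], take a random 224x224 crop, and flip with probability 0.5. It could live in a Python data layer instead of a data_reader.cpp patch:

```python
import random
import numpy as np
from PIL import Image

def augment(img):
    """Random scale + random 224x224 crop + random horizontal flip."""
    # Resize so the shorter side is a random length in [256, 480].
    s = random.randint(256, 480)
    w, h = img.size
    if w < h:
        img = img.resize((s, int(round(h * s / float(w)))), Image.BILINEAR)
    else:
        img = img.resize((int(round(w * s / float(h))), s), Image.BILINEAR)
    # Random 224x224 crop (always fits, since the shorter side >= 256).
    w, h = img.size
    x = random.randint(0, w - 224)
    y = random.randint(0, h - 224)
    img = img.crop((x, y, x + 224, y + 224))
    # Random horizontal flip.
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return np.asarray(img)  # HWC uint8; convert to CHW/BGR for Caffe as needed
```

Shuffling then just means re-shuffling the image list once per epoch before feeding it.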

I have WeChat & Messenger.

leonid-pishchulin commented 7 years ago

Could somebody share a ResNet-18 model pre-trained on ImageNet?