Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet

Training epoch of ML-Images pretrain #27

Closed huangzehao closed 5 years ago

huangzehao commented 5 years ago

Hi, would you mind sharing the training settings of the ML-Images pretrained model? I checked the settings in example/train.sh and found that BATCHSIZE is 1. There may be something wrong in example/train.sh.

Thanks!

wubaoyuan commented 5 years ago

@huangzehao Thanks for your interest. train.sh is just a simple demo; its default parameters are not those used in our pre-training on ML-Images. We are preparing an arXiv manuscript and will report our hyper-parameters there.

huangzehao commented 5 years ago

@wubaoyuan Thanks for your reply. Looking forward to your preprint. But would you mind sharing the number of training epochs with me? I just want to know how many epochs I should train for, since we are reproducing the ML-Images pretrained model. Thanks!

wubaoyuan commented 5 years ago

@huangzehao See the following description. Note, however, that our training is conducted on multiple nodes with multiple GPUs each; on a single node with multiple GPUs it will take a very long time.

"We adopt the stochastic gradient descent (SGD) with momentum and back propagation to train the ResNet-101 model. The hyper-parameters used in our training are specified as follows. There are 17,609,752 training images.The batch-size is 4096, and one epoch includes 4300 steps.The learning rate is adjusted with a warm-up strategy [26].Specifically, in the first 8 epochs, the learning rate starts at 0.01 and is increased with the factor 1.297 after each epoch.The learning rate will be 0.08 at the 9th epoch. Then, the learning rate is decayed with the factor 0.1 in every 25 epochs. The momentum is 0.9. The maximal epoch is 60.When updating the parameters of BatchNorm, the decay factor of moving average is 0.9, and the constant is set to 0.001 to avoid the 0 value of the variance. The parameter decay is 0.0001."

huangzehao commented 5 years ago

@wubaoyuan I got it! Thank you very much!