Closed hongyuanyu closed 4 years ago
Hi, hongyuanyu. We ran our training on 32 RTX 2080 Ti GPUs with a batch size of 64 per GPU, stepping the optimizer every 2 iterations, so that we can guarantee a total batch size of 4096 and use the initial lr of 0.256 that EfficientNet suggests. A small batch size and a small initial lr might reduce the final performance. You can try stepping the optimizer every 4 iterations with a batch size of 128 per GPU and an lr of 0.256 to guarantee a large batch size.
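The arithmetic behind that suggestion can be checked with a small sketch. The helper names below are hypothetical (they are not part of the training code), and the 8-GPU figure for the suggested setup is an assumption taken from the single-node command posted in this thread.

```python
def effective_batch_size(num_gpus, per_gpu_batch, accum_steps):
    """Samples that contribute to one optimizer step."""
    return num_gpus * per_gpu_batch * accum_steps


def accum_steps_for(target_batch, num_gpus, per_gpu_batch):
    """Iterations to accumulate before stepping to reach target_batch."""
    per_iteration = num_gpus * per_gpu_batch
    if target_batch % per_iteration != 0:
        raise ValueError("target batch is not a multiple of the per-iteration batch")
    return target_batch // per_iteration


# Setup described above: 32 GPUs, 64 per GPU, step every 2 iterations.
print(effective_batch_size(32, 64, 2))   # 4096
# Suggested setup (assuming 8 GPUs): 128 per GPU, target 4096.
print(accum_steps_for(4096, 8, 128))     # 4
```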
Hi,
As for ImageNet retraining of the searched models, we used a similar protocol with EfficientNet [30], i.e., a batch size of 4,096, an RMSprop optimizer with momentum 0.9, and an initial learning rate of 0.256 which decays by 0.97 every 2.4 epochs.
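As a rough illustration of that decay schedule, here is a minimal sketch of a stepwise exponential decay. It ignores warmup, and the floor-based stepping is an assumption about how a step scheduler behaves, not a description of the actual training code. The second helper converts a (rate, interval) pair into an equivalent per-epoch factor, which makes it possible to compare the paper's 0.97 every 2.4 epochs with the `--decay-epochs 3 --decay-rate 0.963` used in the command in this thread.

```python
import math


def step_decay_lr(epoch, base_lr, decay_rate, decay_epochs):
    """Learning rate after `epoch` epochs of stepwise decay (no warmup)."""
    return base_lr * decay_rate ** math.floor(epoch / decay_epochs)


def per_epoch_factor(decay_rate, decay_epochs):
    """Equivalent multiplicative decay per single epoch."""
    return decay_rate ** (1.0 / decay_epochs)


print(step_decay_lr(3, 0.256, 0.97, 2.4))   # one decay applied: 0.256 * 0.97
print(per_epoch_factor(0.97, 2.4))          # paper protocol
print(per_epoch_factor(0.963, 3))           # command-line config
```

Under these assumptions the two per-epoch factors differ by well under 0.001, so the two settings describe nearly the same decay speed.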
Our training config is:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 ~/imagenet --model DNA_c \
--epochs 500 --warmup-epochs 5 --batch-size 64 --lr 0.256 --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
on 4 nodes, i.e., 32 GPUs. We also step the optimizer every 2 training steps to simulate a large training batch.
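Why stepping the optimizer every few iterations simulates a larger batch can be seen in a toy example. The sketch below is pure Python with a one-parameter linear model and made-up data (all names and values are illustrative, not from the training code): averaging the gradients of equally sized micro-batches gives the same gradient as one pass over the full batch.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of mean((w * x - y) ** 2) over the batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Average the micro-batch gradients over accum_steps iterations."""
    n = len(xs)
    if n % accum_steps != 0:
        raise ValueError("batch must split evenly into micro-batches")
    micro = n // accum_steps
    total = 0.0
    for i in range(accum_steps):
        lo = i * micro
        # Scale each micro-batch gradient so the sum is a mean, not a sum.
        total += grad_mse(w, xs[lo:lo + micro], ys[lo:lo + micro]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_mse(0.5, xs, ys)
accumulated = accumulated_grad(0.5, xs, ys, accum_steps=2)
print(abs(full - accumulated) < 1e-9)
```

In a real network the match is only approximate, since layers such as batch normalization compute their statistics per micro-batch rather than over the whole accumulated batch.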
We achieved the highest top-1 accuracy, 77.77%, at epoch 351.
The difference is the total batch size: 32 x 2 x 64 = 4096 vs. 8 x 128 = 1024. In the suggested setting we decreased the learning rate using the linear scaling rule: lr = 0.256 x 1024 / 4096 = 0.064. This change in total batch size was intended to make reproduction easier, but we cannot guarantee the performance.
You can try enlarging your total batch size or stepping your optimizer less frequently, as suggested by @jiefengpeng.
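The linear rule used here can be written as a one-line helper. The function name is hypothetical and the sketch only restates the arithmetic from this thread; it does not say whether a linearly scaled learning rate preserves accuracy.

```python
def linear_scaled_lr(ref_lr, ref_batch, new_batch):
    """Scale the learning rate in proportion to the total batch size."""
    return ref_lr * new_batch / ref_batch


# Reference setting: lr 0.256 at total batch 4096; suggested setting: batch 1024.
print(linear_scaled_lr(0.256, 4096, 1024))  # 0.064
```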
Thanks!
Hi,
Thanks for sharing the training code. I tried to retrain DNA_c with this config:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 ~/imagenet --model DNA_c \
--epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.064 --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
After 500 epochs of training, the best top-1 accuracy is 77.2%, which is about 0.6% lower than in the paper.
*** Best metric: 77.19799990478515 (epoch 458)