changlin31 / DNA

(CVPR 2020) Block-wisely Supervised Neural Architecture Search with Knowledge Distillation

Doubts about the retraining accuracy #26

Closed ShunLu91 closed 3 years ago

ShunLu91 commented 3 years ago

Thanks for your nice work and the released code. We have tried the retraining part on ImageNet and have some questions.

  1. When retraining the searched models under your default training settings (8 GPUs on one machine), we get the accuracies below:

     - DNA_a: 76.31000003662109 (epoch 496)
     - DNA_b: 76.63600003417969 (epoch 483)
     - DNA_c: 77.20800003662109 (epoch 474)
     - DNA_d: 77.7220000366211 (epoch 433)

  2. We read issue #10 and changed the training settings accordingly. Specifically, we used 32 GPUs (`--nproc_per_node=8 --nnodes=4`) with a per-GPU batch size of 128 and lr = 0.256, and optimized the network at every step (a full launch sketch follows this list):

     ```
     --model ${model_name} --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.256 --opt rmsproptf --opt-eps 0.001 \
     --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
     ```

     We only retrained DNA_a to check the accuracy, but got a worse result: DNA_a best metric = 76.08965138479905 (epoch 486).
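For completeness, a multi-node launch along these lines matches that configuration. This is a sketch only: the `train.py` entry point, the dataset path, and the rendezvous variables (`NODE_RANK`, `MASTER_ADDR`) are placeholder assumptions, while the training flags are copied from the settings above.

```bash
# Run once per node, with NODE_RANK set to 0..3 and MASTER_ADDR pointing at node 0.
# 4 nodes x 8 GPUs x batch 128 = global batch size 4096.
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 \
    --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=29500 \
    train.py /path/to/imagenet \
    --model ${model_name} --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.256 \
    --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 \
    --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
```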

Could you please help us figure out the cause of this difference? Many thanks and best wishes.

changlin31 commented 3 years ago

Hi,

We now have some better results with new settings:

- DNA_c: 78.1
- DNA_d: 78.9

We use drop path, RandAugment, random erasing, and an increased color-jitter magnitude. Detailed hyper-parameters are listed in the file args.txt; please make sure the args.yaml generated when you run the code matches this file.
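For reference, in timm's `train.py` those augmentations are typically switched on with flags along these lines. The values below are illustrative assumptions, not the authors' settings; the authoritative numbers are in args.txt.

```bash
# Augmentation flags (illustrative values; take the real ones from args.txt):
#   --drop-path          drop path (stochastic depth) rate
#   --aa                 RandAugment policy
#   --reprob / --remode  random erasing probability and mode
#   --color-jitter       color-jitter magnitude
python -m torch.distributed.launch --nproc_per_node=8 train.py /path/to/imagenet \
    --model ${model_name} --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.256 \
    --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 \
    --drop 0.2 --drop-path 0.2 --aa rand-m9-mstd0.5 --reprob 0.2 --remode pixel \
    --color-jitter 0.4 -j 8 --num-classes 1000 --model-ema
```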

Here is the training log of DNA_c; it reaches 78.13399992675781 at epoch 420: _dnac.txt

ShunLu91 commented 3 years ago

Thanks for your prompt reply. I will re-run the experiment with your provided hyper-parameters.

ShunLu91 commented 3 years ago

Following the newly released training settings above, I have achieved results equal to or higher than those in the paper:

- DNA_a: 77.07799987792968 (epoch 448)
- DNA_c: 78.4480001586914 (epoch 437)

Great thanks to the nice authors.

changlin31 commented 3 years ago

Hi @ShunLu91 ,

Could you share your training scripts for these results? I will add them to the README as the standard training scripts. Thanks in advance!

ShunLu91 commented 3 years ago

I used the same config you mentioned above and adopted the newly released 'timm' code. I think the differences may come from random seeds, because the 'timm' code never fixes all of the random seeds. Additionally, I find that when we adopt 8 GPU cards and keep the total batch size unchanged, the performance can be further improved.
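Keeping the global batch size from above unchanged (32 x 128 = 4096) on a single 8-GPU machine implies a per-GPU batch size of 4096 / 8 = 512. A sketch of such a launch follows; the paths are placeholders and the remaining flags are carried over from the settings discussed above.

```bash
# Single-node, 8-GPU launch with the same global batch size: 8 x 512 = 4096.
python -m torch.distributed.launch --nproc_per_node=8 train.py /path/to/imagenet \
    --model ${model_name} --epochs 500 --warmup-epochs 5 --batch-size 512 --lr 0.256 \
    --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 \
    --drop 0.2 --drop-path 0.2 --aa rand-m9-mstd0.5 --reprob 0.2 --remode pixel \
    --color-jitter 0.4 -j 8 --num-classes 1000 --model-ema
```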