Closed betterhalfwzm closed 4 years ago
@jiefengpeng
Training setup:

```
--lr=0.05 --n_gpu=4 --batch_size=256 --n_worker=32 --lr_type=cos --n_epoch=150 --wd=4e-5 --seed=2018 --optim=SGD
```

```python
# preprocessing
input_size = 224
imagenet_tran_train = [
    transforms.RandomResizedCrop(input_size, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
]
imagenet_tran_test = [
    transforms.Resize(int(input_size / 0.875)),
    # transforms.Resize([256, 256]),
    transforms.CenterCrop(input_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
]
```

Results: acc/train_top1: 80, acc/test_top1: 74. Could you take a look at what might be causing this? Or could the training code and details be open-sourced? Thanks.
Hi, thank you for your interest in our work!
For ImageNet retraining of the searched models, we used a protocol similar to [30], i.e., a batch size of 4,096 and an RMSprop optimizer with momentum 0.9, with an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs.
The ImageNet top-1 accuracy can be reproduced with Ross Wightman's pytorch-image-models using the above hyperparameters.
All the code, including the search/training/inference code, will be released in the future.
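For reference, the stated schedule can be sketched as a small function (a rough illustration, not the authors' code; the function name and stand-alone form are ours):

```python
def lr_at_epoch(epoch, base_lr=0.256, decay_rate=0.97, decay_epochs=2.4):
    """LR under the stated protocol: multiply by 0.97 every 2.4 epochs."""
    return base_lr * decay_rate ** (epoch // decay_epochs)

# At epoch 150 the LR has gone through 62 decay steps:
# 0.256 * 0.97**62 ≈ 0.0387
print(lr_at_epoch(150))
```

This decays far more slowly than a 150-epoch cosine schedule starting at 0.05, which is one reason the protocols are not interchangeable.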
```shell
./distributed_train.sh 8 ../ImageNet/ --model DNA_a -b 256 --sched step --epochs 500 --decay-epochs 3 --decay-rate 0.97 --opt rmsproptf --opt-eps .001 -j 32 --warmup-epochs 5 --weight-decay 1e-5 --drop 0.2 --model-ema --lr .256
```

Can the results be reproduced with these parameters?
As the total batch size in your proposed protocol differs from ours (256×8 = 2,048 vs. 4,096), we suggest you decrease the learning rate according to the linear scaling rule.
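Concretely, the linear scaling rule keeps the learning rate proportional to the total batch size (a minimal sketch; the helper name is ours):

```python
def scale_lr(base_lr, base_batch, new_batch):
    # Linear scaling rule: keep LR proportional to the total batch size.
    return base_lr * new_batch / base_batch

# Paper: batch 4,096 at LR 0.256; proposed run: 8 GPUs × 256 = 2,048.
print(scale_lr(0.256, base_batch=4096, new_batch=8 * 256))  # → 0.128
```

So with the command above, `--lr .256` would become `--lr .128`.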
OK, thanks. Are as many as 500 epochs really necessary?
Thanks. No, you don't need as many as 500 epochs. Besides, isn't this using a step schedule? The number of epochs you set does not affect its behavior along the way. You can watch the validation curve during training and stop when it plateaus.
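To illustrate why the epoch budget is harmless with a step schedule (a toy sketch, not timm's implementation; both helper names are ours): a step schedule computes the LR from the current epoch alone, whereas a cosine schedule bakes the total budget into every step.

```python
import math

def step_lr(epoch, base_lr=0.256, decay_rate=0.97, decay_epochs=3):
    # Step schedule: the LR at a given epoch ignores the total epoch budget.
    return base_lr * decay_rate ** (epoch // decay_epochs)

def cosine_lr(epoch, n_epoch, base_lr=0.05):
    # Cosine schedule: the LR at a given epoch depends on n_epoch.
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / n_epoch))

# step_lr(100) is the same whether you planned 150 or 500 epochs,
# but cosine_lr(100, 150) and cosine_lr(100, 500) differ:
print(step_lr(100), cosine_lr(100, 150), cosine_lr(100, 500))
```

With `--sched step`, training for fewer epochs therefore reproduces an identical prefix of the longer run.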