changlin31 / DNA

(CVPR 2020) Block-wisely Supervised Neural Architecture Search with Knowledge Distillation

ImageNet top-1 accuracy cannot reach the accuracy reported in the paper #7

Closed betterhalfwzm closed 4 years ago

betterhalfwzm commented 4 years ago

Training setup:
--lr=0.05 --n_gpu=4 --batch_size=256 --n_worker=32 --lr_type=cos --n_epoch=150 --wd=4e-5 --seed=2018 --optim=SGD

    # preprocessing (torchvision transforms for ImageNet training/evaluation)
    from torchvision import transforms

    input_size = 224
    imagenet_tran_train = [
        # random scale/aspect-ratio crop to 224x224, plus horizontal flip augmentation
        transforms.RandomResizedCrop(input_size, scale=(0.2, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # standard ImageNet mean/std normalization
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ]
    imagenet_tran_test = [
        # resize the short side to 256 (224 / 0.875), then take the center 224x224 crop
        transforms.Resize(int(input_size / 0.875)),
        # transforms.Resize([256, 256]),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        # transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ]

Result: acc/train_top1: 80, acc/test_top1: 74. Could you take a look at what might be causing this? Or could the training code and details be open-sourced? Thanks.
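
For reference, a minimal sketch (not from the original post) of how the transform lists above would typically be wired into ImageNet loaders; the dataset paths and loader arguments here are assumptions:

    import torch
    from torchvision import datasets, transforms

    # assumes imagenet_tran_train / imagenet_tran_test from the snippet above are in scope
    train_set = datasets.ImageFolder('/path/to/imagenet/train',
                                     transforms.Compose(imagenet_tran_train))
    val_set = datasets.ImageFolder('/path/to/imagenet/val',
                                   transforms.Compose(imagenet_tran_test))

    train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True,
                                               num_workers=32, pin_memory=True)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, shuffle=False,
                                             num_workers=32, pin_memory=True)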

betterhalfwzm commented 4 years ago

@jiefengpeng (screenshot of the training results attached)

wanggrun commented 4 years ago

Hi, thank you for your interest in our work!

For ImageNet retraining of the searched models, we used a protocol similar to that of [30], i.e., a batch size of 4,096, an RMSprop optimizer with momentum 0.9, and an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs.

The ImageNet top-1 accuracy can be reproduced with Ross Wightman's pytorch-image-models using the above hyperparameters.

All the code, including the code for searching, training, and inference, will be made available in the future.
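
For concreteness, the retraining schedule described above can be written down as a small sketch (illustrative only; treating "decays by 0.97 every 2.4 epochs" as a stepped decay is an assumption here):

    # learning rate after `epoch` epochs under the retraining protocol above:
    # start at 0.256 and multiply by 0.97 once every 2.4 epochs
    def retrain_lr(epoch, base_lr=0.256, decay_rate=0.97, decay_epochs=2.4):
        return base_lr * decay_rate ** (epoch // decay_epochs)

    for e in (0, 10, 100, 250, 350):
        print(e, round(retrain_lr(e), 4))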

betterhalfwzm commented 4 years ago

./distributed_train.sh 8 ../ImageNet/ --model DNA_a -b 256 --sched step --epochs 500 --decay-epochs 3 --decay-rate 0.97 --opt rmsproptf --opt-eps .001 -j 32 --warmup-epochs 5 --weight-decay 1e-5 --drop 0.2 --model-ema --lr .256

Can the results be reproduced with these parameters?

wanggrun commented 4 years ago

As the batch size in your proposed protocol is different from ours (256 × 8 = 2,048 vs. 4,096), we suggest you decrease the learning rate accordingly using the linear scaling rule.
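
Applied to the command above, the linear scaling rule works out as follows (a small worked example, not an official recommendation):

    # linear scaling rule: scale the reference lr by the ratio of global batch sizes
    reference_lr = 0.256       # lr used with a 4,096 global batch in the paper
    reference_batch = 4096
    actual_batch = 256 * 8     # 8 GPUs x 256 images per GPU = 2,048

    scaled_lr = reference_lr * actual_batch / reference_batch
    print(scaled_lr)           # 0.128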

betterhalfwzm commented 4 years ago

OK, thanks. Are as many as 500 epochs really necessary?

wanggrun commented 4 years ago

Thanks. You don't need as many as 500 epochs. But isn't this using a step schedule? The total number of epochs you set does not affect how training behaves along the way. You can just watch the validation curve during training.
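
To illustrate that point: with --sched step --decay-epochs 3 --decay-rate 0.97, the learning rate at any given epoch depends only on the decay settings, not on the total --epochs value, so the validation curve partway through a 500-epoch run is directly comparable to a shorter run. A tiny sketch (the 0.128 base value reuses the linearly scaled lr from above, purely as an illustration):

    def step_lr(epoch, base_lr=0.128, decay_epochs=3, decay_rate=0.97):
        # lr is multiplied by 0.97 once every 3 epochs, regardless of --epochs
        return base_lr * decay_rate ** (epoch // decay_epochs)

    # identical value whether the run was configured for 150 or 500 total epochs
    print(round(step_lr(150), 5))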