Closed NNNNAI closed 3 years ago
@NNNNAI We didn't try on cifar. But the frequency components setting should be the same as the ones on ImageNet. If we obtain results on cifar, we will post the results.
Thanks~!! Please let me know if you get results on CIFAR : )
BTW, from which specific epoch is your FcaNet50 model that was trained on ImageNet? And at which epoch does FcaNet usually converge?
@NNNNAI Because we use cosine learning rate decay, the convergence epoch is usually the last epoch. In our case, it is the 100th epoch.
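For anyone following along, here is a minimal sketch of why the last epoch is usually the best one under cosine decay (the function name `cosine_lr` and the base LR of 0.1 are illustrative, not taken from the repo):

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Cosine learning-rate decay: starts at base_lr and reaches ~0
    only at the final epoch, so the model keeps improving until the end."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0.1, 0, 100))    # full base LR at the start
print(cosine_lr(0.1, 100, 100))  # ~0 at epoch 100, the convergence point
```

Because the LR only shrinks to ~0 at the very end, the checkpoint from the 100th epoch is effectively the converged model.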
Can you provide the log files of FcaNet50, i.e., the events.out.tfevents.* files in the log folder? Of course, it would be best if you could provide the log files of all models : )
Thanks for your help, and sorry to disturb you again. May I ask which version of ImageNet you are using? I downloaded ImageNet-2012 from the official website and trained with that dataset, but my training log differs a bit from the log file you provided: my total number of training steps is much larger than in your log. My training environment and configuration: a machine with 4 Nvidia RTX 2080Ti GPUs, with batch_size 128 as you mentioned in launch_training_classification.sh.
@NNNNAI That is because the log was produced with 8 GPUs, so the total batch size is 2 times larger than yours. But I don't think this will influence the final performance.
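The effective (global) batch size is just the per-GPU batch size times the GPU count, which explains the factor of 2 in step counts here. A tiny sketch (the helper `global_batch` is illustrative, not from the repo):

```python
def global_batch(num_gpus, per_gpu_batch=128):
    """Effective batch size when each GPU processes per_gpu_batch samples
    per step, as in launch_training_classification.sh."""
    return num_gpus * per_gpu_batch

print(global_batch(4))  # 512  (the questioner's 4-GPU setup)
print(global_batch(8))  # 1024 (the authors' 8-GPU setup)
```

With twice the global batch, each epoch takes half as many steps, so the 8-GPU log shows far fewer total steps for the same 100 epochs.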
Thank you for responding so quickly. Which version of ImageNet are you using? Is it also ImageNet-2012?
@NNNNAI I don't know the version, but there are 1000 classes in total, 1281167 images for training, and 50000 images for validation. FYI.
Thanks for your reply~! But I'm a bit confused. If you use 1281167 images for training and follow the demo training code in launch_training_classification.sh, FcaNet50 is trained with a batch_size of 64. In your code, `global_step = epoch * int(math.ceil(train_loader._size / args.batch_size))`. So with 8 GPUs, after 100 epochs of training, shouldn't the corresponding step count be 1281167 / 8 / 64 * 100 ≈ 250227 (250K)? But in the TensorBoard log file you provided before, the model ends at 120K steps instead of 250K. Did you use 16 GPUs when you trained FcaNet50?
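The step accounting in question can be reproduced with a few lines. This is a hypothetical reconstruction that assumes each GPU's loader sees dataset_size / num_gpus samples, matching the `global_step` formula quoted above:

```python
import math

def total_steps(dataset_size, num_gpus, per_gpu_batch, epochs):
    """Total optimizer steps: each GPU iterates over its shard of the
    dataset, taking ceil(shard_size / batch) steps per epoch."""
    per_gpu_size = dataset_size // num_gpus
    steps_per_epoch = math.ceil(per_gpu_size / per_gpu_batch)
    return epochs * steps_per_epoch

print(total_steps(1281167, 8, 64, 100))   # ~250K, what the question expects
print(total_steps(1281167, 16, 64, 100))  # ~125K, close to the ~120K in the log
```

So the ~120K figure in the log is consistent with either 16 GPUs at batch 64 per GPU, or 8 GPUs at a doubled per-GPU batch; the arithmetic alone cannot distinguish the two.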
And I have another kind request: would you mind providing the training log file of FcaNet101 on the ImageNet classification task? It would greatly help my study~~~~
Oh, sorry, I just remembered the existence of apex. The log file you provided uses the apex strategy. So do all the downloadable models you provide use the apex strategy?
Thanks for your work!!
Have you tried using FcaNet to train classification tasks on CIFAR-10 or CIFAR-100? If you have, what frequency-component setting did you use?