Closed NNNNAI closed 3 years ago
@NNNNAI We didn't try on cifar. But the frequency components setting should be the same as the ones on ImageNet. If we obtain results on cifar, we will post the results.
Thanks~!! Please let me know if you get results on CIFAR : )
BTW, from which specific epoch is your FcaNet50 model that was trained on ImageNet? And at which epoch does FcaNet usually converge?
@NNNNAI Because we use cosine learning rate decay, the convergence epoch is usually the last epoch. In our case, it is the 100th epoch.
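For anyone following along, here is a minimal sketch of why the last epoch is usually the best one under cosine decay (the function name `cosine_lr` and the base LR of 0.1 are illustrative, not taken from the repo):

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Cosine learning-rate decay: starts at base_lr and reaches ~0
    only at the final epoch, so the model keeps improving until the end."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0.1, 0, 100))    # full base LR at the start
print(cosine_lr(0.1, 100, 100))  # ~0 at epoch 100, the convergence point
```

Because the LR only shrinks to ~0 at the very end, the checkpoint from the 100th epoch is effectively the converged model.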
Can you provide the log files of FcaNet50, i.e., the events.out.tfevents.* files in the log folder? Of course, it would be best if you could provide the log files of all models : )
Thanks for your help, and sorry to disturb you again. May I ask which version of ImageNet you are using? I downloaded ImageNet-2012 from the official website and trained with that dataset, but my training log differs a bit from the log file you provided: my total number of training steps is much larger than in your log. My training environment and configuration: a machine with 4 Nvidia RTX 2080Ti GPUs, with batch_size 128 as you mentioned in launch_training_classification.sh.
@NNNNAI That is because the log was produced with 8 GPUs, so the total batch size is 2 times larger than yours. But I don't think this will influence the final performance.
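The effective (global) batch size is just the per-GPU batch size times the GPU count, which explains the factor of 2 in step counts here. A tiny sketch (the helper `global_batch` is illustrative, not from the repo):

```python
def global_batch(num_gpus, per_gpu_batch=128):
    """Effective batch size when each GPU processes per_gpu_batch samples
    per step, as in launch_training_classification.sh."""
    return num_gpus * per_gpu_batch

print(global_batch(4))  # 512  (the questioner's 4-GPU setup)
print(global_batch(8))  # 1024 (the authors' 8-GPU setup)
```

With twice the global batch, each epoch takes half as many steps, so the 8-GPU log shows far fewer total steps for the same 100 epochs.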
Thank you for responding so quickly. Which version of ImageNet are you using? Is it also ImageNet-2012?
@NNNNAI I don't know the version, but there are 1000 classes in total, 1281167 images for training, and 50000 images for validation. FYI.
Thanks for your reply~! But I'm a bit confused. If you use 1281167 images for training and follow the demo training code in launch_training_classification.sh, FcaNet50 is trained with a batch_size of 64. In your code, `global_step = epoch * int(math.ceil(train_loader._size / args.batch_size))`. So with 8 GPUs, after 100 epochs of training, shouldn't the corresponding step count be 1281167 / 8 / 64 * 100 ≈ 250227 (250K)? But in the TensorBoard log file you provided before, the model ends at 120K steps instead of 250K. Did you use 16 GPUs when you trained FcaNet50?
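The step accounting in question can be reproduced with a few lines. This is a hypothetical reconstruction that assumes each GPU's loader sees dataset_size / num_gpus samples, matching the `global_step` formula quoted above:

```python
import math

def total_steps(dataset_size, num_gpus, per_gpu_batch, epochs):
    """Total optimizer steps: each GPU iterates over its shard of the
    dataset, taking ceil(shard_size / batch) steps per epoch."""
    per_gpu_size = dataset_size // num_gpus
    steps_per_epoch = math.ceil(per_gpu_size / per_gpu_batch)
    return epochs * steps_per_epoch

print(total_steps(1281167, 8, 64, 100))   # ~250K, what the question expects
print(total_steps(1281167, 16, 64, 100))  # ~125K, close to the ~120K in the log
```

So the ~120K figure in the log is consistent with either 16 GPUs at batch 64 per GPU, or 8 GPUs at a doubled per-GPU batch; the arithmetic alone cannot distinguish the two.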
And I have another kind request: would you mind providing the training log file of FcaNet101 on the ImageNet classification task? It would greatly help my study~~~~
Oh, sorry, I just remembered the existence of apex. The log file you provided uses the apex strategy. So do all the downloadable models you provide use the apex strategy?
Thanks for your work!!
Have you tried using FcaNet to train classification tasks on CIFAR-10 or CIFAR-100? If you have, what frequency-component setting did you use?