YimianDai / open-aff

code and trained models for "Attentional Feature Fusion"

Training from scratch - log report #11

Closed: moabarar closed this issue 3 years ago

moabarar commented 3 years ago

Hi,

First, I think your paper is very interesting, excellent work!

I was wondering if you have a training-from-scratch report available? All the available reports are based on pretrained models that already have high accuracy (specifically, I am referring to the CIFAR100 experiment).

YimianDai commented 3 years ago

Hi, I have uploaded the training-from-scratch logs with the prefix "From_Scratch". You can find them in the corresponding folders now.

The ResNet result is almost the same, but the ResNeXt result is around 0.6%~0.8% lower than training from pretrained models. I guess the reason is that ResNeXt is more difficult to train than ResNet, so starting from a pretrained model helps ResNeXt reach a higher accuracy.

By the way, the accuracy in the paper is the training-from-scratch accuracy. I updated the accuracy on the paperswithcode website with the highest accuracy.

moabarar commented 3 years ago

Thanks a lot for the quick reply! Again - well done!

sbl1996 commented 3 years ago

Can you provide the training script used for training from scratch? It seems that the currently provided training-from-scratch logs were not produced by 'train_cifar.py' in this repo.

YimianDai commented 3 years ago

Sorry, can you explain a bit more? I do not understand what you mean by "the currently provided training-from-scratch logs were not produced by 'train_cifar.py' in this repo."

As far as I remember, I just renamed train_cifar_mixup.py to train_cifar.py when I created this repo; everything else is the same. My private repo is very redundant and contains a lot of code unrelated to this paper, so I created this new public repo when I released the code.

sbl1996 commented 3 years ago

As shown in open-aff/params/cifar100/AFF-ResNet-32/From_Scratch_Log_train_cifar100_cifar100-ASKCFuse-resnet-32-c-4-s-1.log, the log lines look like "INFO:root:[Epoch 0] train=0.079624 val=0.107400 loss=3.099419 time: 20.976282". However, in train_cifar.py, the log format is [Epoch %d] train=%f val=%f loss=%f lr: %f time: %f. They are different: the from-scratch log has no lr field.

YimianDai commented 3 years ago

In train_cifar.py, you can choose either cosine or step, not cosine only; it depends on whether you add --cosine to the training command.
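For reference, here is a minimal sketch of the difference between the two schedules in plain Python. This is not the exact code from train_cifar.py; the base lr, epoch counts, and decay points are placeholders.

```python
import math

def step_lr(epoch, base_lr=0.1, decay_points=(150, 225), factor=0.1):
    """Step schedule: multiply the lr by `factor` at each decay epoch.
    The decay epochs here are placeholders, not the repo's actual settings."""
    lr = base_lr
    for point in decay_points:
        if epoch >= point:
            lr *= factor
    return lr

def cosine_lr(epoch, base_lr=0.1, total_epochs=300):
    """Cosine schedule: decay the lr from base_lr toward 0 along a half cosine."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# e.g. at epoch 200, step has already decayed once while cosine is partway down
print(step_lr(200), cosine_lr(200))
```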

I started with step at first; that is the version used for the training-from-scratch logs. When I tried using the pretrained model to reach a higher accuracy, I added the cosine code, because people say a cosine learning rate can improve performance, so I gave it a try. I guess that is why the log formats seem to mismatch: I changed the logging line when I added the cosine code, and the current log format is the version after that change.
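A minimal sketch of the two logging lines, reconstructed from the formats quoted above rather than copied from the repo; the variable names and values are placeholders:

```python
import logging
logging.basicConfig(level=logging.INFO)

epoch, train_acc, val_acc, loss, lr, elapsed = 0, 0.079624, 0.107400, 3.099419, 0.1, 20.976282

# Older version (used for the from-scratch logs): no lr field is printed
logging.info('[Epoch %d] train=%f val=%f loss=%f time: %f',
             epoch, train_acc, val_acc, loss, elapsed)

# Current version (after the cosine change): the lr is printed as well
logging.info('[Epoch %d] train=%f val=%f loss=%f lr: %f time: %f',
             epoch, train_acc, val_acc, loss, lr, elapsed)
```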

Unfortunately, it seems that cosine did not perform better than step in my CIFAR experiments. However, as long as the network itself is unchanged, it can load the stored params produced by either step or cosine.
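A minimal sketch of that point, assuming an MXNet Gluon model; the network and the file name below are only stand-ins, not the actual AFF model or checkpoint:

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Any network definition works here; resnet18_v1 is only a placeholder for the AFF models.
net = vision.resnet18_v1(classes=100)

# A .params checkpoint stores only the weights, not the lr schedule,
# so checkpoints trained with step or cosine load the same way.
net.load_parameters('cifar100-model-best.params', ctx=mx.cpu())
```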

sbl1996 commented 3 years ago

Thank you, I have noticed the --cosine option. Apart from printing the lr, are there any other differences between the training script you used for training from scratch and train_cifar.py?

YimianDai commented 3 years ago

I think I have only made two changes to train_cifar.py since I ran the very first experiment: the first is adding the label-smoothing option, and the second is adding the cosine option.
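For context, a minimal sketch of what a label-smoothing option typically does, in plain NumPy; the smoothing value of 0.1 is just a common default, not necessarily what was used here:

```python
import numpy as np

def smooth_labels(labels, num_classes=100, eps=0.1):
    """Turn integer class labels into smoothed one-hot targets:
    the true class gets 1 - eps, and the remaining mass eps is spread
    uniformly over the other classes. eps=0.1 is only a common default."""
    onehot = np.full((len(labels), num_classes), eps / (num_classes - 1))
    onehot[np.arange(len(labels)), labels] = 1.0 - eps
    return onehot

targets = smooth_labels(np.array([3, 7]), num_classes=100)
```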

As far as I remember, AFF-ResNeXt-38-32x4d was not trained with label smoothing because it was trained before I added the label-smoothing option, but the remaining models were all trained with label smoothing.

Sorry for the inconvenience caused by the updates to my code. Ideally, I would have trained all these models with the final, fixed version of the code, but I have very limited access to GPUs, so I basically released the training logs and params on the fly.

sbl1996 commented 3 years ago

Thank you very much for your detailed response.