"CosineAnnealingLR" in color-mnist experiment

lileicv commented 3 years ago

Thanks to the authors for releasing the code. I changed "StepLR" to "CosineAnnealingLR" in the color-mnist experiments (vanilla approach), and the accuracy is much better than that reported in the paper. Could the authors explain how the hyperparameters were selected (e.g., StepLR vs. CosineAnnealingLR)?
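For readers comparing the two schedules, here is a minimal pure-Python sketch of the closed-form learning rates that step decay and cosine annealing follow. The base learning rate, epoch count, step size, and decay factor below are illustrative assumptions, not the settings used in this repository.

```python
import math


def step_lr(base_lr, epoch, step_size, gamma):
    """Step decay: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Cosine annealing: decay smoothly from base_lr to eta_min over t_max epochs."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


# Hypothetical values chosen only for illustration.
BASE_LR = 1e-3
EPOCHS = 80

for epoch in (0, 20, 40, 60, 80):
    step = step_lr(BASE_LR, epoch, step_size=20, gamma=0.5)
    cosine = cosine_annealing_lr(BASE_LR, epoch, t_max=EPOCHS)
    print(f"epoch {epoch:3d}  step={step:.6f}  cosine={cosine:.6f}")
```

Step decay holds the rate constant between drops, while cosine annealing lowers it continuously and reaches `eta_min` at the final epoch.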

SanghyukChun commented 3 years ago

@lileicv Good to hear that our method still has room for improvement!

We couldn't put much effort into the optimization hyperparameter search. For the optimization parameters, we chose values that stably reduced the training loss. There was no specific quantitative criterion, but we didn't fit the parameters on the test set (in particular, we hid the unbiased test set during the hyperparameter search).

It would have been better to test all possible combinations of learning rate, weight decay, scheduler (e.g., warm-up, cosine, linear, exponential decay, step, multi-step decay, ...), and optimizer (we now strongly recommend our new optimizer AdamP, ICLR'21). However, overly fine hyperparameter tuning often makes analysis difficult: what if one chooses a very complex hyperparameter setting that only works for a specific method? Instead, we chose very basic and popular optimization parameters in this paper.
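To give a rough sense of how quickly such a search grows, here is a small sketch that enumerates a full grid. Every value in `search_space` is a hypothetical placeholder, not a setting we actually tried.

```python
from itertools import product

# Hypothetical search space; none of these values come from the paper.
search_space = {
    "lr": [1e-2, 1e-3, 1e-4, 1e-5],
    "weight_decay": [0.0, 1e-4, 1e-3],
    "scheduler": ["step", "multi-step", "cosine", "exponential"],
    "optimizer": ["SGD", "Adam", "AdamP"],
}

# Cartesian product of all options: one dict per full training run.
configs = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

print(len(configs))  # 4 * 3 * 4 * 3 = 144 runs per method
```

Even this modest grid needs 144 training runs for each method being compared.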

We chose different hyperparameters for the ImageNet and Kinetics experiments because they are large-scale experiments compared to MNIST. We chose CosineAnnealingLR there because we know empirically that it is effective for ImageNet training (e.g., many state-of-the-art ImageNet classifiers use cosine or exponential learning rate decay). We didn't have enough time to go back and test cosine learning rate decay on MNIST when we wrote the paper, but I'm not surprised that the ImageNet option performs better than the current MNIST option.

lileicv commented 3 years ago

@SanghyukChun Thanks for your reply and for sharing the code. It is good work.

clovaai / rebias