Thanks for the impressive result! I'm curious: why did you decide to stick with SGD?
Glad you found the blog helpful.
I stuck with SGD + cosine decay because, unfortunately, I had neither the time nor the compute to try out other optimizers. It's very likely that, with careful hyperparameter tuning, a different optimizer would reach a slightly better result in fewer epochs.
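For anyone unfamiliar with the schedule: cosine decay anneals the learning rate from its starting value down to a floor along half a cosine wave. Here's a minimal sketch of the formula in plain Python. The function name `cosine_decay_lr`, the base rate of 0.1, and the 100-epoch horizon are illustrative assumptions, not the settings from the blog post.

```python
import math


def cosine_decay_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Learning rate at `epoch` under cosine decay (no warmup, no restarts)."""
    # Clamp progress to [0, 1] so epochs past the end stay at min_lr.
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    # Half a cosine wave: base_lr at progress 0, min_lr at progress 1.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings, for illustration only.
schedule = [cosine_decay_lr(e, total_epochs=100) for e in range(101)]
```

In PyTorch, the same shape is typically obtained by pairing `torch.optim.SGD` with `torch.optim.lr_scheduler.CosineAnnealingLR`, though the exact configuration used for the blog's experiments isn't stated here.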
Vision Transformers are Overrated | Frank’s Ramblings
Attaining ViT/ConvNeXt performance with a couple of simple modifications to ResNet.
https://frankzliu.com/blog/vision-transformers-are-overrated