Luolc / AdaBound

An optimizer that trains as fast as Adam and as good as SGD.
https://www.luolc.com/publications/adabound/
Apache License 2.0

The optimizer may have a bad performance on reading comprehension model? #1

Open SparkJiao opened 5 years ago

SparkJiao commented 5 years ago

Great thanks to your work!

[screenshot: training curves comparing the two optimizers]

The orange line is the baseline using Adam as the optimizer, and the blue line is the same baseline using AdaBound. The performance seems much worse; or do I just need to wait more patiently?

What's your opinion? Thank you very much!

Luolc commented 5 years ago

Hi SparkJiao,

Could you also test with SGD whose lr equals the final_lr of AdaBound?

The current info is too limited to make an educated guess. What I am most confident about is that AdaBound should outperform SGD (viz. converge faster, and end up no worse) with similar settings.
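For reference, the suggested comparison could be set up roughly like this (a sketch only: `model` is a stand-in, and the lr values are the illustrative defaults from the repo's README, not tuned for this task):

```python
import torch
import adabound  # the pip-installable package from this repo

# Hypothetical model, for illustration only.
model = torch.nn.Linear(10, 2)

# AdaBound, which transitions from an Adam-like lr toward final_lr.
optimizer_adabound = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

# The suggested control: plain SGD whose lr matches AdaBound's final_lr.
optimizer_sgd = torch.optim.SGD(model.parameters(), lr=0.1)
```

If SGD at lr = final_lr also underperforms Adam on this task, the gap is about SGD vs. Adam on the task itself rather than about AdaBound specifically.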

We did find that AdaBound is more robust on CV tasks than on NLP ones. The reason might be that adaptive methods are more useful on unbalanced data (word embeddings are a typical example). On this kind of task SGD may be worse than Adam, so transforming to SGD cannot help.
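To make the "transforming to SGD" point concrete, here is a small sketch (my own illustration, not the repo's code) of the bound functions from the AdaBound paper: the per-parameter step size is clipped into an interval that tightens around final_lr as training proceeds, so early on the optimizer behaves like Adam and asymptotically like SGD with lr = final_lr. The values of final_lr and gamma below are illustrative.

```python
FINAL_LR = 0.1   # target SGD-like learning rate (illustrative)
GAMMA = 1e-3     # convergence speed of the bounds (paper's default)

def bounds(t):
    """Lower/upper bounds on the clipped step size at step t (t >= 1).

    Early in training the interval is very wide (Adam-like behavior);
    as t grows, both bounds converge to FINAL_LR (SGD-like behavior).
    """
    lower = FINAL_LR * (1 - 1 / (GAMMA * t + 1))
    upper = FINAL_LR * (1 + 1 / (GAMMA * t))
    return lower, upper

for t in (1, 1000, 1_000_000):
    lo, hi = bounds(t)
    print(f"step {t:>9}: step-size bounds = ({lo:.6f}, {hi:.6f})")
```

So if SGD at lr = final_lr is itself a poor fit for the task, the late-training regime that AdaBound converges to inherits that weakness.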

Luolc commented 5 years ago

@SparkJiao Besides, I am currently testing AdaBound on a reading comprehension task as well. Which dataset and model did you test? My preliminary results seem to show that AdaBound is still the best.

SparkJiao commented 5 years ago

OK, when I have a free GPU I will test the performance with SGD.

By the way, I'm working on CoQA, the Conversational Question Answering Challenge, and the model being tested is modified from FlowQA, whose author has pushed his code to GitHub; the paper is also under open review for ICLR 2019.

Thank you!

Luolc commented 5 years ago

@SparkJiao Thanks. I have tested on CoQA but not with FlowQA. I will have a look at their paper.