Open SparkJiao opened 5 years ago
Hi SparkJiao,
Could you also test with SGD whose learning rate equals the `final_lr` of AdaBound?
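To see why SGD with `lr = final_lr` is the natural control here, consider AdaBound's bound schedule: it clips the per-parameter adaptive step between two bounds that both converge to `final_lr`, so late in training it behaves like SGD at that rate. A minimal sketch, assuming the bound formulas from the AdaBound paper (the function name and the `gamma` default are illustrative, not from this thread):

```python
# Hedged sketch of AdaBound's learning-rate clipping bounds.
# Assumption: bounds follow the AdaBound paper's schedule, where `gamma`
# controls how fast the bounds converge to `final_lr`.

def adabound_lr_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower/upper clipping bounds on the effective learning rate at `step`."""
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper

# Early in training the band is wide, so behavior is Adam-like and adaptive.
print(adabound_lr_bounds(10))
# Late in training both bounds squeeze onto final_lr, so steps look like
# plain SGD with lr = final_lr -- hence the suggested comparison.
print(adabound_lr_bounds(1_000_000))
```

This is why a gap between AdaBound and SGD at `final_lr` (rather than SGD at an arbitrary rate) is the informative comparison.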
There is too little information to make an educated guess. What I am most confident about is that AdaBound should outperform SGD (i.e., converge faster while being no worse than SGD) under similar settings.
We did find that AdaBound is more robust on CV tasks than on NLP tasks. The reason might be that adaptive methods are more useful on unbalanced data (word embeddings are a typical example). On this kind of task SGD might be worse than Adam, so transitioning to SGD cannot help.
@SparkJiao besides, I am currently testing AdaBound on a reading comprehension task as well. Which dataset and model did you test? My preliminary results suggest that AdaBound is still the best.
OK, if I have a free GPU, I will test the performance with SGD.
By the way, I'm working on CoQA (the Conversational Question Answering Challenge), and the model being tested is modified from FlowQA, whose authors have published their code on GitHub; the paper is also under open review for ICLR 2019.
Thank you!
@SparkJiao thanks. I have tested on CoQA, but not with FlowQA. I will take a look at their paper.
Many thanks for your work!
The orange line is the baseline using Adam as the optimizer, and the blue line is the same baseline using AdaBound. The performance looks much worse to me. Or do I just need to wait more patiently?
What's your opinion? Thank you very much!