AkihikoWatanabe commented 2 weeks ago

URL

https://arxiv.org/abs/2411.02853
Authors
- Shohei Taniguchi
- Keno Harada
- Gouki Minegishi
- Yuta Oshima
- Seong Cheol Jeong
- Go Nagahara
- Tomoshi Iiyama
- Masahiro Suzuki
- Yusuke Iwasawa
- Yutaka Matsuo
Abstract
- Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless choosing a hyperparameter, i.e., $\beta_2$, in a problem-dependent manner. There have been many attempts to fix the non-convergence (e.g., AMSGrad), but they require an impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $\mathcal{O} ( 1 / \sqrt{T} )$ with any choice of $\beta_2$ without depending on the bounded noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second moment estimate and changing the order of the momentum update and the normalization by the second moment estimate. We also conduct intensive numerical experiments, and verify that our ADOPT achieves superior results compared to Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning. The implementation is available at https://github.com/iShohei220/adopt.
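Translation (by gpt-4o-mini)
Adam is one of the most popular optimization algorithms in deep learning. In theory, however, Adam is known not to converge unless the hyperparameter ($\beta_2$) is chosen in a problem-dependent way. There have been many attempts to fix this non-convergence (for example, AMSGrad), but they require the unrealistic assumption that the gradient noise is uniformly bounded. This paper proposes a new adaptive gradient method called ADOPT. ADOPT achieves the optimal convergence rate of $\mathcal{O} ( 1 / \sqrt{T} )$ for any choice of $\beta_2$, without relying on the bounded-noise assumption. ADOPT addresses Adam's non-convergence problem by removing the current gradient from the second moment estimate and by changing the order of the momentum update and the normalization by the second moment estimate. The authors also run extensive numerical experiments and confirm that ADOPT achieves better results than Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning. The implementation is available at https://github.com/iShohei220/adopt.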
Summary (by gpt-4o-mini)
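Proposes ADOPT, a new adaptive gradient method that achieves the optimal convergence rate with any choice of the hyperparameter $\beta_2$. It resolves Adam's non-convergence problem by removing the current gradient from the second moment estimate and changing the update order. It shows superior results across a wide range of tasks, and the implementation is available on GitHub.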
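
The two changes named in the abstract (normalizing by a second moment estimate that excludes the current gradient, and normalizing before the momentum update rather than after) can be illustrated with a small pure-Python sketch. This is only one reading of that description, not the authors' implementation: the function name `adopt_sketch`, the initialization of `v` from the first gradient, the hyperparameter defaults, and the toy quadratic objective are all assumptions made for illustration. The repository linked in the abstract is the authoritative reference.

```python
import math


def adopt_sketch(grad_fn, theta0, lr=0.1, beta1=0.9, beta2=0.999,
                 eps=1e-6, steps=2000):
    """Illustrative ADOPT-style update loop (hypothetical helper, not the official code)."""
    theta = list(theta0)
    # Assumption: the second moment estimate starts from the first squared gradient.
    g = grad_fn(theta)
    v = [gi * gi for gi in g]
    m = [0.0] * len(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        for i, gi in enumerate(g):
            # Change 1: normalize by the PREVIOUS v, which does not yet
            # contain the current gradient.
            scaled = gi / max(math.sqrt(v[i]), eps)
            # Change 2: momentum is accumulated over the already-normalized
            # gradient, instead of normalizing the momentum afterwards.
            m[i] = beta1 * m[i] + (1.0 - beta1) * scaled
            theta[i] -= lr * m[i]
            # Only now is the current gradient folded into v.
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
    return theta


def quadratic_grad(theta):
    # Gradient of the toy objective sum((x - 3)^2), chosen only for this demo.
    return [2.0 * (x - 3.0) for x in theta]


theta_final = adopt_sketch(quadratic_grad, [0.0, 10.0])
print(theta_final)
```

On this deterministic toy problem both coordinates should end up close to 3. The example only shows the ordering of the update steps; it does not reproduce the stochastic settings in which the paper's convergence claims are stated.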

AkihikoWatanabe commented 2 weeks ago

The image is quoted from the original tweet: there appears to be a library, and it can reportedly be used right away by changing a single line.

Original tweet: https://x.com/ishohei220/status/1854051859385978979?s=46&t=Y6UuIHB0Lv0IpmFAjlc2-Q

AkihikoWatanabe commented 2 weeks ago

It appears to converge even in cases where Adam did not.

ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate, Shohei Taniguchi+, NeurIPS'24 #1482