Add Adam and AdaMax

pkofod commented 5 months ago

Fixes #1012

I don't use Adam and AdaMax myself, but I suppose Adam's slow convergence from zeros(2) (see the third run below, which hits the 5000-iteration limit) is to be expected sometimes? If not, it may be good to compare against another implementation.

julia> rosenbrock(x) =  (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)

julia> result = optimize(rosenbrock, ones(2), AdaMax(), Optim.Options(iterations=5000))
 * Status: success

 * Candidate solution
    Final objective value:     5.647600e-17

 * Found with
    Algorithm:     AdaMax

 * Convergence measures
    |x - x'|               = 1.50e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.50e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 9.95e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    4637
    f(x) calls:    4638
    ∇f(x) calls:   4638

julia> result = optimize(rosenbrock, ones(2), Adam(), Optim.Options(iterations=5000))
 * Status: success

 * Candidate solution
    Final objective value:     4.950178e-16

 * Found with
    Algorithm:     Adam

 * Convergence measures
    |x - x'|               = 4.45e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 4.45e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 9.94e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    590
    f(x) calls:    591
    ∇f(x) calls:   591

julia> result = optimize(rosenbrock, zeros(2), Adam(), Optim.Options(iterations=5000))
 * Status: failure (reached maximum number of iterations)

 * Candidate solution
    Final objective value:     2.899319e-01

 * Found with
    Algorithm:     Adam

 * Convergence measures
    |x - x'|               = 4.62e-01 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.00e+00 ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 1.08e+00 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    5000
    f(x) calls:    5001
    ∇f(x) calls:   5001

julia> result = optimize(rosenbrock, zeros(2), AdaMax(), Optim.Options(iterations=5000))
 * Status: success

 * Candidate solution
    Final objective value:     9.309545e-17

 * Found with
    Algorithm:     AdaMax

 * Convergence measures
    |x - x'|               = 1.00e+00 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.00e+00 ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 9.40e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    4580
    f(x) calls:    4581
    ∇f(x) calls:   4581
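
As a rough way to do the comparison suggested above, here is a minimal plain-Julia sketch of the textbook Adam and AdaMax updates (Kingma and Ba) applied to the same `rosenbrock` function. The names `rosenbrock_grad`, `reference_adam` and `reference_adamax` are hypothetical helpers, not part of Optim.jl. The default step sizes are the paper's and may differ from the defaults in this PR. The `epsilon` in the AdaMax denominator is an added safeguard against 0/0 and is not in the paper's pseudocode.

```julia
# Reference Adam / AdaMax loops for cross-checking; no packages required.
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Analytic gradient of the Rosenbrock function above.
function rosenbrock_grad(x)
    return [-2.0 * (1.0 - x[1]) - 400.0 * x[1] * (x[2] - x[1]^2),
            200.0 * (x[2] - x[1]^2)]
end

# Hypothetical helper: textbook Adam with bias-corrected moment estimates.
# Returns the final iterate and the number of updates performed.
function reference_adam(grad, x0; alpha=1e-3, beta1=0.9, beta2=0.999,
                        epsilon=1e-8, iterations=5000, g_tol=1e-8)
    x = float.(x0)   # fresh copy, so x0 is not mutated
    m = zero(x)      # first-moment (mean) estimate
    v = zero(x)      # second-moment (uncentered variance) estimate
    for t in 1:iterations
        g = grad(x)
        # Stop on the infinity norm of the gradient.
        maximum(abs, g) <= g_tol && return x, t - 1
        m .= beta1 .* m .+ (1 - beta1) .* g
        v .= beta2 .* v .+ (1 - beta2) .* g .^ 2
        mhat = m ./ (1 - beta1^t)
        vhat = v ./ (1 - beta2^t)
        x .-= alpha .* mhat ./ (sqrt.(vhat) .+ epsilon)
    end
    return x, iterations
end

# Hypothetical helper: textbook AdaMax (infinity-norm variant of Adam).
function reference_adamax(grad, x0; alpha=2e-3, beta1=0.9, beta2=0.999,
                          epsilon=1e-8, iterations=5000, g_tol=1e-8)
    x = float.(x0)
    m = zero(x)      # first-moment estimate
    u = zero(x)      # exponentially weighted infinity norm
    for t in 1:iterations
        g = grad(x)
        maximum(abs, g) <= g_tol && return x, t - 1
        m .= beta1 .* m .+ (1 - beta1) .* g
        u .= max.(beta2 .* u, abs.(g))
        # epsilon is an assumption added here to avoid 0/0 when a gradient
        # component is exactly zero (as it is at zeros(2)).
        x .-= (alpha / (1 - beta1^t)) .* m ./ (u .+ epsilon)
    end
    return x, iterations
end

x_adam, n_adam = reference_adam(rosenbrock_grad, zeros(2))
x_adamax, n_adamax = reference_adamax(rosenbrock_grad, zeros(2))
println("Adam:   f = ", rosenbrock(x_adam), " after ", n_adam, " updates")
println("AdaMax: f = ", rosenbrock(x_adamax), " after ", n_adamax, " updates")
```

Passing the same `alpha`, `iterations` and starting point as in the runs above should make the iteration counts and final objective values roughly comparable, provided the stopping rule and hyperparameters actually match those used in the PR.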

codecov[bot] commented 5 months ago

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (1a649e8) 84.73% compared to head (ac1a8a2) 84.90%.

| Files | Patch % | Lines |
|---|---|---|
| src/multivariate/solvers/first_order/adam.jl | 91.89% | 3 missing |
| src/multivariate/solvers/first_order/adamax.jl | 94.28% | 2 missing |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1069      +/-   ##
==========================================
+ Coverage   84.73%   84.90%   +0.17%
==========================================
  Files          44       46       +2
  Lines        3419     3491      +72
==========================================
+ Hits         2897     2964      +67
- Misses        522      527       +5
```


pkofod commented 5 months ago

Of course, the results of Adam and AdaMax also depend on the learning parameters (the step size alpha, etc.), so with a larger alpha Adam converges from zeros(2) as well:

julia> result = optimize(rosenbrock, zeros(2), Optim.Adam(alpha=0.1), Optim.Options(iterations=100000))
 * Status: success

 * Candidate solution
    Final objective value:     4.408001e-16

 * Found with
    Algorithm:     Adam

 * Convergence measures
    |x - x'|               = 1.00e+00 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.00e+00 ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 9.91e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    2818
    f(x) calls:    2819
    ∇f(x) calls:   2819
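
To make the role of `alpha` concrete, these are the standard update rules from Kingma and Ba's paper. Whether the PR's implementation matches them term for term is an assumption, since the source files are not shown in this thread. Here $g_t$ is the gradient at step $t$, and $\beta_1$, $\beta_2$, $\epsilon$ are the usual Adam hyperparameters.

```latex
% Adam
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}, \\
x_t &= x_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}

% AdaMax
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \\
u_t &= \max\!\left(\beta_2 u_{t-1},\ |g_t|\right), \\
x_t &= x_{t-1} - \frac{\alpha}{1-\beta_1^t}\, \frac{m_t}{u_t}
\end{aligned}
```

Because the gradient is normalized by $\sqrt{\hat v_t}$ (or $u_t$), the per-coordinate step is roughly bounded by $\alpha$ in typical cases. With a small $\alpha$, Adam can therefore only move a short distance along the Rosenbrock valley within a fixed iteration budget, which is consistent with the run at alpha=0.1 converging where the default run did not.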

JuliaNLSolvers / Optim.jl

Add Adam and AdaMax #1069