Closed d-monnet closed 7 months ago
Attention: Patch coverage is 87.50000%
with 19 lines
in your changes are missing coverage. Please review.
Project coverage is 88.77%. Comparing base (
5c08163
) to head (d5f409d
). Report is 8 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
src/fomo.jl | 87.50% | 19 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
The code looks nice and clean, thank you. My main question is: if R2 is a special case of Fomo, I think we should
1) remove the current R2 2) add a new R2 “solver” that simply calls Fomo under the hood to help with the transition.
Also, please check the CI failures. In particular, allocation tests fail with Julia 1 and nightly on Linux and macOS: https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/actions/runs/7745561973/job/21121848905?pr=250#step:6:307
The code looks nice and clean, thank you. My main question is: if R2 is a special case of Fomo, I think we should
1. remove the current R2 2. add a new R2 “solver” that simply calls Fomo under the hood to help with the transition.
My only concern is that fomo allocates two more vector than R2, so it's not quite efficient in that respect. But yes, doing so would avoid having an extra implementation.
The code looks nice and clean, thank you. My main question is: if R2 is a special case of Fomo, I think we should
1. remove the current R2 2. add a new R2 “solver” that simply calls Fomo under the hood to help with the transition.
This is done with the last few commits. I made sure to keep the exact same R2
and R2Solver
interfaces. However solve!()
is not quite pretty now that it has to handle R2 specific case.
@dpo I'll let you review the last modifications and share your thoughts. In particular, I avoid m
and d
memory allocations in FomoSolver
when using R2 and I'd like to be sure it is implemented properly.
Also, please check the CI failures. In particular, allocation tests fail with Julia 1 and nightly on Linux and macOS: https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/actions/runs/7745561973/job/21121848905?pr=250#step:6:307
Looks like allocation test passes with the new R2
interface to FOMO backend (see last commits). The allocation problem can therefore be narrowed down to here
https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/blob/85022e16a22b83c98abead21b23cc99b2e7fb959/src/fomo.jl#L313
and there
https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/blob/85022e16a22b83c98abead21b23cc99b2e7fb959/src/fomo.jl#L327-L329
, but I don't see why allocation would occur here. I don't have a unix system, so it's not convenient for me to debug that.
The code looks nice and clean, thank you. My main question is: if R2 is a special case of Fomo, I think we should
1. remove the current R2 2. add a new R2 “solver” that simply calls Fomo under the hood to help with the transition.
My only concern is that fomo allocates two more vector than R2, so it's not quite efficient in that respect. But yes, doing so would avoid having an extra implementation.
You could have a constructor for an R2Solver
that constructs a FOMOSolver
but allocates those two vectors with length zero.
header
is causing allocations and break related tests, I'll fix that rapidly.
Also there are allocations (not due to header
) occurring with Fomo
, can't find where it comes from...
header
is causing allocations and break related tests, I'll fix that rapidly. Also there are allocations (not due toheader
) occurring withFomo
, can't find where it comes from...
Have you tried https://docs.julialang.org/en/v1/manual/profile/#Line-by-Line-Allocation-Tracking ?
Thank you, @d-monnet !!!
Add the new algorithm
fomo
to JSOSolvers. First-Order Momentum model-based (FOMO) algorithm can be viewed as the Heavy-Ball algorithm embedded in a R2/first-order Trust-Region framework to ensure the convergence to a first-order critical point in the non-convex case.fomo
computes steps in a gradient-related direction $d$ which includes the momentum contribution. The step is computed either in a R2 fashion: $s = -\alpha d$, or in a TR fashion $s = -\alpha d /||d||$, by choosing the associated backend. $\alpha$ is the inverse of the regularization parameter (usually denoted $\sigma$) in R2 context and the trust region radius in the TR context.Tests and doc have been updated.