Closed frankschae closed 9 months ago
Merging #206 (093490a) into master (c6c875f) will decrease coverage by
0.36%
. The diff coverage is50.00%
.
@@ Coverage Diff @@
## master #206 +/- ##
==========================================
- Coverage 48.45% 48.10% -0.36%
==========================================
Files 19 19
Lines 1814 1846 +32
==========================================
+ Hits 879 888 +9
- Misses 935 958 +23
Files | Coverage Δ | |
---|---|---|
src/raphson.jl | 57.89% <50.00%> (-15.13%) |
:arrow_down: |
:mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!
I had a discussion with @FHoltorf regarding an effective heuristic for updating the Jacobian:
That is precisely the thinking of Hairer II's approach. However, with a lot of benchmarking we have found that the optimal HP1 tends to be ... zero a lot of the time. So basically, just take new Jacobians when you start diverging instead of converging. That is a pretty hard heuristic to beat. In theory you could take it earlier if convergence slows, but we haven't found great schema for that. So for the first version I'd keep it simple just based off of a convergence heuristic set to zero.
You may want to base it off of the reuse heuristics of OrdinaryDiffEq.jl. Take a look at its nonlinear solvers which documents this.
That is precisely the thinking of Hairer II's approach. However, with a lot of benchmarking we have found that the optimal HP1 tends to be ... zero a lot of the time.
Yeah seems to be the case for the 23 test problems as well (red line below = speed without reuse). I am wondering though why they look so similar.
using NonlinearSolve, LinearAlgebra, LinearSolve, NonlinearProblemLibrary, Test, BenchmarkTools, Plots
problems = NonlinearProblemLibrary.problems
dicts = NonlinearProblemLibrary.dicts
function test_on_library(problems, dicts, ϵ=1e-5)
samples = 100
secs = 5
HP = [1e-14, 1e-13, 1e-12, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
# store a list of all Plots
plts = []
alg = NewtonRaphson(; reuse=false)
broken_tests = (1,6) # test problems where method does not converge to small residual
for (idx, (problem, dict)) in enumerate(zip(problems, dicts))
title = dict["title"]
@testset "$title" begin
# Newton without reuse
x = dict["start"]
res = similar(x)
nlprob = NonlinearProblem(problem, x)
sol = solve(nlprob, alg, abstol = 1e-18, reltol = 1e-18)
problem(res, sol.u, nothing)
broken = idx in broken_tests ? true : false
@test norm(res)≤ϵ broken=broken
balg = @benchmarkable solve(nlprob, $alg, abstol=1e-18, reltol=1e-18)
talg = run(balg, samples=samples, seconds=secs)
# Newton with reuse
ts = []
for reusetol in HP
@show reusetol
sol = solve(nlprob, NewtonRaphson(; reuse=true, reusetol=reusetol), abstol=1e-18, reltol=1e-18)
problem(res, sol.u, nothing)
broken = idx in broken_tests ? true : false
@test norm(res) ≤ ϵ broken = broken
balg2 = @benchmarkable solve(nlprob, NewtonRaphson(; reuse=true, reusetol=$reusetol), abstol=1e-18, reltol=1e-18)
talg2 = run(balg2, samples=samples, seconds=secs)
push!(ts, mean(talg2.times) / mean(talg.times))
end
pl = scatter(HP, ts, xaxis=:log, xticks=HP, label=false, xlabel="HP", ylabel="mean(time w/ reuse)/mean(time wo/ reuse)", title=title)
hline!([1.0], color = "red", label = false)
display(pl)
savefig(pl, "Reuse-Newton-" * title * ".png")
push!(plts, pl)
end
end
return plts
end
# NewtonRaphson
plts = test_on_library(problems, dicts)
# Merge the plots into a single plot
merged_plot = plot(plts..., layout=(5, 5), size=(2500, 2500), margin=10*Plots.mm)
savefig(merged_plot, "Reuse-Newton.png")
So basically, just take new Jacobians when you start diverging instead of converging. That is a pretty hard heuristic to beat.
So only recomputing when the residual increases?
You may want to base it off of the reuse heuristics of OrdinaryDiffEq.jl. Take a look at its nonlinear solvers which documents this.
I only scanned quickly through https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/newton.jl but didn't immediately see where the reuse happens. Is it in there?
So only recomputing when the residual increases?
Yup.
but didn't immediately see where the reuse happens. Is it in there?
https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/nlsolve.jl#L63-L96
One heuristic (or part of one) that might be interesting to also try and incorporate is to recompute Jacobians when the Newton direction associated with the old Jacobian stops being a descent direction for the residual norm. If for nothing else but the fact that it would leverage the AD system nicely.
One would need to compute the gradient of the residual norm with reverse mode AD but, even though more expensive then just checking the residual, it would make things play nicely with linesearch. (If you don't make any of such checks, I am guessing you could run into trouble if you want to combine Jacobian reuse with linesearch?)
Ahh yes that would be an interesting approach to try.
For the gradient of the residual norm, I think we would need to change the default norm.
https://github.com/SciML/DiffEqBase.jl/blob/master/src/common_defaults.jl#L26C1-L33C4 drops the gradient
function res_norm1(u)
b = f(u, p)
sqrt(sum(abs2, b) / length(u))
end
function res_norm2(u)
b = f(u, p)
DiffEqBase.ODE_DEFAULT_NORM(b, nothing)
end
ForwardDiff.gradient(res_norm2, u0) # => zero(u0)
(Why is it actually sqrt(sum(abs2, b) / length(u))
and not sqrt(sum(abs2, b)) / length(u)
?
(Why is it actually sqrt(sum(abs2, b) / length(u)) and not sqrt(sum(abs2, b)) / length(u)?
That's just matching Hairer, I don't think it really matters which one it is.
https://github.com/SciML/DiffEqBase.jl/blob/master/src/common_defaults.jl#L26C1-L33C4 drops the gradient
We just shouldn't be differentiating the solvers here at all
how did the AD thing come up?
One heuristic (or part of one) that might be interesting to also try and incorporate is to recompute Jacobians when the Newton direction associated with the old Jacobian stops being a descent direction for the residual norm. If for nothing else but the fact that it would leverage the AD system nicely.
One would need to compute the gradient of the residual norm with reverse mode AD but, even though more expensive then just checking the residual, it would make things play nicely with linesearch. (If you don't make any of such checks, I am guessing you could run into trouble if you want to combine Jacobian reuse with linesearch?)
Only for testing this idea, which might be necessary for Jacobian reuse if $\Delta x$ is not a decent direction. So I thought we'd need to compare $\Delta x$ (with the reused Jacobian) to $\frac{d ||f(x,p)||} {dx}$.
ahh yeah, the ODE one purposely changes away from differentiating the norm, which isn't what you want here.
@ChrisRackauckas: @yonatanwesen, @FHoltorf, and I were talking again about a good example today. Was there any specific ODE system for which the reuse strategy in https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/nlsolve/nlsolve.jl#L63-L96 was particularly good? (To have another starting point.)
Anything sufficiently large. Bruss with N=32 should do.
https://github.com/SciML/SciMLBenchmarks.jl/pull/796 has a benchmarking script. Drop the sundials and minpack versions and you should be able to test quite rapidly
https://sundials.readthedocs.io/en/latest/kinsol/Mathematics_link.html#jacobian-information-update-strategy sundials also reuses jacobian
though their scheme might not be ideal, given how long it takes for bruss
The results from benchmarking what we currently we have on 2d brusselator problem doesn't show any significant improvement with the jacobian reuse.
what's the cost breakdown like?
what's the cost breakdown like?
Do you mean the cost when I profile it?
Yes share some flamegraphs https://github.com/tkluck/StatProfilerHTML.jl
Yes share some flamegraphs https://github.com/tkluck/StatProfilerHTML.jl
Check the stats and trace to see how many times the factorization is reused
https://github.com/SciML/NonlinearSolve.jl/pull/345 supercedes this
TODOs: