DhairyaLGandhi closed this 2 years ago
ref #30
(rebased this so it has the benchmarks)
Ran pretty exhaustive versions of just the `fit_overpotential` benchmarks on this like so (only on the relevant sections of the suite, with `time_mult = 5`):
```julia
# on this branch...
using PkgJogger, BenchmarkTools
@jog ElectrochemicalKinetics
suite = JogElectrochemicalKinetics.suite()[@tagged "overpotential fitting" && "AD"]
tune!(suite)
results_fj = run(suite)

# then switch to master branch...
suite2 = JogElectrochemicalKinetics.suite()[@tagged "overpotential fitting" && "AD"]
loadparams!(suite2, params(suite))  # so we don't have to retune
results_current = run(suite2, verbose = true)
```
and then compared them:
```julia
julia> PkgJogger.judge(results_fj, results_current)
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "bench_suite.jl" => 1-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "overpotential fitting" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "non-integral models" => 1-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "AD" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "ElectrochemicalKinetics.Marcus" => TrialJudgement(+176.12% => regression)
          "ElectrochemicalKinetics.AsymptoticMarcusHushChidsey" => TrialJudgement(+2.39% => invariant)
          "ElectrochemicalKinetics.ButlerVolmer" => TrialJudgement(+109.36% => regression)
      "integral models" => 1-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "AD" => 2-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "MarcusHushChidseyDOS: Cu111" => TrialJudgement(+6.68% => regression)
          "MarcusHushChidseyDOS: flat dos" => TrialJudgement(+7.79% => regression)
```
...so I want to dig into this more before merging, e.g. actually looking at solve traces etc. Will hopefully have time to start playing with that tomorrow.
Interesting. And this is just with the changes in the PR, which technically should not change the code paths at all. What about running `fit_overpotential` manually? Does it show the same regression?
I think I know what's happening. It's the case of NLsolve forcing us to use the vector code path by default, even for a scalar input.
On master we have https://github.com/BattModels/ElectrochemicalKinetics.jl/blob/9ade0a92769586ba5350584f11ccaf01c6eed799/src/fitting.jl#L25 to work around this, but that workaround would be wrong for vector inputs. In the PR, the code is generic.
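One way to get scalar performance without breaking the vector case (a hypothetical sketch with stand-in names, not the package's actual code) is to dispatch the gradient helper on the input type, so scalars skip the 1-element-vector machinery entirely:

```julia
using Zygote

# Stand-ins for the package's loss and rate computation, purely illustrative
_loss(k, kpred) = abs2(kpred - k)

# Scalar path: differentiate directly, no 1-element vector wrapping
overpotential_grad(V::Real, k, compute_k) =
    Zygote.gradient(v -> _loss(k, compute_k(v)), V)[1]

# Vector path: fully generic gradient over the whole vector
overpotential_grad(V::AbstractVector, k, compute_k) =
    Zygote.gradient(v -> sum(_loss.(k, compute_k.(v))), V)[1]
```

The scalar method avoids both the allocation of a wrapper vector and the indexing (`V[1]`) workaround, while the vector method stays correct for batched fits.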
Accounting for that by applying the same patch to this PR gives:
```julia
# PR
julia> @btime fit_overpotential($bv, $(I_vals[1]), $(fill(0.1, 1)), verbose = false);
  40.149 μs (736 allocations: 104.19 KiB)

# master
julia> @btime fit_overpotential($bv, $(I_vals[1]), $(fill(0.1, 1)), verbose = false);
  109.627 μs (1302 allocations: 151.39 KiB)
```
Here is the diff I am applying:
```diff
diff --git a/src/fitting.jl b/src/fitting.jl
index 2c24d23..f843eaa 100644
--- a/src/fitting.jl
+++ b/src/fitting.jl
@@ -25,14 +25,14 @@ function fit_overpotential(model::KineticModel, k, guess = fill(0.1, length(k));
         F .= loss(k, compute_k(V, model; kT = kT, kwargs...))
     elseif F == nothing && !isnothing(J)
         # TODO: we could speed up the case of scalar k's by dispatching to call gradient here instead
-        gs = Zygote.gradient(V) do V
+        gs = Zygote.gradient(V[1]) do V
             Zygote.forwarddiff(V) do V
                 loss(k, compute_k(V, model; kT = kT, kwargs...)) |> sum
             end
         end[1]
         J .= gs
     else
-        y, back = Zygote.pullback(V) do V
+        y, back = Zygote.pullback(V[1]) do V
             Zygote.forwarddiff(V) do V
                 loss(k, compute_k(V, model; kT = kT, kwargs...)) |> sum
             end
```
I am slightly worried that cached code can still pollute the results. Further, it seems to only do 1 eval for some reason. I noticed that PkgJogger.jl allows saving benchmarks, which would let us avoid the slightly opaque issue of cached functions. However, that would mean we can no longer make use of Revise effectively, and we'd have to pay the tuning price (which, to be fair, should be done separately for every benchmarking suite anyway). Finally, these saved files can be passed to `judge` directly. Could we try that method once, please?
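For reference, the file-based workflow could look something like this (a sketch based on PkgJogger's documented `save_benchmarks`/`judge` API; exact signatures may vary across versions):

```julia
using PkgJogger
@jog ElectrochemicalKinetics

# In a fresh session on each branch: run the suite and save results to disk
results = JogElectrochemicalKinetics.benchmark()
PkgJogger.save_benchmarks("results_fj.json.gz", results)

# Later, in any session: judge directly from the saved files,
# sidestepping any in-session compilation caching
PkgJogger.judge("results_fj.json.gz", "results_master.json.gz")
```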
> it seems to only do 1 eval for some reason.
I was confused by that too, but that's actually the default: it's collecting at least hundreds of samples (each of one eval) with the timings I've set, so I think that should be fine.
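For context, BenchmarkTools distinguishes *samples* (independent timings) from *evals* (executions per sample), and by default takes one eval per sample with up to 10,000 samples inside the time budget, so `evals = 1` is expected. A small sketch (defaults per the BenchmarkTools manual; only very fast expressions get tuned to higher eval counts):

```julia
using BenchmarkTools

# Defaults: 1 eval per sample, up to 10_000 samples within the time budget
BenchmarkTools.DEFAULT_PARAMETERS.evals    # 1
BenchmarkTools.DEFAULT_PARAMETERS.samples  # 10000

# tune! may raise evals, but typically only for sub-microsecond expressions
b = @benchmarkable sum(rand(1000))
tune!(b)
run(b)
```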
> Could we try that method once please?
For sure. For my own sanity, I first tried it the way I had been doing it, but with the diff you showed applied on your branch. Results there (I've edited the outputs for readability):
```julia
julia> PkgJogger.judge(results_fj, results_master)
  "ElectrochemicalKinetics.Marcus" => TrialJudgement(+170.24% => regression)
  "ElectrochemicalKinetics.AsymptoticMarcusHushChidsey" => TrialJudgement(+184.76% => regression)
  "ElectrochemicalKinetics.ButlerVolmer" => TrialJudgement(+163.78% => regression)
  "MarcusHushChidseyDOS: Cu111" => TrialJudgement(-4.42% => invariant)
  "MarcusHushChidseyDOS: flat dos" => TrialJudgement(-3.78% => invariant)
```
So that would seem to suggest there is some caching or something happening.
So then I ran with new REPL sessions and separate re-tuning for each branch:
```julia
julia> PkgJogger.judge("../results_fj.json.gz", "../results_master.json.gz")
  "ElectrochemicalKinetics.Marcus" => TrialJudgement(-0.13% => invariant)
  "ElectrochemicalKinetics.AsymptoticMarcusHushChidsey" => TrialJudgement(+0.21% => invariant)
  "ElectrochemicalKinetics.ButlerVolmer" => TrialJudgement(+0.29% => invariant)
  "MarcusHushChidseyDOS: Cu111" => TrialJudgement(-2.19% => invariant)
  "MarcusHushChidseyDOS: flat dos" => TrialJudgement(-1.77% => invariant)
```
Not really sure what the deal is here... is doing "Restart REPL" in VS Code not enough to totally clear things? Here I almost wonder if it's running exactly the same thing twice. I guess I could actually quit out of VS Code in between as well; I haven't tried that.
Anyway, as a third check, I (again in separate REPLs) explicitly ran the `@benchmark` command on these `fit_overpotential` expressions. To save space, I'm just reporting median times here, on this PR vs. on master:
Also, to be explicit about this: all these benchmarks are for fitting a single scalar value. Given what you note above, we'll probably eventually need separate dispatches for scalars and vectors, but that can be handled. I do want to add vector benchmarks for these eventually, obviously.
My inclination is to trust the last check and say we may as well merge this, and leave the broader benchmarking reproducibility puzzle for another time. Your thoughts?
I would trust the last ones too. There's at least a consistent trend with the integral models, and two of the kinetic models are within my estimate of run-to-run variance, which is expected.
> could actually quit out of VS Code between as well

I doubt that's necessary.

> leave the broader benchmarking reproducibility puzzle for another time

Agreed.
This avoids the calculation of the primal in the case where we want both the primal and the gradients. Helpful for `IntegralModel`, where the calculation of the primal can be expensive.
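The pattern in question, as a minimal standalone sketch (not the package's code): `Zygote.pullback` returns the primal value together with a closure, so the gradient call reuses the forward pass instead of recomputing the (possibly expensive) primal:

```julia
using Zygote

f(x) = sum(abs2, x)  # stand-in for an expensive primal computation

# One forward pass yields both the value and a gradient closure
y, back = Zygote.pullback(f, [1.0, 2.0, 3.0])
# y == 14.0

# Calling back does NOT re-run f; it replays the recorded pass
grad = back(1.0)[1]
# grad == [2.0, 4.0, 6.0]
```

By contrast, calling `f(x)` and `Zygote.gradient(f, x)` separately would evaluate the primal twice.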