oscardssmith closed this 3 days ago
This should get tested
Agreed. As I said, I still need to do some performance testing.
Benchmarks:
using OrdinaryDiffEq, BenchmarkTools  # BenchmarkTools provides @btime
f(du, u, p, t) = du .= u
prob = ODEProblem(f, rand(1), (0.0, 1.0))
@btime solve(prob, Rosenbrock23())
# before
12.360 μs (126 allocations: 9.62 KiB)
# after
12.520 μs (126 allocations: 9.62 KiB)
@btime solve(prob, Vern7(lazy=false), abstol=1e-9, reltol=1e-9)
# before
12.200 μs (220 allocations: 16.52 KiB)
# after
12.330 μs (220 allocations: 16.52 KiB)
Seems to be well within the noise (especially considering that a single-element array is literally the worst case, since any overhead would be magnified).
That's using @btime though. What's the compilation performance?
Purely noise. Each run below was done in a fresh REPL.
Before:
julia> @time solve(prob, Rosenbrock23());
1.691462 seconds (1.87 M allocations: 128.175 MiB, 1.51% gc time, 99.98% compilation time)
julia> @time solve(prob, Rosenbrock23());
1.747171 seconds (1.87 M allocations: 128.154 MiB, 1.83% gc time, 99.98% compilation time)
julia> @time solve(prob, Rosenbrock23());
1.843381 seconds (1.87 M allocations: 128.154 MiB, 4.54% gc time, 99.99% compilation time)
After:
julia> @time solve(prob, Rosenbrock23());
1.805668 seconds (2.35 M allocations: 159.500 MiB, 2.87% gc time, 99.99% compilation time)
julia> @time solve(prob, Rosenbrock23());
1.797002 seconds (2.35 M allocations: 159.529 MiB, 2.47% gc time, 99.99% compilation time)
julia> @time solve(prob, Rosenbrock23());
1.837029 seconds (2.35 M allocations: 159.529 MiB, 2.13% gc time, 99.98% compilation time)
Looks like that's actually measurable? But it's small enough now that we can do this. It used to be something like a 10-second difference; I'm happy that's solved.
Also, the other reason this PR is clearly needed is that it found a bug in Rosenbrock23 due to a previously untested path :)
With FastBroadcast.jl, we expect the only benefit of for loops over broadcasting to be macroexpand/lowering time, which makes this extra code a pure load-time pessimization: lowering doesn't care about types, so having both versions is strictly worse than having only one. I need to do some tests to make sure this doesn't regress performance, but I'm pretty sure it won't.
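To illustrate the point, here is a minimal sketch (pure Base Julia, no FastBroadcast.jl dependency; the function names are made up for illustration) showing that a fused in-place broadcast and a hand-written loop compute the same update — so keeping both code paths only duplicates lowering and compile work, not runtime behavior:

```julia
# In-place update via fused broadcast: no temporaries allocated.
function rhs_broadcast!(du, u)
    du .= 2 .* u .+ 1
    return du
end

# The equivalent explicit loop.
function rhs_loop!(du, u)
    @inbounds for i in eachindex(du, u)
        du[i] = 2 * u[i] + 1
    end
    return du
end

u = rand(4)
du1, du2 = similar(u), similar(u)
rhs_broadcast!(du1, u)
rhs_loop!(du2, u)
du1 == du2  # identical results from both paths
```

Since both paths lower to the same elementwise computation, any observed difference should be confined to compile time, which is what the fresh-REPL timings above probe.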