SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

Integrating a state of SMatrix has performance regressions #559

Open · Datseris opened this issue 6 years ago

Datseris commented 6 years ago

MWE:

using StaticArrays, BenchmarkTools, OrdinaryDiffEq
ALG = Tsit5()

@inline @inbounds function loop(u, p, t)
    σ = p[1]; ρ = p[2]; β = p[3]
    du1 = σ*(u[2]-u[1])
    du2 = u[1]*(ρ-u[3]) - u[2]
    du3 = u[1]*u[2] - β*u[3]
    return SVector{3}(du1, du2, du3)
end
# f is out-of-place: it returns a new SMatrix rather than setting state
function f(u, p, t)
    v1 = loop(u[:, 1], p, t)
    v2 = loop(u[:, 2], p, t)
    v3 = loop(u[:, 3], p, t)
    return hcat(v1, v2, v3)
end

println("Calling f")
@btime f($(ones(SMatrix{3,3})), $[10.0, 28.0, 8/3], 0.0) # 0 allocs

prob1 = ODEProblem(f, ones(SMatrix{3,3}), (0.0, Inf), [10.0, 28.0, 8/3])

integ = init(prob1, ALG; maxiters = typemax(Int))

println("step! integ")
@btime step!($integ)

result:

Calling f
 12.316 ns (0 allocations: 0 bytes)

step! integ
 1.026 μs (8 allocations: 928 bytes)
  1. step! should not allocate.
  2. step! should take roughly the number of f calls per step in ALG times the cost of one f call, plus a bit more. Say Tsit5 calls f 16 times; then I would expect step! to take not much more than, say, 300 ns (adding around 50-100 ns for the additions and multiplications with the coefficients). A hand-rolled baseline illustrating this is sketched below.
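
For a rough baseline, here is a minimal hand-rolled fixed-step loop over the same out-of-place f; it should be allocation-free once compiled, which is what expectation 2 assumes. This is my own illustration (manual_euler is not from the report), reusing the definitions from the MWE above:

function manual_euler(f, u0, p, dt, n)
    u = u0
    for _ in 1:n
        u = u + dt * f(u, p, 0.0)  # pure SMatrix arithmetic: no heap allocation
    end
    return u
end

@btime manual_euler($f, $(ones(SMatrix{3,3})), $[10.0, 28.0, 8/3], 0.001, 16)
# expected: 0 allocations, roughly 16x the cost of a single f call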
Datseris commented 6 years ago

It didn't change anything when I tried the above in the nothing matters branch.

YingboMa commented 6 years ago

MWE:

julia> @btime muladd(dt,ff,up) setup = ( integ = init(prob1, Euler(), dt=0.1); dt=integ.dt; ff=integ.fsalfirst; up=integ.uprev );
  32.395 ns (1 allocation: 80 bytes)

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)) );
  2.438 ns (0 allocations: 0 bytes)
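
One way to separate the muladd itself from possible benchmark-harness interaction is to hide the call behind a function barrier and count allocations directly with @allocated after a warm-up run. A hypothetical check along those lines (kernel is my own name, not from the thread):

kernel(dt, ff, up) = muladd(dt, ff, up)

integ = init(prob1, Euler(), dt = 0.1)
kernel(integ.dt, integ.fsalfirst, integ.uprev)             # warm up / compile
@allocated kernel(integ.dt, integ.fsalfirst, integ.uprev)  # 0 here would implicate the harness, not muladd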
YingboMa commented 6 years ago

Even weirder.

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)) );
  2.539 ns (0 allocations: 0 bytes)

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)); integ = init(prob1, Euler(), dt=0.1) );
  46.789 ns (4 allocations: 256 bytes)

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)); integ = Ref(Vector{Float64}(undef, 5000)));
  2.728 ns (0 allocations: 0 bytes)
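
A cheap control experiment (my suggestion, not from the thread) would be to force a garbage collection after the init call inside setup; if the extra time and allocation disappear, the slowdown comes from allocator/GC state left behind by init rather than from muladd itself:

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)); integ = init(prob1, Euler(), dt=0.1); GC.gc() );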
ChrisRackauckas commented 6 years ago

...

Datseris commented 6 years ago

I don't understand... why would muladd depend on whether you create an integrator or not? In the second case dt, ff, and up are all independent of the integrator. I don't even understand what init is doing there in the first place :D

ChrisRackauckas commented 6 years ago

It does nothing, it does absolutely nothing to the code that is being benchmarked. Except of course changing the timing.

YingboMa commented 6 years ago

Also the allocation.

Datseris commented 6 years ago

Man I am sorry I always lead you to finding these gems :D

ChrisRackauckas commented 6 years ago

This happened before with random numbers in Optim. Let me find the issue...

ChrisRackauckas commented 6 years ago

@andyferris or @c42f, do you know of a plausible explanation for this?

c42f commented 6 years ago

Don't know right at the moment. The only thing I can say (without digging into it in detail) is that the compiler is very clever at optimizing away microbenchmarks, partially or completely, and I expect that's the cause of the weird @btime results.

My "solution" is always to write my own benchmark loop (I don't trust @btime to sufficiently fool the compiler yet), carefully examine the generated code, and add cheap side effects until the optimizer can no longer remove the parts you want to time.