SciML / OrdinaryDiffEq.jl

High performance ordinary differential equation (ODE) and differential-algebraic equation (DAE) solvers, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML)
https://diffeq.sciml.ai/latest/

Integrating a state of SMatrix has performance regressions #559

Open · Datseris opened this issue 6 years ago

Datseris commented 6 years ago

MWE:

using StaticArrays, BenchmarkTools, OrdinaryDiffEq
ALG = Tsit5()

@inline @inbounds function loop(u, p, t)
    σ = p[1]; ρ = p[2]; β = p[3]
    du1 = σ*(u[2]-u[1])
    du2 = u[1]*(ρ-u[3]) - u[2]
    du3 = u[1]*u[2] - β*u[3]
    return SVector{3}(du1, du2, du3)
end
# f is out-of-place: it returns a new SMatrix rather than setting state
function f(u, p, t)
    v1 = loop(u[:, 1], p, t)
    v2 = loop(u[:, 2], p, t)
    v3 = loop(u[:, 3], p, t)
    return hcat(v1, v2, v3)
end

println("Calling f")
@btime f($(ones(SMatrix{3,3})), $[10.0, 28.0, 8/3], 0.0) # 0 allocs

prob1 = ODEProblem(f, ones(SMatrix{3,3}), (0.0, Inf), [10.0, 28.0, 8/3])

integ = init(prob1, ALG; maxiters = typemax(Int))

println("step! integ")
@btime step!($integ)

result:

Calling f
 12.316 ns (0 allocations: 0 bytes)

step! integ
 1.026 μs (8 allocations: 928 bytes)
  1. step! should not allocate.
  2. step! should take roughly the number of f calls per step in ALG times the cost of one f call, plus a bit more. Say Tsit5 calls f 16 times; then I would expect step! to take not much more than, say, 300 ns (adding around 50-100 ns for the additions and multiplications with the coefficients). A hand-rolled baseline illustrating this is sketched below.
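
For a rough baseline, here is a minimal hand-rolled fixed-step loop over the same out-of-place f; it should be allocation-free once compiled, which is what expectation 2 assumes. This is my own illustration (manual_euler is not from the report), reusing the definitions from the MWE above:

function manual_euler(f, u0, p, dt, n)
    u = u0
    for _ in 1:n
        u = u + dt * f(u, p, 0.0)  # pure SMatrix arithmetic: no heap allocation
    end
    return u
end

@btime manual_euler($f, $(ones(SMatrix{3,3})), $[10.0, 28.0, 8/3], 0.001, 16)
# expected: 0 allocations, roughly 16x the cost of a single f call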
Datseris commented 6 years ago

It didn't change anything when I tried the above in the nothing matters branch.

YingboMa commented 6 years ago

MWE:

julia> @btime muladd(dt,ff,up) setup = ( integ = init(prob1, Euler(), dt=0.1); dt=integ.dt; ff=integ.fsalfirst; up=integ.uprev );
  32.395 ns (1 allocation: 80 bytes)

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)) );
  2.438 ns (0 allocations: 0 bytes)
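
One way to separate the muladd itself from possible benchmark-harness interaction is to hide the call behind a function barrier and count allocations directly with @allocated after a warm-up run. A hypothetical check along those lines (kernel is my own name, not from the thread):

kernel(dt, ff, up) = muladd(dt, ff, up)

integ = init(prob1, Euler(), dt = 0.1)
kernel(integ.dt, integ.fsalfirst, integ.uprev)             # warm up / compile
@allocated kernel(integ.dt, integ.fsalfirst, integ.uprev)  # 0 here would implicate the harness, not muladd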
YingboMa commented 6 years ago

Even weirder.

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)) );
  2.539 ns (0 allocations: 0 bytes)

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)); integ = init(prob1, Euler(), dt=0.1) );
  46.789 ns (4 allocations: 256 bytes)

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)); integ = Ref(Vector{Float64}(undef, 5000)));
  2.728 ns (0 allocations: 0 bytes)
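
A cheap control experiment (my suggestion, not from the thread) would be to force a garbage collection after the init call inside setup; if the extra time and allocation disappear, the slowdown comes from allocator/GC state left behind by init rather than from muladd itself:

julia> @btime muladd(dt,ff,up) setup = ( dt=rand(); ff=@SMatrix(rand(3,3)); up=@SMatrix(rand(3,3)); integ = init(prob1, Euler(), dt=0.1); GC.gc() );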
ChrisRackauckas commented 6 years ago

...

Datseris commented 6 years ago

I don't understand... why would muladd depend on whether you create an integrator or not? In the second case dt, ff, and up are all independent of the integrator. I don't even understand what init is doing there in the first place :D

ChrisRackauckas commented 6 years ago

It does nothing, it does absolutely nothing to the code that is being benchmarked. Except of course changing the timing.

YingboMa commented 6 years ago

Also the allocation.

Datseris commented 6 years ago

Man I am sorry I always lead you to finding these gems :D

ChrisRackauckas commented 6 years ago

This happened before with random numbers in Optim. Let me find the issue...

ChrisRackauckas commented 6 years ago

@andyferris or @c42f, do you know of a plausible explanation for this?

c42f commented 6 years ago

Don't know right at the moment. The only thing I can say (without digging into it in detail) is that the compiler is very clever at optimizing away microbenchmarks, partially or completely, and I expect that's the cause of the weird @btime results.

My "solution" is always to write my own benchmark loop (I don't trust @btime to sufficiently fool the compiler yet), carefully examine the generated code, and add cheap side effects until the optimizer can no longer remove the parts you want to time.