oameye opened 8 months ago

Question❓ When doing a basic benchmark for the Maier-Stein model, I find that giving the bare function to OrdinaryDiffEq is faster than going through ModelingToolkit. However, shouldn't the ModelingToolkit version be more performant, since it has access to the Jacobian? Is the symbolic layer of ModelingToolkit a bottleneck for the solver? Thank you for any help in understanding this result.
It depends heavily on the system. For a Kerr parametric oscillator (KPO), ModelingToolkit is faster again, likely because OrdinaryDiffEq does some optimization for the Maier-Stein model that it cannot seem to do for a KPO.
```julia
using ModelingToolkit, OrdinaryDiffEq, BenchmarkTools, StaticArrays

# parameters: damping, Kerr nonlinearity, natural frequency, drive frequency, drive strength
γ, α, ω₀, ω, λ = 0.01, 1.0, 1.0, 1.0, 0.6

@variables t u(t) v(t)
D = Differential(t)

# rotating-frame KPO equations
eqs = [D(u) ~ -γ * u / 2 - (3 * α * (u^2 + v^2) / 8 + (ω₀^2 - ω^2) / 2 + λ / 4) * v / ω,
       D(v) ~ -γ * v / 2 + (3 * α * (u^2 + v^2) / 8 + (ω₀^2 - ω^2) / 2 - λ / 4) * u / ω]
@named sys = ODESystem(eqs)
sys = structural_simplify(sys)

u0 = [u => 0.2,
      v => 0.3]

function KPO!(dx, x, p, t) # in-place
    u, v = x
    dx[1] = -γ * u / 2 - (3 * α * (u^2 + v^2) / 8 + (ω₀^2 - ω^2) / 2 + λ / 4) * v / ω
    dx[2] = -γ * v / 2 + (3 * α * (u^2 + v^2) / 8 + (ω₀^2 - ω^2) / 2 - λ / 4) * u / ω
end

function KPO(x, p, t) # out-of-place
    u, v = x
    du = -γ * u / 2 - (3 * α * (u^2 + v^2) / 8 + (ω₀^2 - ω^2) / 2 + λ / 4) * v / ω
    dv = -γ * v / 2 + (3 * α * (u^2 + v^2) / 8 + (ω₀^2 - ω^2) / 2 - λ / 4) * u / ω
    SA[du, dv]
end

tspan = (0.0, 10_000.0)
prob_od = ODEProblem{false}(KPO, SVector{2}(getindex.(u0, 2)), tspan, ())
prob_od! = ODEProblem{true}(KPO!, getindex.(u0, 2), tspan, ())
prob_mt_jac = ODEProblem{false}(sys, u0, tspan, []; jac=true, u0_constructor=x -> SVector(x...))
prob_mt_jac! = ODEProblem{true}(sys, u0, tspan, []; jac=true)
prob_mt = ODEProblem{false}(sys, u0, tspan, []; jac=false, u0_constructor=x -> SVector(x...))
prob_mt! = ODEProblem{true}(sys, u0, tspan, []; jac=false)

@btime solve($prob_od, Tsit5(), saveat=5);      # 14.263 ms (616018 allocations: 12.20 MiB)
@btime solve($prob_od!, Tsit5(), saveat=5);     # 6.766 ms (398880 allocations: 6.31 MiB)
@btime solve($prob_mt_jac, Tsit5(), saveat=5);  # 595.200 μs (27 allocations: 119.72 KiB)
@btime solve($prob_mt_jac!, Tsit5(), saveat=5); # 694.200 μs (2045 allocations: 261.81 KiB)
@btime solve($prob_mt, Tsit5(), saveat=5);      # 598.000 μs (27 allocations: 119.72 KiB)
@btime solve($prob_mt!, Tsit5(), saveat=5);     # 766.100 μs (2055 allocations: 262.78 KiB)
```
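A note on the Jacobian part of the question (my addition, not benchmarked in the thread): `Tsit5` is an explicit Runge-Kutta method and never calls the Jacobian, so `jac=true` is not expected to change the timings above. The symbolically generated Jacobian only comes into play with implicit solvers, along these lines:

```julia
# Hypothetical follow-up: a stiff (Rosenbrock) solver actually uses the
# Jacobian function that `jac=true` attached to the problem.
solve(prob_mt_jac!, Rodas5(), saveat=5)
```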
Symbolics doesn't specialize the codegen for StaticArrays appropriately. This is something I've surfaced a few times, but it hasn't been fixed yet. @AayushSabharwal can you have a look when you can? Effectively, you want to avoid the extra conversions it currently does; the compiler does not elide them.
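For context, the generated code builds its output through `SymbolicUtils.Code.create_array` (visible in the printout below). A sketch of the kind of specialization being asked for, matching the call shape `create_array(typeof(arg), nothing, Val{1}(), Val{(2,)}(), elems...)` (an illustration only, not the actual SymbolicUtils source):

```julia
using StaticArrays

# Sketch only: a static-array method that collapses to a direct constructor,
# leaving no intermediate conversions for the compiler to elide.
@inline function create_array_sketch(::Type{A}, ::Nothing, ::Val{1}, ::Val{dims},
                                     elems...) where {A<:SArray, dims}
    SVector{dims[1]}(elems...)
end
```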
@AayushSabharwal bump on this, looking at static array codegen.
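Context for the benchmarks below (my reconstruction; these definitions never made it into the thread): `meier_stein` is the bare out-of-place Maier-Stein drift from the original question, `su` is a static-vector state, and `@be` is the benchmark macro from Chairmarks.jl. From the `@code_lowered` output further down, the function must be essentially:

```julia
using StaticArrays

# Out-of-place Maier-Stein drift, matching the lowered code shown later.
function meier_stein(x, p, t)
    u, v = x
    du = u - u^3 - 10 * u * v^2
    dv = -(1 + u^2) * v
    SA[du, dv]
end

su = SA[0.5, 0.5]  # placeholder state; the actual value used is not shown
```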
I've found something weird. This is the out-of-place function from `prob_mt_jac`:
```julia
julia> prob_mt_jac.f.f.f_oop
RuntimeGeneratedFunction(#=in ModelingToolkit=#, #=using ModelingToolkit=#, :((ˍ₋arg1, t)->begin
          #= /Users/aayush/.julia/packages/SymbolicUtils/0opve/src/code.jl:373 =#
          #= /Users/aayush/.julia/packages/SymbolicUtils/0opve/src/code.jl:374 =#
          #= /Users/aayush/.julia/packages/SymbolicUtils/0opve/src/code.jl:375 =#
          begin
              begin
                  begin
                      #= /Users/aayush/.julia/packages/SymbolicUtils/0opve/src/code.jl:468 =#
                      (SymbolicUtils.Code.create_array)(typeof(ˍ₋arg1), nothing, Val{1}(), Val{(2,)}(), (+)((+)(ˍ₋arg1[1], (*)(-1, (^)(ˍ₋arg1[1], 3))), (*)((*)(-10, ˍ₋arg1[1]), (^)(ˍ₋arg1[2], 2))), (*)((+)(-1, (*)(-1, (^)(ˍ₋arg1[1], 2))), ˍ₋arg1[2]))
                  end
              end
          end
      end))
```
If I copy this expression and benchmark it (replacing `ˍ₋arg1` with `x`):
```julia
julia> bar = (x, t) -> begin
           (SymbolicUtils.Code.create_array)(typeof(x), nothing, Val{1}(), Val{(2,)}(), (+)((+)(x[1], (*)(-1, (^)(x[1], 3))), (*)((*)(-10, x[1]), (^)(x[2], 2))), (*)((+)(-1, (*)(-1, (^)(x[1], 2))), x[2]))
       end

julia> @be (su, 0.0) sum(bar(_...))
Benchmark: 2579 samples with 343 evaluations
min 38.749 ns (4.02 allocs: 96.560 bytes)
median 65.720 ns (4.02 allocs: 96.560 bytes)
mean 100.692 ns (4.02 allocs: 96.560 bytes, 0.04% gc time)
max 64.061 μs (4.02 allocs: 96.560 bytes, 99.77% gc time)
```
This allocates, whereas `meier_stein` doesn't:
```julia
julia> @be (su, nothing, 0.0) sum(meier_stein(_...))
Benchmark: 4754 samples with 7305 evaluations
min 1.865 ns
median 2.453 ns
mean 2.636 ns
max 16.524 ns
```
However, if I put this expression into a `function` instead of a closure and run the benchmark:
```julia
julia> function foo(x, t)
           (SymbolicUtils.Code.create_array)(typeof(x), nothing, Val{1}(), Val{(2,)}(), (+)((+)(x[1], (*)(-1, (^)(x[1], 3))), (*)((*)(-10, x[1]), (^)(x[2], 2))), (*)((+)(-1, (*)(-1, (^)(x[1], 2))), x[2]))
       end

julia> @be (su, 0.0) sum(foo(_...))
Benchmark: 4732 samples with 8806 evaluations
min 1.864 ns
median 1.878 ns
mean 2.203 ns
max 10.746 ns
```
It is just as fast as `meier_stein`.
So far, this is the only explanation I've found for this behavior. Even reducing `bar` to something practically identical to `meier_stein` suffers the slowdown, while `foo` doesn't:
```julia
julia> bar = (ˍ₋arg1, t) -> begin
           u, v = ˍ₋arg1
           du = (+)(u, -(u^3), (*)(-10, u, v^2))
           dv = (*)(-1, (+)(1, (^)(u, 2)), v)
           return SA[du, dv]
       end

julia> @be (su, 0.0) sum(bar(_...))
Benchmark: 2581 samples with 250 evaluations
min 55.500 ns (4.03 allocs: 96.768 bytes)
median 91.332 ns (4.03 allocs: 96.768 bytes)
mean 126.129 ns (4.03 allocs: 96.768 bytes)
max 17.554 μs (4.03 allocs: 96.768 bytes)
```
For reference, the `@code_lowered` outputs for all three:
```julia
julia> @code_lowered foo(su, 0.0)
CodeInfo(
1 ─ %1 = Base.getproperty(Main.SymbolicUtils, :Code)
│ %2 = Base.getproperty(%1, :create_array)
│ %3 = Main.typeof(x)
│ %4 = Main.nothing
│ %5 = Core.apply_type(Main.Val, 1)
│ %6 = (%5)()
│ %7 = Main.Val
│ %8 = Core.tuple(2)
│ %9 = Core.apply_type(%7, %8)
│ %10 = (%9)()
│ %11 = Base.getindex(x, 1)
│ %12 = Main.:^
│ %13 = Base.getindex(x, 1)
│ %14 = Core.apply_type(Base.Val, 3)
│ %15 = (%14)()
│ %16 = Base.literal_pow(%12, %13, %15)
│ %17 = -1 * %16
│ %18 = %11 + %17
│ %19 = Base.getindex(x, 1)
│ %20 = -10 * %19
│ %21 = Main.:^
│ %22 = Base.getindex(x, 2)
│ %23 = Core.apply_type(Base.Val, 2)
│ %24 = (%23)()
│ %25 = Base.literal_pow(%21, %22, %24)
│ %26 = %20 * %25
│ %27 = %18 + %26
│ %28 = Main.:^
│ %29 = Base.getindex(x, 1)
│ %30 = Core.apply_type(Base.Val, 2)
│ %31 = (%30)()
│ %32 = Base.literal_pow(%28, %29, %31)
│ %33 = -1 * %32
│ %34 = -1 + %33
│ %35 = Base.getindex(x, 2)
│ %36 = %34 * %35
│ %37 = (%2)(%3, %4, %6, %10, %27, %36)
└── return %37
)
julia> @code_lowered bar(su, 0.0)
CodeInfo(
1 ─ %1 = Base.indexed_iterate(₋arg1, 1)
│ u = Core.getfield(%1, 1)
│ @_4 = Core.getfield(%1, 2)
│ %4 = Base.indexed_iterate(₋arg1, 2, @_4)
│ v = Core.getfield(%4, 1)
│ %6 = u
│ %7 = Main.:^
│ %8 = u
│ %9 = Core.apply_type(Base.Val, 3)
│ %10 = (%9)()
│ %11 = Base.literal_pow(%7, %8, %10)
│ %12 = -%11
│ %13 = u
│ %14 = Main.:^
│ %15 = v
│ %16 = Core.apply_type(Base.Val, 2)
│ %17 = (%16)()
│ %18 = Base.literal_pow(%14, %15, %17)
│ %19 = -10 * %13 * %18
│ du = %6 + %12 + %19
│ %21 = Main.:^
│ %22 = u
│ %23 = Core.apply_type(Base.Val, 2)
│ %24 = (%23)()
│ %25 = Base.literal_pow(%21, %22, %24)
│ %26 = 1 + %25
│ dv = -1 * %26 * v
│ %28 = Base.getindex(Main.SA, du, dv)
└── return %28
)
julia> @code_lowered meier_stein(su, nothing, 0.0)
CodeInfo(
1 ─ %1 = Base.indexed_iterate(x, 1)
│ u = Core.getfield(%1, 1)
│ @_5 = Core.getfield(%1, 2)
│ %4 = Base.indexed_iterate(x, 2, @_5)
│ v = Core.getfield(%4, 1)
│ %6 = u
│ %7 = Main.:^
│ %8 = u
│ %9 = Core.apply_type(Base.Val, 3)
│ %10 = (%9)()
│ %11 = Base.literal_pow(%7, %8, %10)
│ %12 = %6 - %11
│ %13 = u
│ %14 = Main.:^
│ %15 = v
│ %16 = Core.apply_type(Base.Val, 2)
│ %17 = (%16)()
│ %18 = Base.literal_pow(%14, %15, %17)
│ %19 = 10 * %13 * %18
│ du = %12 - %19
│ %21 = Main.:^
│ %22 = u
│ %23 = Core.apply_type(Base.Val, 2)
│ %24 = (%23)()
│ %25 = Base.literal_pow(%21, %22, %24)
│ %26 = 1 + %25
│ %27 = -%26
│ dv = %27 * v
│ %29 = Base.getindex(Main.SA, du, dv)
└── return %29
)
```
They are practically identical.
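Worth noting (my addition): `@code_lowered` shows pre-inference code, so identical output here doesn't rule out a difference later in compilation. A natural next check, not run in the thread:

```julia
# If the optimized code also matches, the difference must be at the
# call site rather than inside the function bodies.
@code_typed foo(su, 0.0)
@code_typed bar(su, 0.0)
```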
Even weirder:
```julia
julia> expr = :(function foo(x, t)
           (SymbolicUtils.Code.create_array)(typeof(x), nothing, Val{1}(), Val{(2,)}(), (+)((+)(x[1], (*)(-1, (^)(x[1], 3))), (*)((*)(-10, x[1]), (^)(x[2], 2))), (*)((+)(-1, (*)(-1, (^)(x[1], 2))), x[2]))
       end)
julia> @be (su, nothing, 0.0) sum(meier_stein(_...))
Benchmark: 4335 samples with 8773 evaluations
min 1.867 ns
median 2.341 ns
mean 2.430 ns
max 9.456 ns
julia> fn = @RuntimeGeneratedFunction(expr)
julia> @be (su, 0.0) fn(_...)
Benchmark: 2525 samples with 473 evaluations
min 30.302 ns (3.02 allocs: 80.474 bytes)
median 50.651 ns (3.02 allocs: 80.474 bytes)
mean 75.703 ns (3.02 allocs: 80.474 bytes)
max 13.093 μs (3.02 allocs: 80.474 bytes)
julia> @be (@RuntimeGeneratedFunction(expr), su, 0.0) _[1](_[2], _[3])
Benchmark: 767 samples with 8699 evaluations
min 1.868 ns
median 2.381 ns
mean 2.513 ns
max 12.061 ns
```
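A plausible explanation for these last results (my speculation; the thread doesn't settle it): `bar` and `fn` are non-const global bindings, so calling them through the global name dispatches dynamically and boxes the returned `SVector`, while `foo` is a const binding created by `function`, and the final `@be` call interpolates the RuntimeGeneratedFunction into the benchmark as a value rather than reading a global. A quick check along those lines:

```julia
# Hypothetical check: the same closure behind a const binding should lose the
# ~4 allocations seen for `bar` and match `foo`/`meier_stein`.
const cbar = (x, t) -> begin
    u, v = x
    SA[u - u^3 - 10u * v^2, -(1 + u^2) * v]
end
# @be (su, 0.0) sum(cbar(_...))
```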