Scalarizing equations and performance - can I just not

Snowd1n commented 1 month ago

This is a question I expect the answer to be no too, so that's fine. In my system I have a lot of vector variables ~ 500 values in length. The equations often use at least 4 of these vectors to generate a new vector. To get the simplifaction to work I am having to do scalarize(eq) on each equation but this creates thousands of equations and inside of each thousands of 'pseudo-parameters' (not sure how scalarize works but when I print it it says stuff like vec[1],vec[2] inside of the function call). A lot of these equations are containing registered components which I know is an issue form other posts.

All of this seems to lead to a large amount of looping over thousands of equations and variables (alias_elimination!() takes forever in the structural_simplify) and I wondered if there was a way to write the equations without scalarizing so that there is 5 vector variables and 10 equations rather than thousands of each making the simplifcation take a reasonable amount of time as at the moment it's slower than the solve call

I hope someone can say yes but I am assuming it's a no. An MWE is below where the final output is 10 equations and 10 unknowns rather than 1 and 1.

@mtkmodel test begin
    @variables begin
        vec(t)[1:10]
    end
    @parameters begin
        a
        b
        c
    end
    @equations begin
        D(vec) ~ registered_func(vec,a,b,c).+t
    end
end

function registered_func(vec,a,b,c)
    return (vec.+c)*a/b
end
@register_array_symbolic registered_func(vec::AbstractVector,a::Num,b::Num,c::Num) begin
    size=(length(vec),)
    eltype=eltype(vec)
end

@mtkbuild sys = test()

Snowd1n commented 1 month ago

To add to this, when I tried to do a scaling test on a simple example from your documentation (see below) the scaling is seemingly quadratic in both time and memory. Is there any way to reduce that at all as the scaling makes the current method prohibitive to be used for vectors over length 100. I have seen your incredible benchmarks for 1000's of equations so I know you must have ran systems with massive equation sets before so I wonder if there is a way to get performance on a similar vain with vectors as I assume you didn't wait hours for your massive systems to simplify.

using ModelingToolkit, Plots, DifferentialEquations, LinearAlgebra
using ModelingToolkit: t_nounits as t, D_nounits as D
using Symbolics: scalarize

function Mass(len; name, m = 1.0)
    ps = @parameters m = m
    sts = @variables pos(t)[1:len] v(t)[1:len]
    eqs = scalarize(D.(pos) .~ v)
    ODESystem(eqs, t, [pos..., v...], ps; name)
end

function Spring(len; name, k = 1e4, l = 1.0)
    ps = @parameters k=k l=l
    @variables x(t), dir(t)[1:len]
    ODESystem(Equation[], t, [x, dir...], ps; name)
end

function connect_spring(spring, a, b)
    [spring.x ~ norm(scalarize(a .- b))
     scalarize(spring.dir .~ scalarize(a .- b))]
end

function spring_force(spring)
    -spring.k .* scalarize(spring.dir) .* (spring.x - spring.l) ./ spring.x
end

len=1000

m = 1.0
k = 1e4
l = 1.0
center = zeros(len)
g = ones(len)

@named mass = Mass(len)
@named spring = Spring(len,k = k, l = l)

eqs = [connect_spring(spring, mass.pos, center)
       scalarize(D.(mass.v) .~ spring_force(spring) / mass.m .+ g)]

@named _model = ODESystem(eqs, t, [spring.x; spring.dir; mass.pos], [])
@named model = compose(_model, mass, spring)
@btime sys = structural_simplify(model)

vec_benchmark

ChrisRackauckas commented 1 month ago

ODESystem(eqs, t, [pos..., v...], ps; name)

Don't splat.

Also, did you try JuliaSimCompiler with this?

Snowd1n commented 1 month ago

I have removed the splats from the vectors and tried with JuliaSimCompiler (local version) as per the instructions on getting started and the results are a slight improvement with JuliaSimCompiler in speed but no real improvement in memory.

Standard Compiler (No Splats) Vector Length ; Time / ms ; Memory / MB N = 10, 20.6, 6.53 N = 100, 669.4, 186.72 N = 1000, 57122, 21730

JuliaSimCompiler No Splats) Vector Length ; Time / ms ; Memory / MB N = 10, 17.2, 6.53 N = 100, 598.7, 186.72 N = 1000, 48154, 21730

ChrisRackauckas commented 1 month ago

Interesting. At 100, can you show a flame graph? Where is it all at?

Snowd1n commented 1 month ago

I have removed the splats from the vectors and tried with JuliaSimCompiler (local version) as per the instructions on getting started and the results are a slight improvement with JuliaSimCompiler in speed but no real improvement in memory.

Standard Compiler (No Splats) Vector Length ; Time / ms ; Memory / MB N = 10, 20.6, 6.53 N = 100, 669.4, 186.72 N = 1000, 57122, 21730

JuliaSimCompiler No Splats) Vector Length ; Time / ms ; Memory / MB N = 10, 5.814, 2.30 N = 100, 68.7, 25.36 N = 1000, 12266, 756.60

I messed up for the JuliaSimCompiler so here is the updated values. It's a much greater improvement

Snowd1n commented 1 month ago

Deepycopy.jl, _deepcopy_array_t and dae_transfomrations.jl, #structural_simplify!# are the two areas it spends the most time it seems, I have attached a zip of the flamegraph .jlprof file as I can't attach that directly. N=100 Flamegraph.zip

ChrisRackauckas commented 2 weeks ago

@AayushSabharwal does your CSE handle this case?

AayushSabharwal commented 2 weeks ago

No. The CSE doesn't help with structural_simplify time. It also doesn't do the CSE in the equations of the system (only observed). I can make it do the CSE for equations(sys) so it doesn't call registered_func multiple times but that's about it.

AayushSabharwal commented 2 weeks ago

CSE now handles this case

SciML / ModelingToolkit.jl

Scalarizing equations and performance - can I just not #3054