JuliaControl / ControlSystems.jl

A Control Systems Toolbox for Julia
https://juliacontrol.github.io/ControlSystems.jl/stable/
Other
503 stars 85 forks source link

Allow to use threads when computing `freqresp!` #912

Closed VaiTon closed 6 months ago

baggepinnen commented 7 months ago

Hello and thank you for your PR!

Do you have any benchmark where running threaded is beneficial?

VaiTon commented 7 months ago

Yes :) Here they are.

Trial with $10^7$ freqs

Parallel

julia> trials.data["bode, parallel=true, unwrap=false, 10^7"]
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  2.476 s …   2.497 s  ┊ GC (min … max): 0.10% … 0.09%
 Time  (median):     2.484 s              ┊ GC (median):    0.10%
 Time  (mean ± σ):   2.485 s ± 10.572 ms  ┊ GC (mean ± σ):  0.17% ± 0.12%

  █                     █                                 █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.48 s         Histogram: frequency by time         2.5 s <

 Memory estimate: 2.68 GiB, allocs estimate: 107.

Serial

julia> trials.data["bode, parallel=false, unwrap=false, 10^7"]
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 6.128 s (0.03% GC) to evaluate,
 with a memory estimate of 2.68 GiB, over 6 allocations.

Time graph

bode_time

Memory graph

bode_memory

Code

begin
    using ControlSystemsBase
    using BenchmarkTools
    using Plots
    using BenchmarkPlots
    using StatsPlots
end

G = ssrand(3) |> tf

suite = BenchmarkGroup()

for i in 1:7
    w = LinRange(1e-5, 1e5, 10^i)

    for p in [true, false]
        b = @benchmarkable bode($G, $w; parallel=$p, unwrap=false)
        suite["bode, parallel=$p, unwrap=false, 10^$i"] = b
    end
end

tune!(suite; verbose=true)

trials = run(suite, verbose=true)

parallelseries = []
for i in 1:7
    push!(parallelseries, trials.data["bode, parallel=true, unwrap=false, 10^$i"])
end

serialseries = []
for i in 1:7
    push!(serialseries, trials.data["bode, parallel=false, unwrap=false, 10^$i"])
end

sizeseries = ["10^1", "10^2", "10^3", "10^4", "10^5", "10^6", "10^7"]

plot(sizeseries,
    [
        (t) -> mean(trials.data["bode, parallel=true, unwrap=false, $t"]).time,
        (t) -> mean(trials.data["bode, parallel=false, unwrap=false, $t"]).time,
    ],
    label=["parallel" "serial"], 
    xlabel="size of frequency vector",
    ylabel="time (μs)",
    legend=:topleft,
)
savefig("bode_time.png")

plot(
    sizeseries,
    [
        (t) -> mean(trials.data["bode, parallel=true, unwrap=false, $t"]).memory,
        (t) -> mean(trials.data["bode, parallel=false, unwrap=false, $t"]).memory,
    ],
    label=["parallel" "serial"],
    xlabel="size of frequency vector",
    ylabel="memory (bytes)",
    legend=:topleft,
)
savefig("bode_memory.png")
VaiTon commented 7 months ago

If you want to use a different interface please do. I'm not an expert in how APIs should be exposed in Julia :)

VaiTon commented 6 months ago

@baggepinnen any news on this?

baggepinnen commented 6 months ago

Thanks for the benchmarks!

I am not convinced this is the best optimization to perform for freqresp on transfer functions. With 12 threads, I see no more than 2x speedup for frequency vectors up to length 10k. I am also not quite sure the benchmark is entirely realistic, representing 3x3 MIMO systems of order 6 as transfer functions is questionable from a numeric standpoint, a statespace representation would probably make better sense for such a system?

If this benchmark is indeed representative of a workload you are running in practice where the call to freqresp is the computational bottleneck, I'd rather look into avoiding the repeated computations that take place for all the similar denominators. The system G in the benchmark evaluates the same denominator polynomial 6 times for each frequency.