JuliaSmoothOptimizers / PartitionedStructures.jl

Partitioned derivatives storage and partitioned quasi-Newton updates
Other
10 stars 1 forks source link

Method profiles #50

Open paraynaud opened 2 years ago

paraynaud commented 2 years ago

@dpo, the following script profile the PartitionedStructures.jl methods used by PartiallySeparableNLPModels.jl, PartitallySeparableSovers.jl and PartitionedKnetNLPModels.jl.

Summary:

See PartiallySeparableSolvers.jl method profiles to see PartitionedStructures.jl methods impact in a partitioned solver.

Remarks: The partitioned matrices using linear operators are much faster for update! and partitioned matrix-vector product, because the push! and mul! of LinearOperators are more efficient.

The script print the benchmark launched on every mehods:

using PartitionedStructures

using BenchmarkTools

using StatsBase
using ProfileSVG

path = pwd()*"/dvpt/profilage/profiles/"

N = 500
n = 2000
range_ni = 1:50

element_variables = map( i-> sample(1:n, rand(range_ni), replace = false), 1:N)

epv = create_epv(element_variables)
@code_warntype create_epv(element_variables; T=Float32) # all blue, when type is on

@code_warntype get_v(build_v!(epv)) #type stable
@benchmark get_v(build_v!(epv))
p = ProfileSVG.@profview (@benchmark get_v(build_v!(epv)))
ProfileSVG.save(path*"get_build_v.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  18.100 μs … 439.100 μs  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     20.100 μs               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   23.691 μs ±  11.375 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

#   ▅▄█▆▄▃▂▂▁▁▁▂▃▁▁▁▁▁               ▁                           ▁
#   ████████████████████████▇▇▇▇▇█▆▆▇█▇▇▆▆▇▇▇▇█▇▆▇█▄▇▆▅▅▆▅▅▅▅▅▅▅ █
#   18.1 μs       Histogram: log(frequency) by time      58.2 μs <

#  Memory estimate: 0 bytes, allocs estimate: 0.

v = rand(n)
@code_warntype epv_from_v!(epv, v) # all blue, one yellow
@benchmark epv_from_v!(epv, v)
p = ProfileSVG.@profview (@benchmark epv_from_v!(epv, v))
ProfileSVG.save(path*"epv_from_v.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  40.200 μs …  3.124 ms  ┊ GC (min … max): 0.00% … 94.13%
#  Time  (median):     45.600 μs              ┊ GC (median):    0.00%
#  Time  (mean ± σ):   57.456 μs ± 89.910 μs  ┊ GC (mean ± σ):  7.59% ±  4.88%
#   ▄▇█▇▅▄▃▃▂▃▄▄▃▃▂▂▁▁▁▁▂▂▁  ▁                                  ▂
#   ███████████████████████████▇▇▆▇▇▇▆▆▆▇▅▄▅▆▇▅▆▅▆▆▆▆▇▅▆▆▅▅▄▄▅▅ █
#   40.2 μs      Histogram: log(frequency) by time       140 μs <
#  Memory estimate: 125.92 KiB, allocs estimate: 500.

# we must look closely to `ModElemental_pv.set_eev!` and `ModElemental_pv.set_vec!` to compare allocations with `.=` or with `=`
# I think `x[get_indices(get_eev_set(epv_x, idx))]` allocate an temporary array.

epv2 = similar(epv)
@code_warntype epv_from_epv!(epv2, epv) # not fully blue because of `full_check_epv_epm`
@benchmark epv_from_epv!(epv2, epv)
p = ProfileSVG.@profview (@benchmark epv_from_epv!(epv, epv2))
ProfileSVG.save(path*"epv_from_epv.svg")
# `full_check_epv_epm` take the majoriry of the time (at least 2/3)
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  31.000 μs … 289.300 μs  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     35.800 μs               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   46.126 μs ±  21.648 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
#   █▇▆▆▅▄▂▂▁▂▃▃▄▃▃▂ ▂▃▂▂▂▁▂▁ ▁▁▁▁▂▁  ▁▂ ▁  ▁                    ▂
#   █████████████████████████████████████████▇▇▇▇█▆▇▇▆▆▆▄▆▆▅▆▄▄▅ █
#   31 μs         Histogram: log(frequency) by time       122 μs <
#  Memory estimate: 0 bytes, allocs estimate: 0.

@code_warntype minus_epv!(epv2) # all blue
@benchmark minus_epv!(epv2)
p = ProfileSVG.@profview (@benchmark minus_epv!(epv2))
ProfileSVG.save(path*"minus_epv.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  29.100 μs …  1.946 ms  ┊ GC (min … max):  0.00% … 94.85%
#  Time  (median):     32.100 μs              ┊ GC (median):     0.00%
#  Time  (mean ± σ):   39.656 μs ± 86.093 μs  ┊ GC (mean ± σ):  11.32% ±  5.12%
#   ▁▅▇█▇▇▅▄▄▃▃▂▂▂▂▁ ▁▁▁▁   ▁▁▁▁▁▁                              ▂
#   ██████████████████████████████▇▇▇▆▇▇▆▆▇▇▆▇▇▇██▇▇█▇▇▇█▇▆▆▆▆▄ █
#   29.1 μs      Histogram: log(frequency) by time      67.9 μs <
#  Memory estimate: 129.98 KiB, allocs estimate: 501.
# Maybe improve 
# @inline set_minus_vec!(ev::T) where {T <: Elt_vec} = set_vec!(ev, -get_vec(ev))

@code_warntype add_epv!(epv, epv2) # not fully blue because of `full_check_epv_epm`
@benchmark add_epv!(epv, epv2) # Time 26.800 μs, Memory estimate: 0 bytes, allocs estimate: 0.
p = ProfileSVG.@profview (@benchmark add_epv!(epv, epv2))
ProfileSVG.save(path*"add_epv.svg")
# `full_check_epv_epm` take the majoriry of the time (at least 2/3)
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  32.600 μs … 128.100 μs  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     33.200 μs               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   35.751 μs ±   8.122 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
#   █▄▇ ▄ ▄                                                      ▁
#   ███████▇█▇▇▇▅▆▇▄▃▃▅▅▄▄▆▄▅▅▆▅▅▄▃▇▇▆▆▅▄▅▂▄▃▃▃▄█▇█▆▇▆▇▆▄▂▄▃▄▂▃▃ █
#   32.6 μs       Histogram: log(frequency) by time      70.7 μs <
#  Memory estimate: 0 bytes, allocs estimate: 0.

epm = epm_from_epv(epv)
eplosr1 = eplo_lsr1_from_epv(epv) 
eplose = eplo_lose_from_epv(epv) 

epv_y = similar(epv)
epv_s = similar(epv)
epv_from_v!(epv_y, rand(n) .- 0.5)
epv_from_v!(epv_s, rand(n) .- 0.5)

# one update to get element matrices different than identity
update!(epm, epv_y, epv_s; name=:pse)
update!(eplosr1, epv_y, epv_s)
update!(eplose, epv_y, epv_s)

# Elemental partitioned matrix
@code_warntype mul_epm_epv(epm, epv) # full blue
@benchmark mul_epm_epv(epm, epv)
p = ProfileSVG.@profview (@benchmark mul_epm_epv(epm, epv))
ProfileSVG.save(path*"mul_epm_epv.svg")
# BenchmarkTools.Trial: 386 samples with 1 evaluation.
#  Range (min … max):  11.640 ms …  20.748 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     12.790 ms               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   12.924 ms ± 926.861 μs  ┊ GC (mean ± σ):  0.61% ± 3.29%
#         ▁▄▅▆▇█▆▄▃▁▂
#   ▃▃▄▄▅▆███████████▅▄▅▄▂▁▃▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▂▁▁▂ ▃
#   11.6 ms         Histogram: frequency by time         17.8 ms <
#  Memory estimate: 847.02 KiB, allocs estimate: 6326.
# the calls to `set_eev!` are taking the majority of the time (2nd time involved)
# I think it might be due to `Bie * vie` inside `set_eev!`.

epv_res = similar(epv)
@code_warntype PartitionedStructures.mul_epm_epv!(epv_res, epm, epv) # full blue
@benchmark PartitionedStructures.mul_epm_epv!(epv_res, epm, epv)
p = ProfileSVG.@profview (@benchmark PartitionedStructures.mul_epm_epv!(epv_res, epm, epv))
ProfileSVG.save(path*"mul_epm_epv!.svg")
# BenchmarkTools.Trial: 389 samples with 1 evaluation.
#  Range (min … max):  10.918 ms … 38.102 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     12.554 ms              ┊ GC (median):    0.00%
#  Time  (mean ± σ):   12.832 ms ±  1.704 ms  ┊ GC (mean ± σ):  0.04% ± 0.69%
#              ▂ ▅ █▃▇▅▆▂▂▄▁
#   ▂▁▃▃▂▃▄▄▅▇▇█▅█▅█████████▆▆▅▇▃▆▇▄▄▁▄▃▂▃▃▃▃▂▂▃▂▁▂▃▂▂▃▁▁▃▃▁▂▁▃ ▃
#   10.9 ms         Histogram: frequency by time          16 ms <
# Memory estimate: 125.92 KiB, allocs estimate: 500.

# PLSR1 operator
@code_warntype mul_epm_epv(eplosr1, epv) # full blue
@benchmark mul_epm_epv(eplosr1, epv)
p = ProfileSVG.@profview (@benchmark mul_epm_epv(eplosr1, epv))
ProfileSVG.save(path*"mul_eplosr1_epv.svg")
# BenchmarkTools.Trial: 7689 samples with 1 evaluation.
#  Range (min … max):  421.300 μs …   9.081 ms  ┊ GC (min … max):  0.00% … 82.23%
#  Time  (median):     482.300 μs               ┊ GC (median):     0.00%
#  Time  (mean ± σ):   645.372 μs ± 657.664 μs  ┊ GC (mean ± σ):  12.57% ± 11.56%
#   █▆▅▄▃▁   ▂▂▁                                                  ▁
#   ██████████████▇▇▆▆▆▅▅▃▃▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▅▆▆▆▆▅▆ █
#   421 μs        Histogram: log(frequency) by time       4.81 ms <
#  Memory estimate: 847.02 KiB, allocs estimate: 6326.
# the product between linear-operator vector is faster than matrix-vector  
# most of the time is takn by `similar(epv)` hidden in `mul_epm_epv`

@code_warntype PartitionedStructures.mul_epm_epv!(epv_res, eplosr1, epv) #  Bie::LinearOperators.LSR1Operator{Float64} en rouge
@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplosr1, epv)
p = ProfileSVG.@profview (@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplosr1, epv))
ProfileSVG.save(path*"mul_eplosr1_epv!.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  121.200 μs …  30.156 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     204.650 μs               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   271.456 μs ± 369.850 μs  ┊ GC (mean ± σ):  2.68% ± 4.57%
#   █            
#   █▅▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▄▂▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
#   121 μs           Histogram: frequency by time          887 μs <
#  Memory estimate: 125.92 KiB, allocs estimate: 500.
# `full_check_epv_epm` takes 1/4 of the time, the rest is `mul` applied on elements.

# PLSE operator
@code_warntype mul_epm_epv(eplose, epv) # full blue
@benchmark mul_epm_epv(eplose, epv)
p = ProfileSVG.@profview (@benchmark mul_epm_epv(eplose, epv))
ProfileSVG.save(path*"mul_eplose_epv.svg")
# most of the time is takn by `similar(epv)` hidden in `mul_epm_epv`
# BenchmarkTools.Trial: 6189 samples with 1 evaluation.
#  Range (min … max):  452.900 μs …  12.158 ms  ┊ GC (min … max):  0.00% … 82.97%
#  Time  (median):     533.200 μs               ┊ GC (median):     0.00%
#  Time  (mean ± σ):   800.565 μs ± 804.668 μs  ┊ GC (mean ± σ):  11.06% ± 11.36%
#   █▆▅▄▃▃▂▁▁▃▃▂▁▁▁ ▁▁                                            ▁
#   ████████████████████▇▇▆▆▆▆▅▅▅▅▄▁▁▅▁▄▁▄▁▃▃▃▁▃▁▁▁▁▁▁▁▃▃▄▆▅▄▇▆▆▆ █
#   453 μs        Histogram: log(frequency) by time          5 ms <
#  Memory estimate: 847.02 KiB, allocs estimate: 6326.

@code_warntype PartitionedStructures.mul_epm_epv!(epv_res, eplose, epv) #    Bie::Any, Bie*vie en rouge
@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplose, epv)
p = ProfileSVG.@profview (@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplose, epv))
ProfileSVG.save(path*"mul_eplose_epv!.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
#  Range (min … max):  147.900 μs …   4.393 ms  ┊ GC (min … max): 0.00% … 0.00%
#  Time  (median):     162.000 μs               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   228.296 μs ± 182.990 μs  ┊ GC (mean ± σ):  2.60% ± 4.48%
#   █▆▄▄▄▃▃▂▂▂▂▁▁▁▁▁▁       ▁▁▂▁                                  ▁
#   █████████████████████▇███████████▇▇▇█▇▇▆▆▄▅▆▅▅▅▃▅▄▅▅▅▅▆▅▅▅▅▅▅ █
#   148 μs        Histogram: log(frequency) by time        765 μs <
#  Memory estimate: 125.92 KiB, allocs estimate: 500.
# `full_check_epv_epm` takes 1/4 of the time, the rest is `mul` applied on elements. 

# Updates
@code_warntype update!(epm, epv_y, epv_s; name=:pse) # full blue
@benchmark update!(epm, epv_y, epv_s; name=:pse, verbose=false)
p = ProfileSVG.@profview (@benchmark update!(epm, epv_y, epv_s; name=:pse, verbose=false))
ProfileSVG.save(path*"update_epm.svg")
# BenchmarkTools.Trial: 1788 samples with 1 evaluation.
#  Range (min … max):  1.805 ms … 10.378 ms  ┊ GC (min … max):  0.00% … 36.20%
#  Time  (median):     2.172 ms              ┊ GC (median):     0.00%
#  Time  (mean ± σ):   2.779 ms ±  1.204 ms  ┊ GC (mean ± σ):  13.42% ± 16.44%
#    █▅▁       
#   ████▆▅▄▄▃▃▃▃▂▂▂▃▂▃▃▃▃▄▄▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂ ▃
#   1.8 ms         Histogram: frequency by time        7.58 ms <
#  Memory estimate: 8.73 MiB, allocs estimate: 2143.
# most of the computation time is about the `(y * y') ./ dot(y, s)`, `(Bs * Bs') ./ dot(Bs, s)` and `r = y .- B * s`, `B_1 .= B .+ terme1 .- terme2` and `B_1 .= B .+ ((r * r') ./ dot(s, r))`.
# These 5 lines take 85% of the `update!` time.

@code_warntype update!(eplosr1, epv_y, epv_s) # full blue
@benchmark update!(eplosr1, epv_y, epv_s; verbose=false)
p = ProfileSVG.@profview (@benchmark update!(eplosr1, epv_y, epv_s; verbose=false))
ProfileSVG.save(path*"update_eplosr1.svg")
# BenchmarkTools.Trial: 2697 samples with 1 evaluation.
#  Range (min … max):  800.700 μs …  11.132 ms  ┊ GC (min … max): 0.00% … 40.26%
#  Time  (median):       1.633 ms               ┊ GC (median):    0.00%
#  Time  (mean ± σ):     1.843 ms ± 890.881 μs  ┊ GC (mean ± σ):  2.94% ±  8.12%
#            ▁█▁   
#   ▇█▅▅▄▄▃▃▃███▆█▆▅▅▄▄▃▃▄▅▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▃
#   801 μs           Histogram: frequency by time         5.23 ms <
#  Memory estimate: 596.98 KiB, allocs estimate: 2914.
# Some time is lost (on PLSE) by checking the curvature condition

@code_warntype update!(eplose, epv_y, epv_s) # full blue
@benchmark update!(eplose, epv_y, epv_s; verbose=false) # Time: 2.033 ms, Memory estimate: 600.20 KiB, allocs estimate: 5316..
p = ProfileSVG.@profview (@benchmark update!(eplose, epv_y, epv_s; verbose=false))
ProfileSVG.save(path*"update_eplose.svg")
# BenchmarkTools.Trial: 2253 samples with 1 evaluation.
#  Range (min … max):  1.798 ms …   7.086 ms  ┊ GC (min … max): 0.00% … 51.80%
#  Time  (median):     1.999 ms               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   2.212 ms ± 667.723 μs  ┊ GC (mean ± σ):  1.61% ±  6.08%
#   █▇▆▆▆▅▄▄▃▃▂▃▂▂▂▁▂▁                                          ▁
#   ███████████████████▇▄▆▆▅▆▄▄▄▅▁▆▄▄▅▄▁▄▁▄▅▄▁▁▁▄▄▆▆▆▆▆▆▆▇▇▆▅▆▅ █
#   1.8 ms       Histogram: log(frequency) by time      5.18 ms <
#  Memory estimate: 591.34 KiB, allocs estimate: 5292.
# if isa(Bi, LBFGSOperator{T}) in PartitionedLOQuasiNewton.jl l.277 takes 1/5 of the time.

epv_from_epv epv_from_v get_build_v minus_epv mul_eplose_epv mul_eplosr1_epv! mul_eplosr1_epv mul_epm_epv! mul_epm_epv add_epv