@dpo, the following script profile the PartitionedStructures.jl methods used by PartiallySeparableNLPModels.jl, PartitallySeparableSovers.jl and PartitionedKnetNLPModels.jl.
Summary:
a lot of time is spend in set_eev! and set_vec!, which set the element vector, I have to see if I can improve it;
we should implement what you said previously about element matrix, which should include additionnal vectors for BFGS and SR1 updates (at least for the tests like curvature conditions) to avoid allocation.
Remarks: The partitioned matrices using linear operators are much faster for update! and partitioned matrix-vector product, because the push! and mul! of LinearOperators are more efficient.
The script print the benchmark launched on every mehods:
using PartitionedStructures
using BenchmarkTools
using StatsBase
using ProfileSVG
path = pwd()*"/dvpt/profilage/profiles/"
N = 500
n = 2000
range_ni = 1:50
element_variables = map( i-> sample(1:n, rand(range_ni), replace = false), 1:N)
epv = create_epv(element_variables)
@code_warntype create_epv(element_variables; T=Float32) # all blue, when type is on
@code_warntype get_v(build_v!(epv)) #type stable
@benchmark get_v(build_v!(epv))
p = ProfileSVG.@profview (@benchmark get_v(build_v!(epv)))
ProfileSVG.save(path*"get_build_v.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 18.100 μs … 439.100 μs ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 20.100 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 23.691 μs ± 11.375 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
# ▅▄█▆▄▃▂▂▁▁▁▂▃▁▁▁▁▁ ▁ ▁
# ████████████████████████▇▇▇▇▇█▆▆▇█▇▇▆▆▇▇▇▇█▇▆▇█▄▇▆▅▅▆▅▅▅▅▅▅▅ █
# 18.1 μs Histogram: log(frequency) by time 58.2 μs <
# Memory estimate: 0 bytes, allocs estimate: 0.
v = rand(n)
@code_warntype epv_from_v!(epv, v) # all blue, one yellow
@benchmark epv_from_v!(epv, v)
p = ProfileSVG.@profview (@benchmark epv_from_v!(epv, v))
ProfileSVG.save(path*"epv_from_v.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 40.200 μs … 3.124 ms ┊ GC (min … max): 0.00% … 94.13%
# Time (median): 45.600 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 57.456 μs ± 89.910 μs ┊ GC (mean ± σ): 7.59% ± 4.88%
# ▄▇█▇▅▄▃▃▂▃▄▄▃▃▂▂▁▁▁▁▂▂▁ ▁ ▂
# ███████████████████████████▇▇▆▇▇▇▆▆▆▇▅▄▅▆▇▅▆▅▆▆▆▆▇▅▆▆▅▅▄▄▅▅ █
# 40.2 μs Histogram: log(frequency) by time 140 μs <
# Memory estimate: 125.92 KiB, allocs estimate: 500.
# we must look closely to `ModElemental_pv.set_eev!` and `ModElemental_pv.set_vec!` to compare allocations with `.=` or with `=`
# I think `x[get_indices(get_eev_set(epv_x, idx))]` allocate an temporary array.
epv2 = similar(epv)
@code_warntype epv_from_epv!(epv2, epv) # not fully blue because of `full_check_epv_epm`
@benchmark epv_from_epv!(epv2, epv)
p = ProfileSVG.@profview (@benchmark epv_from_epv!(epv, epv2))
ProfileSVG.save(path*"epv_from_epv.svg")
# `full_check_epv_epm` take the majoriry of the time (at least 2/3)
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 31.000 μs … 289.300 μs ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 35.800 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 46.126 μs ± 21.648 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
# █▇▆▆▅▄▂▂▁▂▃▃▄▃▃▂ ▂▃▂▂▂▁▂▁ ▁▁▁▁▂▁ ▁▂ ▁ ▁ ▂
# █████████████████████████████████████████▇▇▇▇█▆▇▇▆▆▆▄▆▆▅▆▄▄▅ █
# 31 μs Histogram: log(frequency) by time 122 μs <
# Memory estimate: 0 bytes, allocs estimate: 0.
@code_warntype minus_epv!(epv2) # all blue
@benchmark minus_epv!(epv2)
p = ProfileSVG.@profview (@benchmark minus_epv!(epv2))
ProfileSVG.save(path*"minus_epv.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 29.100 μs … 1.946 ms ┊ GC (min … max): 0.00% … 94.85%
# Time (median): 32.100 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 39.656 μs ± 86.093 μs ┊ GC (mean ± σ): 11.32% ± 5.12%
# ▁▅▇█▇▇▅▄▄▃▃▂▂▂▂▁ ▁▁▁▁ ▁▁▁▁▁▁ ▂
# ██████████████████████████████▇▇▇▆▇▇▆▆▇▇▆▇▇▇██▇▇█▇▇▇█▇▆▆▆▆▄ █
# 29.1 μs Histogram: log(frequency) by time 67.9 μs <
# Memory estimate: 129.98 KiB, allocs estimate: 501.
# Maybe improve
# @inline set_minus_vec!(ev::T) where {T <: Elt_vec} = set_vec!(ev, -get_vec(ev))
@code_warntype add_epv!(epv, epv2) # not fully blue because of `full_check_epv_epm`
@benchmark add_epv!(epv, epv2) # Time 26.800 μs, Memory estimate: 0 bytes, allocs estimate: 0.
p = ProfileSVG.@profview (@benchmark add_epv!(epv, epv2))
ProfileSVG.save(path*"add_epv.svg")
# `full_check_epv_epm` take the majoriry of the time (at least 2/3)
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 32.600 μs … 128.100 μs ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 33.200 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 35.751 μs ± 8.122 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
# █▄▇ ▄ ▄ ▁
# ███████▇█▇▇▇▅▆▇▄▃▃▅▅▄▄▆▄▅▅▆▅▅▄▃▇▇▆▆▅▄▅▂▄▃▃▃▄█▇█▆▇▆▇▆▄▂▄▃▄▂▃▃ █
# 32.6 μs Histogram: log(frequency) by time 70.7 μs <
# Memory estimate: 0 bytes, allocs estimate: 0.
epm = epm_from_epv(epv)
eplosr1 = eplo_lsr1_from_epv(epv)
eplose = eplo_lose_from_epv(epv)
epv_y = similar(epv)
epv_s = similar(epv)
epv_from_v!(epv_y, rand(n) .- 0.5)
epv_from_v!(epv_s, rand(n) .- 0.5)
# one update to get element matrices different than identity
update!(epm, epv_y, epv_s; name=:pse)
update!(eplosr1, epv_y, epv_s)
update!(eplose, epv_y, epv_s)
# Elemental partitioned matrix
@code_warntype mul_epm_epv(epm, epv) # full blue
@benchmark mul_epm_epv(epm, epv)
p = ProfileSVG.@profview (@benchmark mul_epm_epv(epm, epv))
ProfileSVG.save(path*"mul_epm_epv.svg")
# BenchmarkTools.Trial: 386 samples with 1 evaluation.
# Range (min … max): 11.640 ms … 20.748 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 12.790 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 12.924 ms ± 926.861 μs ┊ GC (mean ± σ): 0.61% ± 3.29%
# ▁▄▅▆▇█▆▄▃▁▂
# ▃▃▄▄▅▆███████████▅▄▅▄▂▁▃▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▂▁▁▂ ▃
# 11.6 ms Histogram: frequency by time 17.8 ms <
# Memory estimate: 847.02 KiB, allocs estimate: 6326.
# the calls to `set_eev!` are taking the majority of the time (2nd time involved)
# I think it might be due to `Bie * vie` inside `set_eev!`.
epv_res = similar(epv)
@code_warntype PartitionedStructures.mul_epm_epv!(epv_res, epm, epv) # full blue
@benchmark PartitionedStructures.mul_epm_epv!(epv_res, epm, epv)
p = ProfileSVG.@profview (@benchmark PartitionedStructures.mul_epm_epv!(epv_res, epm, epv))
ProfileSVG.save(path*"mul_epm_epv!.svg")
# BenchmarkTools.Trial: 389 samples with 1 evaluation.
# Range (min … max): 10.918 ms … 38.102 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 12.554 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 12.832 ms ± 1.704 ms ┊ GC (mean ± σ): 0.04% ± 0.69%
# ▂ ▅ █▃▇▅▆▂▂▄▁
# ▂▁▃▃▂▃▄▄▅▇▇█▅█▅█████████▆▆▅▇▃▆▇▄▄▁▄▃▂▃▃▃▃▂▂▃▂▁▂▃▂▂▃▁▁▃▃▁▂▁▃ ▃
# 10.9 ms Histogram: frequency by time 16 ms <
# Memory estimate: 125.92 KiB, allocs estimate: 500.
# PLSR1 operator
@code_warntype mul_epm_epv(eplosr1, epv) # full blue
@benchmark mul_epm_epv(eplosr1, epv)
p = ProfileSVG.@profview (@benchmark mul_epm_epv(eplosr1, epv))
ProfileSVG.save(path*"mul_eplosr1_epv.svg")
# BenchmarkTools.Trial: 7689 samples with 1 evaluation.
# Range (min … max): 421.300 μs … 9.081 ms ┊ GC (min … max): 0.00% … 82.23%
# Time (median): 482.300 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 645.372 μs ± 657.664 μs ┊ GC (mean ± σ): 12.57% ± 11.56%
# █▆▅▄▃▁ ▂▂▁ ▁
# ██████████████▇▇▆▆▆▅▅▃▃▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▅▆▆▆▆▅▆ █
# 421 μs Histogram: log(frequency) by time 4.81 ms <
# Memory estimate: 847.02 KiB, allocs estimate: 6326.
# the product between linear-operator vector is faster than matrix-vector
# most of the time is takn by `similar(epv)` hidden in `mul_epm_epv`
@code_warntype PartitionedStructures.mul_epm_epv!(epv_res, eplosr1, epv) # Bie::LinearOperators.LSR1Operator{Float64} en rouge
@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplosr1, epv)
p = ProfileSVG.@profview (@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplosr1, epv))
ProfileSVG.save(path*"mul_eplosr1_epv!.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 121.200 μs … 30.156 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 204.650 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 271.456 μs ± 369.850 μs ┊ GC (mean ± σ): 2.68% ± 4.57%
# █
# █▅▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▄▂▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
# 121 μs Histogram: frequency by time 887 μs <
# Memory estimate: 125.92 KiB, allocs estimate: 500.
# `full_check_epv_epm` takes 1/4 of the time, the rest is `mul` applied on elements.
# PLSE operator
@code_warntype mul_epm_epv(eplose, epv) # full blue
@benchmark mul_epm_epv(eplose, epv)
p = ProfileSVG.@profview (@benchmark mul_epm_epv(eplose, epv))
ProfileSVG.save(path*"mul_eplose_epv.svg")
# most of the time is takn by `similar(epv)` hidden in `mul_epm_epv`
# BenchmarkTools.Trial: 6189 samples with 1 evaluation.
# Range (min … max): 452.900 μs … 12.158 ms ┊ GC (min … max): 0.00% … 82.97%
# Time (median): 533.200 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 800.565 μs ± 804.668 μs ┊ GC (mean ± σ): 11.06% ± 11.36%
# █▆▅▄▃▃▂▁▁▃▃▂▁▁▁ ▁▁ ▁
# ████████████████████▇▇▆▆▆▆▅▅▅▅▄▁▁▅▁▄▁▄▁▃▃▃▁▃▁▁▁▁▁▁▁▃▃▄▆▅▄▇▆▆▆ █
# 453 μs Histogram: log(frequency) by time 5 ms <
# Memory estimate: 847.02 KiB, allocs estimate: 6326.
@code_warntype PartitionedStructures.mul_epm_epv!(epv_res, eplose, epv) # Bie::Any, Bie*vie en rouge
@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplose, epv)
p = ProfileSVG.@profview (@benchmark PartitionedStructures.mul_epm_epv!(epv_res, eplose, epv))
ProfileSVG.save(path*"mul_eplose_epv!.svg")
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 147.900 μs … 4.393 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 162.000 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 228.296 μs ± 182.990 μs ┊ GC (mean ± σ): 2.60% ± 4.48%
# █▆▄▄▄▃▃▂▂▂▂▁▁▁▁▁▁ ▁▁▂▁ ▁
# █████████████████████▇███████████▇▇▇█▇▇▆▆▄▅▆▅▅▅▃▅▄▅▅▅▅▆▅▅▅▅▅▅ █
# 148 μs Histogram: log(frequency) by time 765 μs <
# Memory estimate: 125.92 KiB, allocs estimate: 500.
# `full_check_epv_epm` takes 1/4 of the time, the rest is `mul` applied on elements.
# Updates
@code_warntype update!(epm, epv_y, epv_s; name=:pse) # full blue
@benchmark update!(epm, epv_y, epv_s; name=:pse, verbose=false)
p = ProfileSVG.@profview (@benchmark update!(epm, epv_y, epv_s; name=:pse, verbose=false))
ProfileSVG.save(path*"update_epm.svg")
# BenchmarkTools.Trial: 1788 samples with 1 evaluation.
# Range (min … max): 1.805 ms … 10.378 ms ┊ GC (min … max): 0.00% … 36.20%
# Time (median): 2.172 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 2.779 ms ± 1.204 ms ┊ GC (mean ± σ): 13.42% ± 16.44%
# █▅▁
# ████▆▅▄▄▃▃▃▃▂▂▂▃▂▃▃▃▃▄▄▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▂ ▃
# 1.8 ms Histogram: frequency by time 7.58 ms <
# Memory estimate: 8.73 MiB, allocs estimate: 2143.
# most of the computation time is about the `(y * y') ./ dot(y, s)`, `(Bs * Bs') ./ dot(Bs, s)` and `r = y .- B * s`, `B_1 .= B .+ terme1 .- terme2` and `B_1 .= B .+ ((r * r') ./ dot(s, r))`.
# These 5 lines take 85% of the `update!` time.
@code_warntype update!(eplosr1, epv_y, epv_s) # full blue
@benchmark update!(eplosr1, epv_y, epv_s; verbose=false)
p = ProfileSVG.@profview (@benchmark update!(eplosr1, epv_y, epv_s; verbose=false))
ProfileSVG.save(path*"update_eplosr1.svg")
# BenchmarkTools.Trial: 2697 samples with 1 evaluation.
# Range (min … max): 800.700 μs … 11.132 ms ┊ GC (min … max): 0.00% … 40.26%
# Time (median): 1.633 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 1.843 ms ± 890.881 μs ┊ GC (mean ± σ): 2.94% ± 8.12%
# ▁█▁
# ▇█▅▅▄▄▃▃▃███▆█▆▅▅▄▄▃▃▄▅▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▃
# 801 μs Histogram: frequency by time 5.23 ms <
# Memory estimate: 596.98 KiB, allocs estimate: 2914.
# Some time is lost (on PLSE) by checking the curvature condition
@code_warntype update!(eplose, epv_y, epv_s) # full blue
@benchmark update!(eplose, epv_y, epv_s; verbose=false) # Time: 2.033 ms, Memory estimate: 600.20 KiB, allocs estimate: 5316..
p = ProfileSVG.@profview (@benchmark update!(eplose, epv_y, epv_s; verbose=false))
ProfileSVG.save(path*"update_eplose.svg")
# BenchmarkTools.Trial: 2253 samples with 1 evaluation.
# Range (min … max): 1.798 ms … 7.086 ms ┊ GC (min … max): 0.00% … 51.80%
# Time (median): 1.999 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 2.212 ms ± 667.723 μs ┊ GC (mean ± σ): 1.61% ± 6.08%
# █▇▆▆▆▅▄▄▃▃▂▃▂▂▂▁▂▁ ▁
# ███████████████████▇▄▆▆▅▆▄▄▄▅▁▆▄▄▅▄▁▄▁▄▅▄▁▁▁▄▄▆▆▆▆▆▆▆▇▇▆▅▆▅ █
# 1.8 ms Histogram: log(frequency) by time 5.18 ms <
# Memory estimate: 591.34 KiB, allocs estimate: 5292.
# if isa(Bi, LBFGSOperator{T}) in PartitionedLOQuasiNewton.jl l.277 takes 1/5 of the time.
@dpo, the following script profile the PartitionedStructures.jl methods used by PartiallySeparableNLPModels.jl, PartitallySeparableSovers.jl and PartitionedKnetNLPModels.jl.
Summary:
set_eev
! andset_vec!
, which set the element vector, I have to see if I can improve it;See PartiallySeparableSolvers.jl method profiles to see PartitionedStructures.jl methods impact in a partitioned solver.
Remarks: The partitioned matrices using linear operators are much faster for
update!
and partitioned matrix-vector product, because thepush!
andmul!
of LinearOperators are more efficient.The script print the benchmark launched on every mehods: