JuliaSmoothOptimizers / BundleAdjustmentModels.jl

Julia repository of bundle adjustment problems
Mozilla Public License 2.0

decrease allocations in residual evaluation #41

Closed dpo closed 2 years ago

dpo commented 2 years ago

This PR aims to decrease the number of allocations when evaluating the residual. On problem-49-7776-pre/ladybug, I get:

Before

julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 109 samples with 1 evaluation.
 Range (min … max):  35.592 ms … 98.198 ms  ┊ GC (min … max):  0.00% … 7.13%
 Time  (median):     43.779 ms              ┊ GC (median):    13.09%
 Time  (mean ± σ):   46.145 ms ±  9.294 ms  ┊ GC (mean ± σ):  12.04% ± 3.36%

       ▃▃▃██▅▆
  ▅▅▁▁▇███████▇█▁▅▅▅▅▅▁▁▁▅▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▅▁▅ ▅
  35.6 ms      Histogram: log(frequency) by time      87.3 ms <

 Memory estimate: 46.31 MiB, allocs estimate: 1156139.

After

julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 129 samples with 1 evaluation.
 Range (min … max):  31.432 ms … 67.156 ms  ┊ GC (min … max):  0.00% … 0.00%
 Time  (median):     39.505 ms              ┊ GC (median):    16.44%
 Time  (mean ± σ):   38.756 ms ±  4.884 ms  ┊ GC (mean ± σ):  12.02% ± 7.62%

    ▂               ▅▆▄█▁
  ▅▅█▆▆▆▄▄▃▅▁▁▁▁▃▅▅▅█████▆▅▄▃▃▃▃▄▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▃▁▁▁▁▁▁▁▁▁▁▃ ▃
  31.4 ms         Histogram: frequency by time        56.2 ms <

 Memory estimate: 34.65 MiB, allocs estimate: 1028765.

There are still lots of allocations that don't seem necessary.

Any ideas?

cc @AntoninKns
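
One common source of per-observation allocations in this kind of loop is array slicing, since `x[a:b]` copies its data. The sketch below is not this package's code: `toy_residual_slices!` and `toy_residual_views!` are hypothetical toy residuals, assumed here only to show how `@allocated` can compare a slicing loop against one that uses `view`.

```julia
# Hypothetical toy residual (an assumption for illustration, not the package's code):
# each observation reads a 3-element block of x and writes 2 entries of r.
function toy_residual_slices!(r, x, nobs)
    for k in 1:nobs
        p = x[(3k - 2):(3k)]          # slicing copies: one heap allocation per observation
        r[2k - 1] = p[1] / p[3]
        r[2k] = p[2] / p[3]
    end
    return r
end

function toy_residual_views!(r, x, nobs)
    for k in 1:nobs
        p = view(x, (3k - 2):(3k))    # a view refers to x without copying its data
        r[2k - 1] = p[1] / p[3]
        r[2k] = p[2] / p[3]
    end
    return r
end

nobs = 1000
x = rand(3nobs) .+ 1.0                # shifted away from zero so the divisions are safe
r1 = zeros(2nobs)
r2 = zeros(2nobs)

# Call each function once first so that compilation is not counted.
toy_residual_slices!(r1, x, nobs)
toy_residual_views!(r2, x, nobs)

a_slices = @allocated toy_residual_slices!(r1, x, nobs)
a_views = @allocated toy_residual_views!(r2, x, nobs)
println("slices: ", a_slices, " bytes, views: ", a_views, " bytes")
```

Both versions compute the same values; only the number of bytes allocated differs. Whether slicing is the cause in `residual!` would need to be checked against the actual code.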

codecov[bot] commented 2 years ago

Codecov Report

Merging #41 (29bd2f3) into main (de00a84) will increase coverage by 1.27%. The diff coverage is 98.00%.


@@            Coverage Diff             @@
##             main      #41      +/-   ##
==========================================
+ Coverage   88.33%   89.61%   +1.27%     
==========================================
  Files           5        5              
  Lines         300      308       +8     
==========================================
+ Hits          265      276      +11     
+ Misses         35       32       -3     

Impacted Files                         Coverage Δ
src/BundleAdjustmentNLSFunctions.jl    91.59% <98.00%> (+3.30%)


dpo commented 2 years ago

julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 143 samples with 1 evaluation.
 Range (min … max):  28.426 ms … 66.456 ms  ┊ GC (min … max):  0.00% … 16.05%
 Time  (median):     36.903 ms              ┊ GC (median):    19.57%
 Time  (mean ± σ):   35.035 ms ±  4.840 ms  ┊ GC (mean ± σ):  11.30% ±  9.94%

         ▄█▄▅▂                          ▂█ ▂▄▂▂                
  ▅▅▅▁▅▃▇█████▇▅▅▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▆▆▅▇██▇████▇▅▅▆▁▃▁▃▅▁▁▁▁▃ ▃
  28.4 ms         Histogram: frequency by time        42.5 ms <

 Memory estimate: 27.36 MiB, allocs estimate: 933236.

dpo commented 2 years ago

julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 517 samples with 1 evaluation.
 Range (min … max):  7.942 ms … 24.299 ms  ┊ GC (min … max): 0.00% … 19.46%
 Time  (median):     9.010 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.676 ms ±  1.912 ms  ┊ GC (mean ± σ):  5.17% ±  9.69%

   ▁▁▅█▇▁▁ ▇▂                                                 
  ▅██████████▄▄▃▂▃▃▁▃▂▂▁▂▃▁▂▂▃▃▃▄▄▄▄▄▃▃▁▃▂▃▂▂▂▁▂▁▁▂▁▂▁▂▁▂▁▁▂ ▃
  7.94 ms        Histogram: frequency by time        16.3 ms <

 Memory estimate: 7.29 MiB, allocs estimate: 95531.

dpo commented 2 years ago

Something is off with the way threading is used here: it allocates a lot of memory.

dpo commented 2 years ago

For comparison, here's the benchmark of the version with threading and 4 threads:

julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 412 samples with 1 evaluation.
 Range (min … max):   7.387 ms … 50.271 ms  ┊ GC (min … max):  0.00% … 77.75%
 Time  (median):      8.254 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   12.126 ms ±  9.268 ms  ┊ GC (mean ± σ):  29.92% ± 26.08%

  ▅█▅▁                                                 ▁
  ████▇▇▄█▇▄▁▁▄▁▄▁▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▇███▅▅▅▆ ▆
  7.39 ms      Histogram: log(frequency) by time      37.4 ms <

 Memory estimate: 27.34 MiB, allocs estimate: 931745.

The results are the same on my Mac with 8 threads, possibly because it has 4 physical cores with hyperthreading and Julia isn't taking advantage of the extra logical threads (not sure).
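
The serial-versus-threaded comparison can be reproduced in miniature to see how much allocation the threaded loop itself accounts for. This is a rough sketch with a hypothetical `kernel` standing in for one observation's residual, not the package's code. `Threads.@threads` creates tasks on each call, so some allocation is expected. Allocations that grow with the number of observations would point elsewhere, for example to variables captured by the loop body, though that is an assumption to verify.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical kernel standing in for one observation's residual.
kernel(v) = sin(v) + v^2

function fill_serial!(r, x)
    for i in eachindex(r, x)
        r[i] = kernel(x[i])
    end
    return r
end

function fill_threaded!(r, x)
    @threads for i in eachindex(r, x)
        r[i] = kernel(x[i])
    end
    return r
end

x = rand(10_000)
r_serial = similar(x)
r_threaded = similar(x)

# Call each function once first so that compilation is not counted.
fill_serial!(r_serial, x)
fill_threaded!(r_threaded, x)

alloc_serial = @allocated fill_serial!(r_serial, x)
alloc_threaded = @allocated fill_threaded!(r_threaded, x)
println("threads = ", nthreads(),
        ", serial: ", alloc_serial, " bytes",
        ", threaded: ", alloc_threaded, " bytes")
```

Running this with different values of `julia --threads` and different lengths of `x` shows whether the threaded allocations stay roughly constant or scale with the problem size.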

dpo commented 2 years ago

Success at last!

julia> @benchmark residual!($model, $(model.meta.x0), $r)
BenchmarkTools.Trial: 921 samples with 1 evaluation.
 Range (min … max):  4.857 ms …  10.066 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.321 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.413 ms ± 539.166 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▆▅█▇▃▃▃▆▃▂                                                 
  ▆███████████▇▆▄▅▄▄▃▂▂▁▃▂▁▂▁▂▁▂▁▁▁▁▁▁▂▁▁▁▁▁▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▂ ▃
  4.86 ms         Histogram: frequency by time        8.77 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

This needs https://github.com/JuliaSmoothOptimizers/NLPModels.jl/pull/393 (otherwise, 2 bytes of allocations remain!)

dpo commented 2 years ago

Non-allocating jac_structure! (requires NLPModels 0.18.2):

julia> @benchmark jac_structure!($model, $rows, $cols)
BenchmarkTools.Trial: 2150 samples with 1 evaluation.
 Range (min … max):  2.040 ms …   4.936 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.228 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.321 ms ± 313.718 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     █▁                                                        
  ▅▆▅██▆▆▅▅▆▆▅▃▄▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▁▂▂▁▂▂▂▂▁▁▂▁▁▂▂▂▂ ▃
  2.04 ms         Histogram: frequency by time        4.04 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
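
For readers unfamiliar with the pattern, an in-place structure routine writes the row and column indices directly into preallocated vectors and creates no temporary arrays. The sketch below is an illustration, not the package's implementation. `toy_jac_structure!`, `cam_idx`, and `pnt_idx` are hypothetical names. The layout is also an assumption: 2 residuals per observation, each depending on 3 point coordinates and 9 camera parameters, with the points stored before the cameras.

```julia
# Hypothetical layout (an assumption for illustration): observation k produces
# residuals 2k-1 and 2k, which depend on the 3 coordinates of point pnt_idx[k]
# and the 9 parameters of camera cam_idx[k]; points are stored before cameras.
function toy_jac_structure!(rows, cols, cam_idx, pnt_idx, npnts)
    nnz_per_obs = 24                          # 2 residuals x (3 + 9) variables
    for k in eachindex(cam_idx)
        p = 3 * (pnt_idx[k] - 1)              # offset of the point block in x
        c = 3 * npnts + 9 * (cam_idx[k] - 1)  # offset of the camera block in x
        base = nnz_per_obs * (k - 1)
        for i in 1:2, j in 1:12
            idx = base + 12 * (i - 1) + j
            rows[idx] = 2 * (k - 1) + i
            cols[idx] = j <= 3 ? p + j : c + (j - 3)
        end
    end
    return rows, cols
end

# Two observations: (camera 1, point 2) and (camera 2, point 1), with 2 points in total.
cam_idx = [1, 2]
pnt_idx = [2, 1]
npnts = 2
rows = zeros(Int, 24 * length(cam_idx))       # preallocated once, reused on every call
cols = similar(rows)
toy_jac_structure!(rows, cols, cam_idx, pnt_idx, npnts)
```

Because only scalar integer arithmetic and stores into `rows` and `cols` are involved, a routine of this shape has no reason to allocate once the output vectors exist.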