Closed dpo closed 2 years ago
Merging #41 (29bd2f3) into main (de00a84) will increase coverage by
1.27%
. The diff coverage is98.00%
.
@@ Coverage Diff @@
## main #41 +/- ##
==========================================
+ Coverage 88.33% 89.61% +1.27%
==========================================
Files 5 5
Lines 300 308 +8
==========================================
+ Hits 265 276 +11
+ Misses 35 32 -3
Impacted Files | Coverage Δ | |
---|---|---|
src/BundleAdjustmentNLSFunctions.jl | 91.59% <98.00%> (+3.30%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update de00a84...29bd2f3. Read the comment docs.
julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 143 samples with 1 evaluation.
Range (min … max): 28.426 ms … 66.456 ms ┊ GC (min … max): 0.00% … 16.05%
Time (median): 36.903 ms ┊ GC (median): 19.57%
Time (mean ± σ): 35.035 ms ± 4.840 ms ┊ GC (mean ± σ): 11.30% ± 9.94%
▄█▄▅▂ ▂█ ▂▄▂▂
▅▅▅▁▅▃▇█████▇▅▅▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▆▆▅▇██▇████▇▅▅▆▁▃▁▃▅▁▁▁▁▃ ▃
28.4 ms Histogram: frequency by time 42.5 ms <
Memory estimate: 27.36 MiB, allocs estimate: 933236.
julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 517 samples with 1 evaluation.
Range (min … max): 7.942 ms … 24.299 ms ┊ GC (min … max): 0.00% … 19.46%
Time (median): 9.010 ms ┊ GC (median): 0.00%
Time (mean ± σ): 9.676 ms ± 1.912 ms ┊ GC (mean ± σ): 5.17% ± 9.69%
▁▁▅█▇▁▁ ▇▂
▅██████████▄▄▃▂▃▃▁▃▂▂▁▂▃▁▂▂▃▃▃▄▄▄▄▄▃▃▁▃▂▃▂▂▂▁▂▁▁▂▁▂▁▂▁▂▁▁▂ ▃
7.94 ms Histogram: frequency by time 16.3 ms <
Memory estimate: 7.29 MiB, allocs estimate: 95531.
Something is going on with the way threading is used in here. It allocates tons of memory.
For comparison, here's the benchmark of the version with threading and 4 threads:
julia> @benchmark residual!($nls, $(nls.meta.x0), $r)
BenchmarkTools.Trial: 412 samples with 1 evaluation.
Range (min … max): 7.387 ms … 50.271 ms ┊ GC (min … max): 0.00% … 77.75%
Time (median): 8.254 ms ┊ GC (median): 0.00%
Time (mean ± σ): 12.126 ms ± 9.268 ms ┊ GC (mean ± σ): 29.92% ± 26.08%
▅█▅▁ ▁
████▇▇▄█▇▄▁▁▄▁▄▁▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▇███▅▅▅▆ ▆
7.39 ms Histogram: log(frequency) by time 37.4 ms <
Memory estimate: 27.34 MiB, allocs estimate: 931745.
The results are the same on my Mac with 8 threads, but that may be because I have 4 cores, each with hyperthreading, and Julia isn't taking advantage of that (not sure).
Success at last!
julia> @benchmark residual!($model, $(model.meta.x0), $r)
BenchmarkTools.Trial: 921 samples with 1 evaluation.
Range (min … max): 4.857 ms … 10.066 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.321 ms ┊ GC (median): 0.00%
Time (mean ± σ): 5.413 ms ± 539.166 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▆▅█▇▃▃▃▆▃▂
▆███████████▇▆▄▅▄▄▃▂▂▁▃▂▁▂▁▂▁▂▁▁▁▁▁▁▂▁▁▁▁▁▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▂ ▃
4.86 ms Histogram: frequency by time 8.77 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
Needs https://github.com/JuliaSmoothOptimizers/NLPModels.jl/pull/393 (otherwise, there remain 2 bytes of allocations!)
Non-allocating jac_structure!
(requires NLPModels 0.18.2):
julia> @benchmark jac_structure!($model, $rows, $cols)
BenchmarkTools.Trial: 2150 samples with 1 evaluation.
Range (min … max): 2.040 ms … 4.936 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.228 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.321 ms ± 313.718 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▁
▅▆▅██▆▆▅▅▆▆▅▃▄▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▁▂▂▁▂▂▂▂▁▁▂▁▁▂▂▂▂ ▃
2.04 ms Histogram: frequency by time 4.04 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
This PR's objective is to decrease the amount of allocations when evaluating the residual. On problem-49-7776-pre/ladybug, I get
Before
After
There are still lots of allocations that don't seem necessary.
Any ideas?
cc @AntoninKns