WISPO-POP / PowerPlots.jl

Functions plot PowerModels networks
BSD 3-Clause "New" or "Revised" License

Faster KamadaKawaii gradient #74

Closed noahrhodes closed 2 years ago

noahrhodes commented 2 years ago

Add type annotations, preallocated arrays, and broadcast operators to speed up the Kamada-Kawai layout.
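As a rough illustration of those three techniques together (not the code in this PR), the sketch below fills a caller-supplied distance matrix with a fused in-place broadcast. The function name `pairwise_dist!` and the 2 x n position layout are assumptions made for the example.

```julia
# Hypothetical helper, not PowerPlots.jl code.
# Writes the pairwise Euclidean distances between the columns of `pos` (2 x n)
# into the preallocated n x n buffer `dist`.
function pairwise_dist!(dist::Matrix{Float64}, pos::Matrix{Float64})
    x = @view pos[1, :]   # views avoid copying the coordinate rows
    y = @view pos[2, :]
    # All dotted operations fuse into a single loop that writes directly
    # into `dist`, so no temporary n x n arrays are created.
    dist .= sqrt.((x .- x') .^ 2 .+ (y .- y') .^ 2)
    return dist
end

pos = [0.0 3.0;
       0.0 4.0]            # two nodes: (0, 0) and (3, 4)
dist = zeros(2, 2)         # allocated once, reused across iterations
pairwise_dist!(dist, pos)
```

The concrete argument types document intent and catch accidental misuse; Julia specializes on runtime types either way, so the main savings usually come from reusing `dist` across iterations.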

Laptop Results (not up to date):

pglib_opf_case14_ieee.m
  0.008143 seconds (7.94 k allocations: 2.549 MiB)
  0.009694 seconds (6.59 k allocations: 1.228 MiB)
Same Layout: true

pglib_opf_case118_ieee.m
  0.536786 seconds (102.92 k allocations: 286.639 MiB, 6.30% gc time)
  0.433696 seconds (86.58 k allocations: 109.460 MiB)
Same Layout: true

pglib_opf_case500_tamu.m
  6.337915 seconds (688.35 k allocations: 3.375 GiB, 5.11% gc time)
  5.621654 seconds (633.22 k allocations: 1.264 GiB, 2.22% gc time)
Same Layout: true

pglib_opf_case1354_pegase.m
 82.930171 seconds (7.38 M allocations: 38.229 GiB, 4.45% gc time)
 75.264216 seconds (6.93 M allocations: 14.091 GiB, 2.39% gc time)
Same Layout: true

Desktop Results:

pglib_opf_case14_ieee.m
  0.004240 seconds (13.11 k allocations: 2.793 MiB)
  0.002504 seconds (4.71 k allocations: 725.516 KiB)
Same Layout: true

pglib_opf_case118_ieee.m
  0.110263 seconds (139.29 k allocations: 287.501 MiB, 21.59% gc time)
  0.047131 seconds (82.93 k allocations: 49.916 MiB, 7.69% gc time)
Same Layout: true

pglib_opf_case500_goc.m
  3.868789 seconds (1.42 M allocations: 5.793 GiB, 11.89% gc time)
  2.090159 seconds (1.09 M allocations: 958.229 MiB, 12.41% gc time)
Same Layout: true

pglib_opf_case1354_pegase.m
 28.027997 seconds (8.45 M allocations: 38.245 GiB, 9.73% gc time)
 13.690771 seconds (6.92 M allocations: 5.991 GiB, 4.54% gc time)
Same Layout: true
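How the `Same Layout` check was implemented is not shown in this thread. One plausible way to compare two layouts, assuming node positions are stored as 2 x n matrices, is an approximate comparison, since reordering floating-point sums (for example with `@simd`) can change results in the last few bits. The name `same_layout` and the tolerance are assumptions.

```julia
using LinearAlgebra  # array `isapprox` compares using a norm

# Hypothetical check: true when both layouts have the same shape and agree
# up to a relative tolerance.
same_layout(a::Matrix{Float64}, b::Matrix{Float64}; rtol::Float64 = 1e-8) =
    size(a) == size(b) && isapprox(a, b; rtol = rtol)

a = [0.0 1.0;
     2.0 3.0]
b = a .+ 1e-12   # tiny round-off-sized perturbation
c = a .+ 1.0     # genuinely different layout

println("Same Layout: ", same_layout(a, b))
```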

~~Changing from OMEinsum to Tullio allows preallocating the gradient calculation array. Allocations decreased by ~5x, with a speedup of 2x.~~ case500 is a bizarre outlier; I cannot understand why it runs slower.

Removed Tullio and wrote nested for loops with LoopVectorization instead. This increases the speedup by an additional 2x.

First-run time is much improved, about 7x faster:

pglib_opf_case14_ieee.m
  3.446269 seconds (12.08 M allocations: 712.245 MiB, 5.07% gc time, 99.84% compilation time)
  0.579834 seconds (2.19 M allocations: 117.468 MiB, 99.20% compilation time)

First-run time is less improved: a 2x speedup instead of 7x.

  3.418529 seconds (12.30 M allocations: 725.067 MiB, 5.03% gc time, 99.84% compilation time)
  1.425368 seconds (4.46 M allocations: 242.790 MiB, 1.78% gc time, 99.78% compilation time)

First-run time is now even worse:

pglib_opf_case14_ieee.m
  3.544652 seconds (12.72 M allocations: 754.263 MiB, 4.51% gc time, 99.84% compilation time)
  9.456756 seconds (17.89 M allocations: 1.000 GiB, 2.26% gc time, 99.96% compilation time)
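
The first-run numbers above are dominated by compilation, as the `compilation time` percentages show. A minimal way to separate first-call latency from steady-state run time is to time the same call twice in a fresh session. `layout_stub` below is a hypothetical stand-in, not the layout function from this PR.

```julia
# Hypothetical stand-in for a layout routine; any function works here.
layout_stub(x::Vector{Float64}) = sum(abs2, x)

x = rand(1_000)

# The first call triggers JIT compilation of layout_stub for Vector{Float64}.
t_first = @elapsed layout_stub(x)
# The second call reuses the compiled code, so it measures run time only.
t_second = @elapsed layout_stub(x)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

For a function this small the gap is modest; with macro-heavy packages the first call can cost seconds, which matches the pattern in the timings above.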

codecov-commenter commented 2 years ago

Codecov Report

Merging #74 (f8f0330) into master (ca1b463) will decrease coverage by 19.13%. The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           master      #74       +/-   ##
===========================================
- Coverage   92.35%   73.21%   -19.14%     
===========================================
  Files          11       10        -1     
  Lines         340      407       +67     
===========================================
- Hits          314      298       -16     
- Misses         26      109       +83     

| Impacted Files | Coverage Δ |
| --- | --- |
| src/layouts/layout_engines.jl | 38.68% <0.00%> (-55.96%) :arrow_down: |
| src/core/configuration.jl | 77.77% <0.00%> (-11.12%) :arrow_down: |
| src/core/utils.jl | 88.46% <0.00%> (-5.48%) :arrow_down: |
| src/core/export.jl | |

noahrhodes commented 2 years ago

The best option is to correctly preallocate arrays and apply `@simd` to the matrix operations.
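
A minimal sketch of that approach, assuming the standard Kamada-Kawai spring energy E = Σ_{i<j} ½ k_ij (|x_i − x_j| − l_ij)² and symmetric `k` and `l` matrices. This is illustrative only, not the code merged in PowerPlots.jl; the function and argument names are hypothetical.

```julia
# grad: preallocated 2 x n output, pos: 2 x n node positions,
# k: n x n spring strengths, l: n x n ideal lengths (both assumed symmetric).
function kk_gradient!(grad::Matrix{Float64}, pos::Matrix{Float64},
                      k::Matrix{Float64}, l::Matrix{Float64})
    n = size(pos, 2)
    @inbounds for i in 1:n
        xi = pos[1, i]
        yi = pos[2, i]
        gx = 0.0
        gy = 0.0
        # @simd lets the compiler reorder the gx/gy reductions and vectorize.
        @simd for j in 1:n
            dx = xi - pos[1, j]
            dy = yi - pos[2, j]
            d = sqrt(dx * dx + dy * dy)
            # k[j, i] walks down a column, which is contiguous in memory.
            # ifelse zeroes the weight when d == 0 (the j == i term or
            # coincident nodes), discarding the Inf/NaN from the division.
            w = ifelse(d > 0.0, k[j, i] * (1.0 - l[j, i] / d), 0.0)
            gx += w * dx
            gy += w * dy
        end
        grad[1, i] = gx
        grad[2, i] = gy
    end
    return grad
end

pos = [0.0 2.0;
       0.0 0.0]           # two nodes, 2 units apart on the x axis
k = ones(2, 2)
l = ones(2, 2)            # ideal length 1, so the spring is stretched
grad = zeros(2, 2)        # allocated once, reused on every iteration
kk_gradient!(grad, pos, k, l)
```

Because `grad` is written in place and the inner loop uses only scalars, the steady-state call should not allocate, and it depends on nothing outside Base, which avoids the extra compilation latency discussed in this thread.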

Advanced options like LoopVectorization or Tullio are faster but introduce significant compilation latency. They could be explored in the future if the functions can be precompiled.
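
For reference, Base provides `precompile(f, argtypes)`, which compiles a method for given argument types before its first call. The sketch below uses a hypothetical function; whether this would recover the latency added by LoopVectorization or Tullio in this package is untested here.

```julia
# Hypothetical function standing in for a layout kernel.
scale_sum(x::Vector{Float64}, a::Float64) = a * sum(x)

# Ask Julia to compile the method for these concrete argument types now,
# rather than on the first call. Returns true on success.
ok = precompile(scale_sum, (Vector{Float64}, Float64))

result = scale_sum([1.0, 2.0], 2.0)
```

Placed at the top level of a package module, such a call runs during package precompilation, so at least part of the compiled work can be cached between sessions.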