noahrhodes commented 2 years ago

Add type annotations, preallocated arrays, and broadcast operators to speed up the Kamada-Kawai layout.
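For reference, here is a minimal sketch of what "preallocated arrays and broadcast operators" can look like for a Kamada-Kawai gradient. It assumes node positions in two vectors `x`, `y` and precomputed symmetric n×n matrices `k` (spring constants) and `l` (ideal lengths). `KKWorkspace` and `kk_gradient!` are hypothetical names, not the code in this PR. It uses the standard energy `E = 1/2 * sum_{i<j} k[i,j] * (|p_i - p_j| - l[i,j])^2`, whose gradient is `dE/dp_i = sum_{j != i} k[i,j] * (1 - l[i,j] / |p_i - p_j|) * (p_i - p_j)`.

```julia
# Minimal sketch (hypothetical names, not the PowerPlots.jl implementation).
# Assumes positions in vectors `x`, `y` and precomputed symmetric n×n matrices
# `k` (spring constants) and `l` (ideal lengths).

# Scratch matrices allocated once and reused on every gradient evaluation.
struct KKWorkspace
    dx::Matrix{Float64}
    dy::Matrix{Float64}
    c::Matrix{Float64}
end
KKWorkspace(n::Int) = KKWorkspace(zeros(n, n), zeros(n, n), zeros(n, n))

# Writes dE/dx into grad[:, 1] and dE/dy into grad[:, 2] (grad is n×2).
function kk_gradient!(grad::Matrix{Float64}, ws::KKWorkspace,
                      x::Vector{Float64}, y::Vector{Float64},
                      k::Matrix{Float64}, l::Matrix{Float64})
    n = length(x)
    # Fused broadcasts write straight into the preallocated buffers:
    # dx[i, j] = x[i] - x[j], dy[i, j] = y[i] - y[j].
    ws.dx .= x .- reshape(x, 1, n)
    ws.dy .= y .- reshape(y, 1, n)
    # Pair coefficient c[i, j] = k[i, j] * (1 - l[i, j] / r[i, j]).
    ws.c .= k .* (1 .- l ./ sqrt.(ws.dx .^ 2 .+ ws.dy .^ 2))
    for i in 1:n
        ws.c[i, i] = 0.0    # self term is undefined (r = 0); it contributes nothing
    end
    # Reuse dx/dy to hold c .* dx and c .* dy, then take row sums in place.
    ws.dx .*= ws.c
    ws.dy .*= ws.c
    sum!(view(grad, :, 1), ws.dx)
    sum!(view(grad, :, 2), ws.dy)
    return grad
end
```

Because the n×n scratch matrices live in the workspace, repeated calls inside the optimizer loop should allocate little beyond small wrapper objects from `reshape` and `view`.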

~~Laptop Results:~~ Not up to date

```
pglib_opf_case14_ieee.m
  0.008143 seconds (7.94 k allocations: 2.549 MiB)
  0.009694 seconds (6.59 k allocations: 1.228 MiB)
Same Layout: true

pglib_opf_case118_ieee.m
  0.536786 seconds (102.92 k allocations: 286.639 MiB, 6.30% gc time)
  0.433696 seconds (86.58 k allocations: 109.460 MiB)
Same Layout: true

pglib_opf_case500_tamu.m
  6.337915 seconds (688.35 k allocations: 3.375 GiB, 5.11% gc time)
  5.621654 seconds (633.22 k allocations: 1.264 GiB, 2.22% gc time)
Same Layout: true

pglib_opf_case1354_pegase.m
 82.930171 seconds (7.38 M allocations: 38.229 GiB, 4.45% gc time)
 75.264216 seconds (6.93 M allocations: 14.091 GiB, 2.39% gc time)
Same Layout: true
```

Desktop results (in each case, the first timing is before this change and the second is after):

```
pglib_opf_case14_ieee.m
  0.004240 seconds (13.11 k allocations: 2.793 MiB)
  0.002504 seconds (4.71 k allocations: 725.516 KiB)
Same Layout: true

pglib_opf_case118_ieee.m
  0.110263 seconds (139.29 k allocations: 287.501 MiB, 21.59% gc time)
  0.047131 seconds (82.93 k allocations: 49.916 MiB, 7.69% gc time)
Same Layout: true

pglib_opf_case500_goc.m
  3.868789 seconds (1.42 M allocations: 5.793 GiB, 11.89% gc time)
  2.090159 seconds (1.09 M allocations: 958.229 MiB, 12.41% gc time)
Same Layout: true

pglib_opf_case1354_pegase.m
 28.027997 seconds (8.45 M allocations: 38.245 GiB, 9.73% gc time)
 13.690771 seconds (6.92 M allocations: 5.991 GiB, 4.54% gc time)
Same Layout: true
```

~~Changing from OMEinsum to Tullio allows preallocating the gradient calculation array. Allocations are now decreased by ~5x, with a speedup of 2x as well.~~ ~~case500 is a bizarre outlier; I cannot understand why it runs slower.~~

Removed Tullio and just wrote nested `for` loops with LoopVectorization. This increases the speedup by an additional 2x.

~~Time to first run is much improved, about 7x.~~

```
pglib_opf_case14_ieee.m
  3.446269 seconds (12.08 M allocations: 712.245 MiB, 5.07% gc time, 99.84% compilation time)
  0.579834 seconds (2.19 M allocations: 117.468 MiB, 99.20% compilation time)
```

~~Time to first run is less improved: a 2x speedup instead of 7x.~~

```
  3.418529 seconds (12.30 M allocations: 725.067 MiB, 5.03% gc time, 99.84% compilation time)
  1.425368 seconds (4.46 M allocations: 242.790 MiB, 1.78% gc time, 99.78% compilation time)
```

Time to first run is now even worse:

```
pglib_opf_case14_ieee.m
  3.544652 seconds (12.72 M allocations: 754.263 MiB, 4.51% gc time, 99.84% compilation time)
  9.456756 seconds (17.89 M allocations: 1.000 GiB, 2.26% gc time, 99.96% compilation time)
```

codecov-commenter commented 2 years ago

Codecov Report

Merging #74 (f8f0330) into master (ca1b463) will decrease coverage by 19.13%. The diff coverage is 0.00%.

```diff
@@             Coverage Diff             @@
##           master      #74       +/-   ##
===========================================
- Coverage   92.35%   73.21%   -19.14%
===========================================
  Files          11       10        -1
  Lines         340      407       +67
===========================================
- Hits          314      298       -16
- Misses         26      109       +83
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/layouts/layout_engines.jl | `38.68% <0.00%> (-55.96%)` | :arrow_down: |
| src/core/configuration.jl | `77.77% <0.00%> (-11.12%)` | :arrow_down: |
| src/core/utils.jl | `88.46% <0.00%> (-5.48%)` | :arrow_down: |
| src/core/export.jl | | |

Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`. Powered by Codecov. Last update ca1b463...f8f0330.

noahrhodes commented 2 years ago

The best option is to correctly preallocate arrays and use `@simd` for the matrix operations.
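A sketch of the preallocate-plus-`@simd` approach, assuming positions in an n×2 matrix `pos` and symmetric n×n matrices `k` and `l` as before; `kk_gradient_simd!` is a hypothetical name, not the function in this PR. Instead of building n×n temporaries, each pair term is computed on the fly and accumulated into a caller-owned `grad` buffer.

```julia
# Minimal sketch (hypothetical name, not the PowerPlots.jl implementation).
# pos: n×2 positions; k, l: symmetric n×n spring constants and ideal lengths.
# Writes dE/dx into grad[:, 1] and dE/dy into grad[:, 2].
function kk_gradient_simd!(grad::Matrix{Float64}, pos::Matrix{Float64},
                           k::Matrix{Float64}, l::Matrix{Float64})
    n = size(pos, 1)
    fill!(grad, 0.0)                  # reuse the caller's preallocated buffer
    @inbounds for j in 1:n
        xj = pos[j, 1]
        yj = pos[j, 2]
        # Inner loop walks down column j of k and l (contiguous in column-major
        # order) and has no data-dependent branches, so @simd can vectorize it.
        @simd for i in 1:n
            dx = pos[i, 1] - xj
            dy = pos[i, 2] - yj
            r = sqrt(dx * dx + dy * dy)
            # ifelse (not `if`) keeps the loop branch-free; the i == j term is 0.
            c = ifelse(i == j, 0.0, k[i, j] * (1 - l[i, j] / r))
            grad[i, 1] += c * dx
            grad[i, 2] += c * dy
        end
    end
    return grad
end
```

The loop order (outer `j`, inner `i`) follows Julia's column-major layout, so `k[i, j]`, `l[i, j]`, and `grad[i, :]` are read and written with unit stride.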

Advanced options like LoopVectorization or Tullio are faster, but they introduce significant compilation latency. This could be explored in the future if the functions can be precompiled.
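One way to explore this is Base's `precompile(f, argtypes)`, which compiles a method for concrete argument types without running it. Below is a minimal sketch with a hypothetical `kk_step!` kernel standing in for the gradient code. In a plain script this only moves the compile cost earlier in the session. Placed inside a package module, the directive runs during package precompilation so the compiled code can be cached with the package (PrecompileTools.jl's `@compile_workload` is a more complete option for the same goal).

```julia
# Minimal sketch; `kk_step!` is a hypothetical stand-in for a heavy kernel.
function kk_step!(pos::Matrix{Float64}, grad::Matrix{Float64}, eta::Float64)
    pos .-= eta .* grad    # one in-place gradient-descent step
    return pos
end

# Compile kk_step! for these concrete argument types without running it.
# Inside a package module this runs during package precompilation; in a plain
# script it only moves the compile cost earlier in the same session.
ok = precompile(kk_step!, (Matrix{Float64}, Matrix{Float64}, Float64))

pos = kk_step!(zeros(3, 2), ones(3, 2), 0.1)   # first call can reuse that compiled code
```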

WISPO-POP / PowerPlots.jl

Faster KamadaKawaii gradient #74
