CliMA / Thermodynamics.jl

A package containing a library of moist thermodynamic relations.
https://clima.github.io/Thermodynamics.jl/dev/
Apache License 2.0
61 stars 2 forks source link

Inlining everything #183

Closed charleskawczynski closed 8 months ago

charleskawczynski commented 8 months ago

This PR attempts to systematically inline all thermo functions, to see the impact of performance.

codecov[bot] commented 8 months ago

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (805d172) 92.91% compared to head (ecfb71d) 92.96%.

Files Patch % Lines
src/config_numerical_method.jl 65.00% 7 Missing :warning:
src/relations.jl 96.57% 6 Missing :warning:
src/states.jl 94.28% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #183 +/- ## ========================================== + Coverage 92.91% 92.96% +0.04% ========================================== Files 10 10 Lines 1144 1152 +8 ========================================== + Hits 1063 1071 +8 Misses 81 81 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

charleskawczynski commented 8 months ago

I took a look at the buoyancy kernel, and I'm seeing a speedup on this branch (on the CPU):

Main branch

******************* thermo_kernel
BenchmarkTools.Trial: 236 samples with 1 evaluation.
 Range (min … max):  20.429 ms …  28.884 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.910 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.210 ms ± 876.287 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▄▃█▆▁                                                      
  ▃▄███████▅▄▃▄▅▂▃▄▃▁▂▂▁▂▃▃▄▃▄▂▂▃▂▃▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
  20.4 ms         Histogram: frequency by time         24.8 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 27 samples with 1 evaluation.
 Range (min … max):  180.741 ms … 196.250 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     183.876 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   185.915 ms ±   4.569 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃    ▃   ██▃                                                  
  ▇█▁▁▇▁█▁▁▁███▇▇▇▇▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▇▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▇ ▁
  181 ms           Histogram: frequency by time          196 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

This branch:

******************* thermo_kernel
BenchmarkTools.Trial: 368 samples with 1 evaluation.
 Range (min … max):  13.501 ms …  14.498 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.544 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.592 ms ± 143.209 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁██▇▆▅▆▄▃▃▂▁                                                  
  ████████████▇▆▁▁▄▄▁▁▁▆▁▇▄▁▆▁▁▆▄▁▁▁▁▄▄▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▄▆▆▇▆▄▄▆ ▇
  13.5 ms       Histogram: log(frequency) by time      14.1 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 36 samples with 1 evaluation.
 Range (min … max):  138.371 ms … 142.465 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     139.539 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   139.659 ms ± 890.868 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▂   █▂  ▂                                         
  ▅▁▅▁▅▅█▅▅▁▅▅▁█▅▅▅██▁▁█▁▅▅▁▅▅▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▅ ▁
  138 ms           Histogram: frequency by time          142 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

I'm not sure if the input parameters are performing saturation adjustment, but it doesn't matter in the case of the buoyancy gradients kernel, since SA is never called regardless of what the inputs are.

cc @glwagner, @szy21. I'm comfortable moving forward with this.

Interestingly, I didn't realize this until now, it looks like the buoyancy kernel is allocating 🤔

trontrytel commented 8 months ago

This looks great! So maybe it will speed up the quadratures also...

charleskawczynski commented 8 months ago

This looks great! So maybe it will speed up the quadratures also...

Yeah, I was thinking that, too

glwagner commented 8 months ago

Good news.

charleskawczynski commented 8 months ago

Also, from the updated benchmark, which I think may reflect the impact better for the CPU (we won't see differences on the gpu where we use always_inline = true):

main

BenchmarkTools.Trial: 1273 samples with 1 evaluation.
 Range (min … max):  3.875 ms …  4.541 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.881 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.928 ms ± 93.551 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  █▃▁ ▃▃▁                ▃
  ███▆███▇████▆▆▅▇▆▆▃▅▅▄▇█▇█▅▅▇▆▆▅▅▄▁▁▄▄▁▃▄▃▃▄▅▁▁▁▁▄▁▃▅▃▁▁▁▄ ▇
  3.87 ms      Histogram: log(frequency) by time     4.34 ms <
 Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1390 samples with 1 evaluation.
 Range (min … max):  3.563 ms …  4.238 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.569 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.596 ms ± 74.865 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  █▂  ▄▃▁
  ██▇████▇█▇▇█▇▄▆▆▇▆▅▅▆▇▄▇▅▄▅▃▅▃▃▅▁▁▃▃▁▁▁▃▁▃▁▁▃▁▃▁▁▁▃▁▃▁▁▄▁▄ ▇
  3.56 ms      Histogram: log(frequency) by time     4.03 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

This branch:

BenchmarkTools.Trial: 1690 samples with 1 evaluation.
 Range (min … max):  2.874 ms …  3.264 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.965 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.957 ms ± 40.091 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
                    ▂▄          █▁▃
  ▂▂▂▁▂▄▆▃▃▄▄▄▄▅▄▃▆▄██▆▃▃▃▂▃▅▄▂▅███▄▃▃▃▄▄▇█▃▅▆▄▅▃▂▃▂▂▂▁▁▁▁▁▁ ▃
  2.87 ms        Histogram: frequency by time        3.05 ms <
 Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1817 samples with 1 evaluation.
 Range (min … max):  2.626 ms …  3.289 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.735 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.750 ms ± 69.871 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
            ▄  ▃▄█▅▁ ▁▁▅  
  ▁▂▃▂▂▅▄▅▅▇██▇██████████▆▅▅▇▇▅▃▅▆▆▅▃▄▅▄▅▅▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  2.63 ms        Histogram: frequency by time        2.94 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.