Closed charleskawczynski closed 8 months ago
Attention: 15 lines
in your changes are missing coverage. Please review.
Comparison is base (
805d172
) 92.91% compared to head (ecfb71d
) 92.96%.
Files | Patch % | Lines |
---|---|---|
src/config_numerical_method.jl | 65.00% | 7 Missing :warning: |
src/relations.jl | 96.57% | 6 Missing :warning: |
src/states.jl | 94.28% | 2 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I took a look at the buoyancy kernel, and I'm seeing a speedup on this branch (on the CPU):
Main branch
******************* thermo_kernel
BenchmarkTools.Trial: 236 samples with 1 evaluation.
Range (min … max): 20.429 ms … 28.884 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 20.910 ms ┊ GC (median): 0.00%
Time (mean ± σ): 21.210 ms ± 876.287 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▃█▆▁
▃▄███████▅▄▃▄▅▂▃▄▃▁▂▂▁▂▃▃▄▃▄▂▂▃▂▃▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
20.4 ms Histogram: frequency by time 24.8 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 27 samples with 1 evaluation.
Range (min … max): 180.741 ms … 196.250 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 183.876 ms ┊ GC (median): 0.00%
Time (mean ± σ): 185.915 ms ± 4.569 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▃ ▃ ██▃
▇█▁▁▇▁█▁▁▁███▇▇▇▇▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▇▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▇ ▁
181 ms Histogram: frequency by time 196 ms <
Memory estimate: 16 bytes, allocs estimate: 1.
This branch:
******************* thermo_kernel
BenchmarkTools.Trial: 368 samples with 1 evaluation.
Range (min … max): 13.501 ms … 14.498 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.544 ms ┊ GC (median): 0.00%
Time (mean ± σ): 13.592 ms ± 143.209 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁██▇▆▅▆▄▃▃▂▁
████████████▇▆▁▁▄▄▁▁▁▆▁▇▄▁▆▁▁▆▄▁▁▁▁▄▄▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▄▆▆▇▆▄▄▆ ▇
13.5 ms Histogram: log(frequency) by time 14.1 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 36 samples with 1 evaluation.
Range (min … max): 138.371 ms … 142.465 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 139.539 ms ┊ GC (median): 0.00%
Time (mean ± σ): 139.659 ms ± 890.868 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █▂ ▂
▅▁▅▁▅▅█▅▅▁▅▅▁█▅▅▅██▁▁█▁▅▅▁▅▅▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▅ ▁
138 ms Histogram: frequency by time 142 ms <
Memory estimate: 16 bytes, allocs estimate: 1.
I'm not sure if the input parameters are performing saturation adjustment, but it doesn't matter in the case of the buoyancy gradients kernel, since SA is never called regardless of what the inputs are.
cc @glwagner, @szy21. I'm comfortable moving forward with this.
Interestingly, I didn't realize this until now, it looks like the buoyancy kernel is allocating 🤔
This looks great! So maybe it will speed up the quadratures also...
This looks great! So maybe it will speed up the quadratures also...
Yeah, I was thinking that, too
Good news.
Also, from the updated benchmark, which I think may reflect the impact better for the CPU (we won't see differences on the gpu where we use always_inline = true
):
main
BenchmarkTools.Trial: 1273 samples with 1 evaluation.
Range (min … max): 3.875 ms … 4.541 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.881 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.928 ms ± 93.551 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃▁ ▃▃▁ ▃
███▆███▇████▆▆▅▇▆▆▃▅▅▄▇█▇█▅▅▇▆▆▅▅▄▁▁▄▄▁▃▄▃▃▄▅▁▁▁▁▄▁▃▅▃▁▁▁▄ ▇
3.87 ms Histogram: log(frequency) by time 4.34 ms <
Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1390 samples with 1 evaluation.
Range (min … max): 3.563 ms … 4.238 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.569 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.596 ms ± 74.865 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂ ▄▃▁
██▇████▇█▇▇█▇▄▆▆▇▆▅▅▆▇▄▇▅▄▅▃▅▃▃▅▁▁▃▃▁▁▁▃▁▃▁▁▃▁▃▁▁▁▃▁▃▁▁▄▁▄ ▇
3.56 ms Histogram: log(frequency) by time 4.03 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
This branch:
BenchmarkTools.Trial: 1690 samples with 1 evaluation.
Range (min … max): 2.874 ms … 3.264 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.965 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.957 ms ± 40.091 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▄ █▁▃
▂▂▂▁▂▄▆▃▃▄▄▄▄▅▄▃▆▄██▆▃▃▃▂▃▅▄▂▅███▄▃▃▃▄▄▇█▃▅▆▄▅▃▂▃▂▂▂▁▁▁▁▁▁ ▃
2.87 ms Histogram: frequency by time 3.05 ms <
Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1817 samples with 1 evaluation.
Range (min … max): 2.626 ms … 3.289 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.735 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.750 ms ± 69.871 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄ ▃▄█▅▁ ▁▁▅
▁▂▃▂▂▅▄▅▅▇██▇██████████▆▅▅▇▇▅▃▅▆▆▅▃▄▅▄▅▅▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
2.63 ms Histogram: frequency by time 2.94 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
This PR attempts to systematically inline all thermo functions, to see the impact of performance.