ali-ramadhan / Atmosfoolery.jl

Compressible non-hydrostatic model built on top of Oceananigans.jl so it runs on CPUs and GPUs
MIT License

GPU support via KernelAbstractions.jl #91

Closed ali-ramadhan closed 3 years ago

ali-ramadhan commented 3 years ago

This PR converts the time-stepping kernels to use KernelAbstractions.jl to enable GPU support. Regression tests seem to pass, but the differences in the numbers are a little high on some configurations, so it's probably worth investigating a bit.

Preliminary benchmarks (below) show pretty strong speedups :tada: Energy on GPU seems slower than it should be, and there seems to be a big hit for using 3 gases. These might be things we can easily target for performance optimization though. Not sure if Entropy + 1 gas is actually this much faster; it looks like an anomaly.

CPU -> GPU speedup for large (256^3) models
Entropy + 1 gas: 287x
Energy  + 1 gas: 86x
Entropy + 3 gas: 58x
Energy  + 3 gas: 85x

 ────────────────────────────────────────────────────────────────────────────────────────────────
           Static atmosphere benchmarks                  Time                   Allocations      
                                                 ──────────────────────   ───────────────────────
                Tot / % measured:                     1296s / 73.6%           51.0GiB / 0.48%    

 Section                                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────────────
 256×256×256 [CPU, Energy, DryEarth3]        10     275s  28.8%   27.5s   7.97MiB  3.20%   817KiB
 256×256×256 [CPU, Energy, DryEarth]         10     178s  18.7%   17.8s   6.31MiB  2.54%   646KiB
 256×256×256 [CPU, Entropy, DryEarth3]       10     336s  35.2%   33.6s   7.97MiB  3.20%   817KiB
 256×256×256 [CPU, Entropy, DryEarth]        10     151s  15.9%   15.1s   6.31MiB  2.54%   646KiB
 256×256×256 [GPU, Energy, DryEarth3]        10    3.24s  0.34%   324ms   29.8MiB  12.0%  2.98MiB
 256×256×256 [GPU, Energy, DryEarth]         10    2.06s  0.22%   206ms   22.4MiB  9.00%  2.24MiB
 256×256×256 [GPU, Entropy, DryEarth3]       10    5.82s  0.61%   582ms   26.3MiB  10.6%  2.63MiB
 256×256×256 [GPU, Entropy, DryEarth]        10    527ms  0.06%  52.7ms   22.4MiB  9.00%  2.24MiB
 32×32×32 [CPU, Energy, DryEarth3]           10    502ms  0.05%  50.2ms   7.97MiB  3.20%   817KiB
 32×32×32 [CPU, Energy, DryEarth]            10    354ms  0.04%  35.4ms   6.31MiB  2.54%   646KiB
 32×32×32 [CPU, Entropy, DryEarth3]          10    665ms  0.07%  66.5ms   7.97MiB  3.20%   817KiB
 32×32×32 [CPU, Entropy, DryEarth]           10    304ms  0.03%  30.4ms   6.31MiB  2.54%   646KiB
 32×32×32 [GPU, Energy, DryEarth3]           10   82.8ms  0.01%  8.28ms   25.4MiB  10.2%  2.54MiB
 32×32×32 [GPU, Energy, DryEarth]            10   68.3ms  0.01%  6.83ms   20.0MiB  8.05%  2.00MiB
 32×32×32 [GPU, Entropy, DryEarth3]          10   95.3ms  0.01%  9.53ms   25.4MiB  10.2%  2.54MiB
 32×32×32 [GPU, Entropy, DryEarth]           10   73.5ms  0.01%  7.35ms   20.0MiB  8.05%  2.00MiB
 ────────────────────────────────────────────────────────────────────────────────────────────────
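
As a cross-check, the speedups quoted above can be reproduced from the `avg` column of the 256×256×256 rows. This is only a sketch: it assumes `DryEarth` is the 1-gas configuration and `DryEarth3` the 3-gas one, and the dictionary layout and the `speedup` helper are made up for illustration.

```python
# Average time per call in seconds, copied from the `avg` column of the
# 256x256x256 rows. Assumption: DryEarth = 1 gas, DryEarth3 = 3 gases.
avg_time = {
    ("CPU", "Energy", 3): 27.5,
    ("CPU", "Energy", 1): 17.8,
    ("CPU", "Entropy", 3): 33.6,
    ("CPU", "Entropy", 1): 15.1,
    ("GPU", "Energy", 3): 0.324,
    ("GPU", "Energy", 1): 0.206,
    ("GPU", "Entropy", 3): 0.582,
    ("GPU", "Entropy", 1): 0.0527,
}


def speedup(variable, n_gases):
    """CPU time divided by GPU time for one thermodynamic variable / gas count."""
    return avg_time[("CPU", variable, n_gases)] / avg_time[("GPU", variable, n_gases)]


for variable in ("Entropy", "Energy"):
    for n_gases in (1, 3):
        print(f"{variable} + {n_gases} gas: {speedup(variable, n_gases):.0f}x")
```

The rounded ratios match the four speedups listed at the top of the comment.
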
ali-ramadhan commented 3 years ago

Hmmm, the CPU regression results changed slightly, but all the differences look tiny. One of KernelAbstractions.jl's important differences to Julia is that floating-point multiplication, addition, and subtraction are marked contractable (the compiler may fuse them, e.g. into FMA instructions), so the answer could change slightly: https://juliagpu.gitlab.io/KernelAbstractions.jl/#Important-differences-to-Julia-1
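
To illustrate why contraction can change results, here is a minimal sketch (plain Python, not the model's kernels) comparing an unfused multiply-add, which rounds twice, with an emulated fused one, which rounds once. The function names and input values are hypothetical, and the fused version is emulated with exact rationals rather than a hardware FMA.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    # Two roundings: a * b is rounded to a double, then the sum is rounded again.
    return a * b + c


def fused_multiply_add(a, b, c):
    # One rounding: evaluate a * b + c exactly with rationals, then round to a
    # double once. This emulates the result of a fused multiply-add.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Hypothetical inputs chosen so the exact product, 1 - 2**-60, is not a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

With these inputs the unfused result is exactly zero, while the fused result is a tiny nonzero number, which is the kind of small difference a regression test can pick up.
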

Maybe there are some zeros in the answer, for which the approximate-equality check returns false.

ali-ramadhan commented 3 years ago

Yeah so our regression tests are horrible lol, at least for ρu. The other fields are much better. I'll change back to an absolute tolerance.
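
A minimal sketch of why an absolute tolerance matters when the reference value is zero, using Python's `math.isclose` as a stand-in for the actual regression check (Julia's `isapprox` also defaults to a zero absolute tolerance). The values and tolerances below are hypothetical.

```python
import math

reference = 0.0   # hypothetical regression value that is exactly zero
computed = 1e-18  # hypothetical new result, off by round-off only

# Relative tolerance only: the allowed error scales with the magnitudes being
# compared, so against an exact zero almost nothing nonzero can pass.
passes_relative = math.isclose(computed, reference, rel_tol=1e-9)

# Absolute tolerance: the allowed error is a fixed threshold.
passes_absolute = math.isclose(computed, reference, rel_tol=0.0, abs_tol=1e-12)

print("relative tolerance:", passes_relative)
print("absolute tolerance:", passes_absolute)
```

The relative check fails and the absolute check passes, even though the two values differ only at round-off level.
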

image

codecov[bot] commented 3 years ago

Codecov Report

Merging #91 into master will decrease coverage by 6.17%. The diff coverage is 75.75%.

@@            Coverage Diff             @@
##           master      #91      +/-   ##
==========================================
- Coverage   95.12%   88.95%   -6.18%     
==========================================
  Files          12       12              
  Lines         328      362      +34     
==========================================
+ Hits          312      322      +10     
- Misses         16       40      +24     
Impacted Files                  Coverage Δ
src/time_stepping_kernels.jl    72.09% <66.66%> (-27.91%) :arrow_down:
src/thermodynamics.jl           100.00% <100.00%> (ø)
src/time_stepping.jl            100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 962e0f3...8272cf5.

ali-ramadhan commented 3 years ago

Looks like WENO-5 works on GPUs! The coarse 2D model isn't impressive lol, but it's a good enough sign for me to merge this PR.

thermal_bubble_Energy

Not sure what's up with the flickering. Will fix.