CliMA / RRTMGP.jl

A Julia implementation of Rapid and accurate Radiative Transfer Model for General Circulation Models.
https://clima.github.io/RRTMGP.jl/latest/
Apache License 2.0

Try using BitArray for cloud masks #451

Open charleskawczynski opened 7 months ago

charleskawczynski commented 7 months ago

I'm curious whether this works on GPUs.
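For context, a minimal CPU-only sketch of what swapping a `Bool` array for a `BitArray` changes in storage terms. It is not taken from this PR: the names `mask_bool` and `mask_bits` and the layer count `nlay` are hypothetical, and `ncols` is simply borrowed from the benchmark log in this thread. It says nothing about whether a `BitArray` can be passed to a GPU kernel.

```julia
# Hypothetical mask dimensions: `nlay` is an assumption,
# `ncols` matches the value printed in the benchmark log.
nlay, ncols = 60, 131658

# Dense Bool array: one byte per element.
mask_bool = zeros(Bool, nlay, ncols)

# BitArray: one bit per element, packed into UInt64 chunks.
mask_bits = falses(nlay, ncols)

# Both index like an AbstractArray{Bool}.
mask_bool[1, 1] = true
mask_bits[1, 1] = true

# Payload sizes in bytes (`chunks` is the internal storage field of BitArray).
bytes_bool = sizeof(mask_bool)
bytes_bits = sizeof(mask_bits.chunks)

println("Array{Bool}: ", bytes_bool, " bytes")
println("BitArray:    ", bytes_bits, " bytes")
```

The packed layout is roughly 8x smaller, but writes to different elements that share a 64-bit chunk are not thread-safe, which is one reason parallel or GPU use would need checking rather than assuming.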

sriharshakandala commented 7 months ago

Does this improve performance?

charleskawczynski commented 7 months ago

> Does this improve performance?

🤷🏻

sriharshakandala commented 7 months ago

Current main:

julia --project=gpuenv test/all_sky_tuning.jl 
device = ClimaComms.CUDADevice(); FT = Float64, ncols = 131658; size per field = 0.04119899868965149 GB
"timing longwave solver" = "timing longwave solver"
  1.159210 seconds (66 CPU allocations: 14.969 KiB)
  1.158549 seconds (65 CPU allocations: 14.891 KiB)
  1.158072 seconds (66 CPU allocations: 15.000 KiB)
  1.157427 seconds (45 CPU allocations: 13.094 KiB)
  1.157513 seconds (45 CPU allocations: 13.094 KiB)
"timing shortwave solver" = "timing shortwave solver"
  0.863498 seconds (51 CPU allocations: 13.469 KiB)
  0.862782 seconds (51 CPU allocations: 13.469 KiB)
  0.863073 seconds (51 CPU allocations: 13.469 KiB)
  0.862254 seconds (51 CPU allocations: 13.469 KiB)
  0.864140 seconds (51 CPU allocations: 13.469 KiB)
 39.751985 seconds (97.94 M allocations: 5.623 GiB, 4.70% gc time, 54.58% compilation time: 1% of which was recompilation)

This branch:

julia --project=gpuenv test/all_sky_tuning.jl 
device = ClimaComms.CUDADevice(); FT = Float64, ncols = 131658; size per field = 0.04119899868965149 GB
"timing longwave solver" = "timing longwave solver"
  1.160132 seconds (66 CPU allocations: 14.969 KiB)
  1.157305 seconds (65 CPU allocations: 14.891 KiB)
  1.156251 seconds (66 CPU allocations: 15.000 KiB)
  1.156213 seconds (45 CPU allocations: 13.094 KiB)
  1.157813 seconds (45 CPU allocations: 13.094 KiB)
"timing shortwave solver" = "timing shortwave solver"
  0.863435 seconds (51 CPU allocations: 13.469 KiB)
  0.861900 seconds (51 CPU allocations: 13.469 KiB)
  0.863441 seconds (51 CPU allocations: 13.469 KiB)
  0.861509 seconds (51 CPU allocations: 13.469 KiB)
  0.863258 seconds (51 CPU allocations: 13.469 KiB)
 39.466558 seconds (97.92 M allocations: 5.624 GiB, 4.58% gc time, 53.51% compilation time: 1% of which was recompilation)
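
To put the two logs side by side, here is a small sketch that averages the five timings per solver and reports the relative change. The numbers are copied from the logs above; the helper `rel_change` is a hypothetical name introduced only for this comparison.

```julia
using Statistics: mean

# Timings in seconds, copied from the "Current main" and "This branch" logs.
lw_main   = [1.159210, 1.158549, 1.158072, 1.157427, 1.157513]
lw_branch = [1.160132, 1.157305, 1.156251, 1.156213, 1.157813]
sw_main   = [0.863498, 0.862782, 0.863073, 0.862254, 0.864140]
sw_branch = [0.863435, 0.861900, 0.863441, 0.861509, 0.863258]

# Relative change of the mean in percent (negative means the branch is faster).
rel_change(main, branch) = 100 * (mean(branch) - mean(main)) / mean(main)

lw_change = rel_change(lw_main, lw_branch)
sw_change = rel_change(sw_main, sw_branch)

println("longwave:  ", round(lw_change; digits = 3), " %")
println("shortwave: ", round(sw_change; digits = 3), " %")
```

Both changes come out below 0.1% in magnitude, which is plausibly within run-to-run noise for this benchmark, so these logs do not show a measurable speedup from the change.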