CliMA / ClimaOcean.jl

🌎 Tools for realistic regional-to-global ocean simulations, and coupled ocean + sea-ice simulations based on Oceananigans and ClimaSeaIce. Basis for the ocean and sea-ice component of CliMA's Earth system model.
https://clima.github.io/ClimaOceanDocumentation/dev/
MIT License

Insanely long `OceanSeaIceModel` compile times on GPU #135

Open glwagner opened 1 month ago

glwagner commented 1 month ago

I timed how long it takes to build and then take one time step with OceanSeaIceModel with this script:

using Oceananigans
using ClimaOcean
using OrthogonalSphericalShellGrids

start_time = time_ns()
arch = GPU()
grid = TripolarGrid(arch;
                    size = (50, 50, 10),
                    halo = (7, 7, 7),
                    z = (-6000, 0),
                    first_pole_longitude = 75,
                    north_poles_latitude = 55)

bottom_height = retrieve_bathymetry(grid;
                                    minimum_depth = 10,
                                    dir = "./",
                                    interpolation_passes = 20,
                                    connected_regions_allowed = 0)

grid = ImmersedBoundaryGrid(grid, GridFittedBottom(bottom_height); active_cells_map = true)

elapsed = 1e-9 * (time_ns() - start_time)
@info "Grid / bathymetry construction time: " * prettytime(elapsed)

start_time = time_ns()
free_surface = SplitExplicitFreeSurface(grid; substeps = 20)
ocean = ocean_simulation(grid; free_surface)
model = ocean.model
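# Recompute elapsed here; without this line the previous (grid) timing is printed again
elapsed = 1e-9 * (time_ns() - start_time)
@info "Ocean simulation construction time: " * prettytime(elapsed)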

start_time = time_ns()
backend    = JRA55NetCDFBackend(4)
atmosphere = JRA55_prescribed_atmosphere(arch; backend)
radiation  = Radiation(arch)

elapsed = 1e-9 * (time_ns() - start_time)
@info "Atmosphere construction time: " * prettytime(elapsed)

# Fluxes are computed when the model is constructed, so we just test that this works.
start_time = time_ns()
sea_ice = ClimaOcean.OceanSeaIceModels.MinimumTemperatureSeaIce()
coupled_model = OceanSeaIceModel(ocean, sea_ice; atmosphere, radiation)

elapsed = 1e-9 * (time_ns() - start_time)
@info "Coupled model construction time: " * prettytime(elapsed)

start_time = time_ns()
time_step!(coupled_model, 1)
elapsed = 1e-9 * (time_ns() - start_time)
@info "One time step time: " * prettytime(elapsed)

Running this for the first time, I get (ignoring the annoying warnings mentioned in #133):

[ Info: Grid / bathymetry construction time: 1.839 minutes
[ Info: Ocean simulation construction time: 1.839 minutes
[ Info: Atmosphere construction time: 11.130 seconds
[ Info: Model construction time: 5.645 minutes
[ Info: One time step time: 17.764 seconds

The roughly 6-minute wait for model construction (5.645 minutes above) isn't alleviated until the 5th or 6th time a model is built.
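Each stage in the script repeats the same `start_time` / `elapsed` bookkeeping, which is easy to get out of sync with the message being logged. A minimal sketch of one way to bundle the measurement with its label; `timed_stage` is a hypothetical helper name (not part of Oceananigans or ClimaOcean), and the toy workload only stands in for a real construction stage:

```julia
# Hypothetical helper: run `f`, log the elapsed wall time under `label`,
# and return both the result of `f` and the elapsed time in seconds.
function timed_stage(f, label)
    start_time = time_ns()
    result = f()
    elapsed = 1e-9 * (time_ns() - start_time)
    @info "$label: $(round(elapsed; digits = 3)) seconds"
    return result, elapsed
end

# Toy workload standing in for something like grid or model construction.
result, elapsed = timed_stage("Toy stage") do
    sum(abs2, 1:1_000)
end
```

Because the elapsed time is computed inside the helper, a stage cannot accidentally report the timing of the stage before it.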

After the time-stepping is compiled, one time-step is considerably shorter:

julia> @time time_step!(coupled_model, 1)
  0.036822 seconds (45.26 k allocations: 16.751 MiB)

It's not obvious to me why model construction is so expensive. We do call update_state! within the model constructor, which computes fluxes. But this also has to be called during time_step!, which is cheap. So there's something else going on.

Finally, time_step! seems to allocate:

julia> @time for n = 1:100; time_step!(coupled_model, 1); end
  2.330006 seconds (4.71 M allocations: 1.741 GiB, 2.54% gc time)

which is also problematic.
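To separate steady-state allocations from one-off compilation effects, one option is to discard a few warm-up calls and then record the bytes allocated by each later call. The sketch below is an assumption-laden illustration: `allocations_per_call` and the two toy step functions are hypothetical names, and the toy steps only mimic an in-place update versus one that builds a temporary array.

```julia
# Hypothetical helper: bytes allocated by each of `n` calls to `f`,
# measured after `warmup` unmeasured calls so compilation is excluded.
function allocations_per_call(f; n = 5, warmup = 2)
    for _ in 1:warmup
        f()
    end
    return [@allocated(f()) for _ in 1:n]
end

# Toy stand-ins for a time step (illustration only, not ClimaOcean code).
in_place_step!(b) = (b .+= 1.0; nothing)  # updates the existing array
allocating_step(b) = b .+ 1.0             # builds a new array on every call

buffer = zeros(Float64, 1024)
in_place_bytes   = allocations_per_call(() -> in_place_step!(buffer))
allocating_bytes = allocations_per_call(() -> allocating_step(buffer))

@info "In-place step bytes per call:   $in_place_bytes"
@info "Allocating step bytes per call: $allocating_bytes"
```

For the coupled model, the analogous call would be something like `allocations_per_call(() -> time_step!(coupled_model, 1), n = 100, warmup = 10)`. A per-call figure that stays constant after warm-up points to allocations in the step itself rather than to compilation.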

simone-silvestri commented 1 month ago

It looks like @vchuravy had a solution for it, which will hopefully land in Julia 1.11: https://github.com/JuliaGPU/GPUCompiler.jl/pull/557#issuecomment-2183674470

However, we should really try to understand the problem with our precompilation.

glwagner commented 1 month ago

Solution for which part?

simone-silvestri commented 1 month ago

For the precompilation of ClimaOcean. It looks like the time step does not precompile until the fourth execution, so that might explain the allocation. If you exclude the first 10 time steps, does the time step continue allocating?

glwagner commented 1 month ago

> for the precompilation of ClimaOcean.

Interesting. I wasn't even timing that.

> If you exclude the first 10 time steps does the time step continue allocating?

Yes for sure, check out the benchmark. I'm running 100 time steps.

The constructor time is dominated by constructing OceanSeaIceSurfaceFluxes. When this is omitted the construction time drops from minutes to less than a second.

glwagner commented 1 month ago

Here's a little more information about constructor times for OceanSeaIceSurfaceFluxes:

  1. 0.3 s: comment out the creation of SimilarityTheoryTurbulentFluxes, total_fluxes, and surface_atmosphere_state
  2. 3.5 s: comment back in SimilarityTheoryTurbulentFluxes
  3. 40.7 s: the above, plus comment back in the creation of the interpolated atmosphere state (new in PR #126; creates 8 2D fields)
  4. 296.4 s: comment back in total_ocean_fluxes (which includes creating a few BinaryOperation objects; creates 2 new 2D fields and extracts the existing fields for velocity/tracer fluxes)
  5. 197.9 s: remove the BinaryOperation (intended as a user convenience) from total_ocean_fluxes

It doesn't take roughly 37 s (40.7 s minus 3.5 s) to create 8 2D fields, so the cost must have something to do with building the struct. I don't completely understand it.
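One rough way to check whether a constructor's cost is compilation rather than the work of allocating fields is to time the same call twice in one session: the first call includes any compilation, while the second approximates pure run time. The sketch below uses a toy struct as a stand-in; `first_and_second_call`, `ToyFluxes`, and `build_toy_fluxes` are hypothetical names, not ClimaOcean code.

```julia
# Hypothetical helper: time two consecutive calls of `f`.
# The first call includes any compilation; the second approximates run time only.
function first_and_second_call(f)
    first_time  = @elapsed f()
    second_time = @elapsed f()
    return (first = first_time, second = second_time)
end

# Toy stand-in for a struct holding several 2D fields (illustration only).
struct ToyFluxes{T}
    fields::T
end

build_toy_fluxes() = ToyFluxes(ntuple(i -> zeros(4, 4), 8))  # 8 small 2D arrays

times = first_and_second_call(build_toy_fluxes)
@info "First call: $(times.first) s, second call: $(times.second) s"
```

If the second call is orders of magnitude faster than the first, the time is going into type inference and code generation for the struct's type, not into creating its contents.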

francispoulin commented 1 month ago

I reran the test for fun on my laptop (nothing fast) and found the following timings:

[ Info: Grid / bathymetry construction time: 4.783 minutes
[ Info: Ocean simulation construction time: 4.783 minutes
[ Info: Atmosphere construction time: 7.542 seconds
[ Info: Coupled model construction time: 38.721 seconds
[ Info: One time step time: 13.587 seconds

Clearly things are even worse for me, but I also find that the grid and ocean model are the slow parts.

glwagner commented 1 month ago

Huh, do you mean you used a GPU, or the laptop CPU?

francispoulin commented 1 month ago

This is my laptop GPU. Not a powerful one for sure.

glwagner commented 1 month ago

It's good that the example even fits on it! How much memory does it have? Still useful for evaluating compile time and parameter space issues, perhaps.

It's interesting that on your machine the model construction is much faster than on the machine I tested on. Still confused why this is happening. I was running on julia 1.10.0, I'll test other julia versions.

glwagner commented 1 month ago

Here are the results with Julia 1.10.4, using a slightly modified script that also takes 10 time steps:

[ Info: Time for packages to load: 7.094 seconds
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 2.040 minutes
[ Info: Time to build the ocean simulation: 17.114 minutes
[ Info: Time to build the atmosphere and radiation: 11.529 seconds
[ Info: Time to construct the OceanSeaIceModel: 4.772 minutes
 19.544141 seconds (15.39 M allocations: 1.062 GiB, 2.23% gc time, 92.16% compilation time)
154.613907 seconds (26.05 M allocations: 1.262 GiB, 0.28% gc time, 96.44% compilation time)
  0.020572 seconds (44.74 k allocations: 16.715 MiB)
  0.023772 seconds (44.74 k allocations: 16.715 MiB)
  0.023835 seconds (44.74 k allocations: 16.715 MiB)
  0.023758 seconds (44.74 k allocations: 16.715 MiB)
  0.023896 seconds (44.74 k allocations: 16.715 MiB)
  0.081326 seconds (44.74 k allocations: 16.715 MiB, 72.32% gc time)
  0.018894 seconds (44.74 k allocations: 16.715 MiB)
  0.019408 seconds (44.74 k allocations: 16.715 MiB)
[ Info: Time to take 10 time-steps: 2.908 minutes

francispoulin commented 1 month ago

> It's good that the example even fits on it! How much memory does it have? Still useful for evaluating compile time and parameter space issues, perhaps.
>
> It's interesting that on your machine the model construction is much faster than on the machine I tested on. Still confused why this is happening. I was running on julia 1.10.0, I'll test other julia versions.

I have gone as high as 18 GB on my laptop GPU before it gave up and said no!

I should say that I was using 1.10.0.

I am happy to try another version of Julia if that's of interest.

simone-silvestri commented 1 month ago

So there is a consistent ~16.7 MiB allocation per time step. That is indeed a bit worrying if a step ends up spending 72% of its time in GC every 10 or so time steps.
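To see how often garbage collection lands on a step and how much of that step it consumes, the named tuple returned by Julia's `@timed` exposes `time`, `bytes`, and `gctime` for each call. A minimal sketch, assuming a toy step that allocates about 1 MiB per call; `gc_profile` and `toy_step` are hypothetical names used only for illustration:

```julia
# Hypothetical helper: per-call bytes allocated and fraction of wall time spent in GC.
function gc_profile(f; n = 10)
    f()  # warm-up call so compilation is not measured
    return map(1:n) do _
        stats = @timed f()
        gc_fraction = stats.time > 0 ? stats.gctime / stats.time : 0.0
        (bytes = stats.bytes, gc_fraction = gc_fraction)
    end
end

# Toy stand-in for a time step that allocates about 1 MiB per call.
toy_step() = zeros(UInt8, 2^20)

profile = gc_profile(toy_step)
for (i, p) in enumerate(profile)
    @info "step $i: $(p.bytes) bytes, $(round(100 * p.gc_fraction; digits = 1))% gc time"
end
```

Applied to `time_step!(coupled_model, 1)`, a profile like this would show whether the GC-heavy steps recur at a regular interval, which is what a fixed per-step allocation would suggest.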

glwagner commented 1 month ago

On julia 1.11.0-rc2:

greg@tartarus:~/Projects/ClimaOcean.jl/test$ julia +1.11 --project
                  _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-rc2 (2024-07-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("test_ocean_sea_ice_model_parameter_space.jl")
Precompiling ClimaOcean...
  3 dependencies successfully precompiled in 20 seconds. 265 already precompiled.
[ Info: Time for packages to load: 29.139 seconds
[ Info: Regridding bathymetry from existing file ./ETOPO_2022_v1_60s_N90W180_surface.nc.
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 1.993 minutes
[ Info: Time to build the ocean simulation: 15.823 minutes
[ Info: Time to build the atmosphere and radiation: 12.646 seconds
[ Info: Time to construct the OceanSeaIceModel: 3.819 minutes
 26.515157 seconds (25.42 M allocations: 1.297 GiB, 1.32% gc time, 93.56% compilation time)
195.530043 seconds (28.53 M allocations: 1.263 GiB, 0.41% gc time, 96.92% compilation time)
  0.027841 seconds (48.63 k allocations: 16.697 MiB)
  0.732518 seconds (48.63 k allocations: 16.697 MiB, 97.40% gc time)
  0.019094 seconds (48.63 k allocations: 16.697 MiB)
  0.018940 seconds (48.63 k allocations: 16.697 MiB)
  0.018603 seconds (48.63 k allocations: 16.697 MiB)
  0.018552 seconds (48.63 k allocations: 16.697 MiB)
  0.033975 seconds (48.63 k allocations: 16.697 MiB, 47.97% gc time)
  0.017218 seconds (48.63 k allocations: 16.697 MiB)
[ Info: Time to take 10 time-steps: 3.718 minutes

glwagner commented 1 month ago

15.8 minutes to build the ocean simulation is pretty wild.