glwagner opened 1 month ago
It looks like @vchuravy had a solution for it, which hopefully will come online in julia 1.11 https://github.com/JuliaGPU/GPUCompiler.jl/pull/557#issuecomment-2183674470
However, we should really try to understand the problem with our precompilation.
Solution for which part?
for the precompilation of ClimaOcean. It looks like the time step does not precompile until the fourth execution so that might be the allocation. If you exclude the first 10 time steps does the time step continue allocating?
> for the precompilation of ClimaOcean.
Interesting. I wasn't even timing that.
> If you exclude the first 10 time steps does the time step continue allocating?
Yes for sure, check out the benchmark. I'm running 100 time steps.
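One way to separate compilation from the steady-state cost is to time each step individually and drop the warmup steps. A minimal sketch, assuming the `model`, `Δt`, and `time_step!` from the Oceananigans-based setup discussed in this thread:

```julia
# Sketch: `model` and `Δt` stand in for the objects built in the
# benchmark script; only steps after `warmup` count toward the average.
warmup = 10
times, bytes = Float64[], Int[]

for n in 1:100
    stats = @timed time_step!(model, Δt)
    n <= warmup && continue   # drop compilation-dominated early steps
    push!(times, stats.time)
    push!(bytes, stats.bytes)
end

@info "mean step time (post-warmup)" sum(times) / length(times)
@info "mean bytes allocated per step" sum(bytes) / length(bytes)
```

If the mean bytes per step stays flat after warmup, the allocation is genuinely per-step rather than leftover compilation.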
The constructor time is dominated by constructing `OceanSeaIceSurfaceFluxes`. When this is omitted, the construction time drops from minutes to less than a second.

Here's a little more information about constructor times for `OceanSeaIceSurfaceFluxes`:
The main contributors are `SimilarityTheoryTurbulentFluxes`, `total_fluxes`, and `surface_atmosphere_state`:

- `SimilarityTheoryTurbulentFluxes`
- `total_ocean_fluxes` (which includes creating a few `BinaryOperation`s: it creates 2 new 2D fields plus extracting the existing fields for velocity/tracer fluxes)
- a `BinaryOperation` (supposed to be a user convenience) built from `total_ocean_fluxes`
It doesn't take 35 s to create 8 2D fields, so the cost has something to do with building the struct. I don't completely understand.
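One way to check that hypothesis is to time the field allocation and the struct construction separately. A sketch with illustrative arguments (the real `OceanSeaIceSurfaceFluxes` signature and the `grid`/`ocean`/`sea_ice` objects here are assumptions, as in the benchmark script):

```julia
using Oceananigans

# Allocating 2D fields alone should take well under a second:
@time fields = [Field{Center, Center, Nothing}(grid) for _ in 1:8]

# Whereas the struct constructor is where the tens of seconds go,
# which points at compilation of the struct's large type parameters
# rather than at memory allocation (arguments are illustrative):
@time fluxes = OceanSeaIceSurfaceFluxes(ocean, sea_ice)
```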
I redid the tests for fun on my laptop (nothing fast) and found the following timings:
```
[ Info: Grid / bathymetry construction time: 4.783 minutes
[ Info: Ocean simulation construction time: 4.783 minutes
[ Info: Atmosphere construction time: 7.542 seconds
[ Info: Coupled model construction time: 38.721 seconds
[ Info: One time step time: 13.587 seconds
```
Clearly things are even worse for me, but I also find that the grid and ocean model are the slow parts.
Huh, do you mean you used a GPU or the laptop CPU?
This is my laptop GPU. Not a powerful one for sure.
It's good that the example even fits on it! How much memory does it have? Still useful for evaluating compile time and parameter space issues, perhaps.
It's interesting that on your machine the model construction is much faster than on the machine I tested on. Still confused why this is happening. I was running on julia 1.10.0, I'll test other julia versions.
Here with julia 1.10.4, with a slightly modified script that also takes 10 time steps:
```
[ Info: Time for packages to load: 7.094 seconds
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 2.040 minutes
[ Info: Time to build the ocean simulation: 17.114 minutes
[ Info: Time to build the atmosphere and radiation: 11.529 seconds
[ Info: Time to construct the OceanSeaIceModel: 4.772 minutes
 19.544141 seconds (15.39 M allocations: 1.062 GiB, 2.23% gc time, 92.16% compilation time)
154.613907 seconds (26.05 M allocations: 1.262 GiB, 0.28% gc time, 96.44% compilation time)
  0.020572 seconds (44.74 k allocations: 16.715 MiB)
  0.023772 seconds (44.74 k allocations: 16.715 MiB)
  0.023835 seconds (44.74 k allocations: 16.715 MiB)
  0.023758 seconds (44.74 k allocations: 16.715 MiB)
  0.023896 seconds (44.74 k allocations: 16.715 MiB)
  0.081326 seconds (44.74 k allocations: 16.715 MiB, 72.32% gc time)
  0.018894 seconds (44.74 k allocations: 16.715 MiB)
  0.019408 seconds (44.74 k allocations: 16.715 MiB)
[ Info: Time to take 10 time-steps: 2.908 minutes
```
> It's good that the example even fits on it! How much memory does it have? Still useful for evaluating compile time and parameter space issues, perhaps.

> It's interesting that on your machine the model construction is much faster than on the machine I tested on. Still confused why this is happening. I was running on julia 1.10.0, I'll test other julia versions.
I have gone as high as 18GB on my laptop GPU before it gave up and said no!
I should say that I was using 1.10.0.
I am happy to try another version of Julia if that's of interest.
So, there is a consistent 16.7 MiB allocation per time step. That is indeed a bit worrying if we have to spend 72% of the time in GC every 10 or so time steps.
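A cheap way to confirm that the 16.7 MiB is a steady-state cost rather than a lingering compilation effect is to sample `@allocated` over many steps. A sketch, reusing the `model`/`Δt` names from the runs above (assumed setup):

```julia
# If the allocation profile is flat, every step really does allocate
# ~16.7 MiB; a decaying profile would point at compilation instead.
per_step = [@allocated time_step!(model, Δt) for _ in 1:100]
@info "bytes per step" minimum(per_step) maximum(per_step)
```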
On julia 1.11.0-rc2:
```
greg@tartarus:~/Projects/ClimaOcean.jl/test$ julia +1.11 --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-rc2 (2024-07-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("test_ocean_sea_ice_model_parameter_space.jl")
Precompiling ClimaOcean...
  3 dependencies successfully precompiled in 20 seconds. 265 already precompiled.
[ Info: Time for packages to load: 29.139 seconds
[ Info: Regridding bathymetry from existing file ./ETOPO_2022_v1_60s_N90W180_surface.nc.
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 1.993 minutes
[ Info: Time to build the ocean simulation: 15.823 minutes
[ Info: Time to build the atmosphere and radiation: 12.646 seconds
[ Info: Time to construct the OceanSeaIceModel: 3.819 minutes
 26.515157 seconds (25.42 M allocations: 1.297 GiB, 1.32% gc time, 93.56% compilation time)
195.530043 seconds (28.53 M allocations: 1.263 GiB, 0.41% gc time, 96.92% compilation time)
  0.027841 seconds (48.63 k allocations: 16.697 MiB)
  0.732518 seconds (48.63 k allocations: 16.697 MiB, 97.40% gc time)
  0.019094 seconds (48.63 k allocations: 16.697 MiB)
  0.018940 seconds (48.63 k allocations: 16.697 MiB)
  0.018603 seconds (48.63 k allocations: 16.697 MiB)
  0.018552 seconds (48.63 k allocations: 16.697 MiB)
  0.033975 seconds (48.63 k allocations: 16.697 MiB, 47.97% gc time)
  0.017218 seconds (48.63 k allocations: 16.697 MiB)
[ Info: Time to take 10 time-steps: 3.718 minutes
```
15.8 minutes to build the ocean simulation is pretty wild.
I timed how long it takes to build and then take one time step with `OceanSeaIceModel` with this script. Running for the first time I get (ignoring the annoying warnings mentioned on #133):
The 6-minute wait time for model construction isn't alleviated until the 5th or 6th time building a model.
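If the repeated-construction cost is compilation that isn't being cached between sessions, one possible mitigation (a sketch, not something ClimaOcean necessarily does today) is a PrecompileTools.jl workload inside the package module, so that the model-construction and time-stepping methods land in the pkgimage:

```julia
using PrecompileTools

@setup_workload begin
    # Build the cheapest possible inputs here (tiny grid, few points);
    # everything in this block runs at precompile time only.
    @compile_workload begin
        # Construct a miniature OceanSeaIceModel and take one time step,
        # so the specialized constructor and time_step! methods are
        # compiled once and cached with the package image.
    end
end
```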
After the time-stepping is compiled, one time-step is considerably shorter:
It's not obvious to me why model construction is so expensive. We do call `update_state!` within the model constructor, which computes fluxes. But this also has to be called during `time_step!`, which is cheap. So there's something else going on.

Finally, `time_step!` seems to allocate, which is also problematic.
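To find out where `time_step!` allocates, the allocation profiler available since Julia 1.8 can attribute the per-step MiB to specific call sites. A sketch using the same `model`/`Δt` names as above (assumed setup):

```julia
using Profile

time_step!(model, Δt)  # warm up first, so compilation allocations are excluded

Profile.Allocs.@profile sample_rate = 1.0 time_step!(model, Δt)
results = Profile.Allocs.fetch()

# Visualize with PProf.jl, which groups allocations by call site:
#   using PProf; PProf.Allocs.pprof(results)
```

A `sample_rate` of 1.0 records every allocation, which is fine for a single time step; long runs would want a smaller rate.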