Closed charleskawczynski closed 1 month ago
cc @Sbozzolo, @szy21
Wow! That's dramatic! Thanks for finding this. Could you run a test where you use the HDF5 writer instead? The NetCDF writer is much more complex (and inherits type instability from NCDatasets). That would already give us a sense of where to look.
Do you also know if this is inference or LLVM time?
I know that the loops in orchestrate_diagnostics cause a lot of compiler allocations. I tried unrolling them and it led to a massive explosion in compile time.
If changing the writer to HDF5 improves things, we can restructure things to store symbols instead of references to the entire object. That would simplify the types quite a lot, and we can re-evaluate unrolling at that point.
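To make the "symbols instead of references" idea concrete, here is a minimal sketch. Every name in it (ToyWriter, DiagnosticWithRef, DiagnosticWithSymbol, the writers dictionary) is hypothetical and is not taken from ClimaDiagnostics; it only illustrates one possible reading of the proposal, namely that a diagnostic would carry a Symbol key and look the writer up when needed.

```julia
# All types below are hypothetical stand-ins, not ClimaDiagnostics' actual structs.
struct ToyWriter{T}
    buffer::T
end

# Variant 1: the diagnostic holds the writer object itself, so the writer's
# type becomes a type parameter of every diagnostic.
struct DiagnosticWithRef{W}
    short_name::String
    writer::W
end

# Variant 2: the diagnostic holds only a Symbol; the writer is looked up later.
struct DiagnosticWithSymbol
    short_name::String
    writer::Symbol
end

# Registry mapping symbols to writers (values are deliberately untyped).
writers = Dict{Symbol, Any}(
    :netcdf => ToyWriter(Dict{String, Vector{Float64}}()),
    :hdf5 => ToyWriter(Vector{Float64}[]),
)

by_ref = (
    DiagnosticWithRef("ua", writers[:netcdf]),
    DiagnosticWithRef("va", writers[:hdf5]),
)
by_symbol = [
    DiagnosticWithSymbol("ua", :netcdf),
    DiagnosticWithSymbol("va", :hdf5),
]

# The first type spells out every writer type; the second does not mention writers at all.
println(typeof(by_ref))
println(typeof(by_symbol))

# Looking the writer up by symbol; the result type is not inferable here.
lookup_writer(d::DiagnosticWithSymbol) = writers[d.writer]
```

The trade-off in this sketch is that lookup_writer returns a value of unknown type, so calls on the writer become dynamic dispatches at run time in exchange for much smaller type signatures at compile time.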
Yeah, it's pretty dramatic, I have a feeling that it's a combination of things, and yes, we can try with HDF5.
I haven't narrowed down between inference / LLVM time.
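One rough way to separate the two, sketched below, is to time the reflection functions from InteractiveUtils on the call of interest: code_typed exercises inference and Julia-level optimization, while code_llvm and code_native additionally exercise LLVM. The function toy_step is a hypothetical stand-in for the real step! call, and the split is only approximate because later calls may reuse or repeat work done by earlier ones. A dedicated tool such as SnoopCompile.jl would give more precise numbers.

```julia
using InteractiveUtils  # provides code_typed, code_llvm, code_native

# Hypothetical stand-in for an expensive-to-compile call such as the first step!.
toy_step(xs::NTuple{N, Float64}) where {N} = sum(map(x -> x^2 + sin(x), xs))

# Argument types of the call we want to analyze.
argtypes = Tuple{NTuple{8, Float64}}

# Type inference plus Julia-level optimization.
t_infer = @elapsed code_typed(toy_step, argtypes)

# LLVM IR generation and LLVM optimization; output is discarded.
t_llvm = @elapsed code_llvm(devnull, toy_step, argtypes)

# Native code emission; output is discarded.
t_native = @elapsed code_native(devnull, toy_step, argtypes)

println("code_typed:  ", t_infer, " s")
println("code_llvm:   ", t_llvm, " s")
println("code_native: ", t_native, " s")
```

If the code_typed time dominates, the problem is more likely on the inference side; if the later two dominate, LLVM is the more likely culprit.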
Here are some tests I performed. I took a baroclinic wave and added a bunch of diagnostics like this:
- short_name: [pfull, ua, wa, va, rv, ta, ke]
  period: 1days
- short_name: [pfull, ua, wa, va, rv, ta, ke]
  period: 2days
- short_name: [pfull, ua, wa, va, rv, ta, ke]
  period: 3days
- short_name: [pfull, ua, wa, va, rv, ta, ke]
  ....
in the config.
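For anyone who wants to reproduce this with a different number of entries, a fragment following the pattern above can be generated instead of written by hand. This is only a sketch: diagnostics_yaml is a hypothetical helper, it assumes the periods simply count up as 1days, 2days, and so on, and it emits only the list entries, not the surrounding config keys.

```julia
# Hypothetical helper: builds n_entries list items with periods 1days, 2days, ...
function diagnostics_yaml(
    n_entries::Integer;
    short_names = ["pfull", "ua", "wa", "va", "rv", "ta", "ke"],
)
    io = IOBuffer()
    name_list = join(short_names, ", ")
    for i in 1:n_entries
        println(io, "- short_name: [", name_list, "]")
        println(io, "  period: ", i, "days")
    end
    return String(take!(io))
end

# Print a fragment with three entries.
print(diagnostics_yaml(3))
```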
I tested the case with the NetCDF and HDF5 writers. Listed below is the approximate time to compile step:
NetCDF:
- 0 diagnostics: 32 seconds
- 10 diagnostics: 32 seconds
- 20 diagnostics: 36 seconds
- 40 diagnostics: 42 seconds
- 60 diagnostics: 48 seconds
HDF5:
- 50 diagnostics: 33 seconds
- 150 diagnostics: 38 seconds
Next, I tried turning the diagnostics from a tuple into a vector. These are the new times:
NetCDF:
- 50 diagnostics: 30 seconds
- 150 diagnostics: 33 seconds
HDF5:
- 50 diagnostics: 27 seconds
- 150 diagnostics: 38 seconds
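As a small illustration of why the container choice can matter for the compiler, the sketch below compares the type of a heterogeneous tuple with the type of a vector holding the same elements. ToyDiagnostic and run_all are hypothetical and much simpler than the real diagnostics; the point is only that a tuple's type lists every element type, while the vector's type does not grow with the number of elements.

```julia
# Hypothetical stand-in for a diagnostic whose type depends on its compute function.
struct ToyDiagnostic{F}
    short_name::String
    compute::F
end

d1 = ToyDiagnostic("ua", x -> x + 1)
d2 = ToyDiagnostic("va", x -> 2x)
d3 = ToyDiagnostic("ta", x -> x^2)

as_tuple = (d1, d2, d3)
as_vector = [d1, d2, d3]

# Every element type is part of the tuple type, so it grows with the tuple length.
println(typeof(as_tuple))

# The vector has a single, non-concrete element type regardless of its length.
println(typeof(as_vector))

# The same loop works on both containers; with the vector, each call to
# d.compute is resolved at run time instead of being specialized ahead of time.
run_all(ds, x) = [d.compute(x) for d in ds]

println(run_all(as_tuple, 2))
println(run_all(as_vector, 2))
```

With a vector, the compiler has less to specialize on, which tends to mean less compile time and some run-time dispatch; whether that trade is worthwhile here is what the timings above are probing.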
It seems that ClimaDiagnostics may be largely responsible for the long compile times. This may be related to #82, to complex type signatures, to https://github.com/JuliaLang/julia/issues/55807, or to a combination of all three. This job in ClimaAtmos took ~98 minutes to compile the driver up to the end of the first call to step!(integrator) with diagnostics, and only 24 minutes without. So, adding diagnostics increases compile times by roughly 4x. Here is a reproducer in ClimaAtmos: