CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License
999 stars 196 forks source link

Making it easier to evolve tracers *only* after a certain time keeping them "frozen" until then #3154

Closed tomchor closed 1 year ago

tomchor commented 1 year ago

Sometimes it's useful to run a simulation with tracers and have the tracers start evolving only after a certain spin-up period. Since https://github.com/CliMA/Oceananigans.jl/pull/2938 this is possible by running a simulation without tracers, saving a checkpoint file from it, building a new model that is identical to the previous one but with extra tracers, manually set!() the prognostic fields from the checkpoint file, and then run that model. This works as intended, but as I've realized since then it's got a few downsides. In no particular order:

  1. Having to compile two separate models and Simulations can take a long time for small runs. When your code is production-ready and you're running the final big simulations that's okay, since those extra minutes of compilation are small compared to the many hours of run. But when you're doing small exploratory simulations (which may be the majority of times you run your code), then that extra compilation time required to build an extra model, simulation, writers, etc., can increase the code run-time significantly.
  2. Since you end up compiling and running two sets of model and simulation, this ends up taking extra space which can be a problem on the GPU. (I'm not aware of any way to "remove" the old Simulation from the GPU memory, but if there is, then this downside can be negated.)
  3. It makes for more complex code, especially when figuring out when it's safe to pick-up a simulation or not, due to the way pick-ups work.
  4. When outputting to NetCDF you generally need at least two files: one for the fields in the spin-up simulations and one for the extra tracers (or one NetCDF for the spin-up period and another one for the rest of the run). I'm not sure if this is also a limitation of the JLD2 writer though.

I think if we implement a way to build just one model which has every tracer needed and (optionally ofc) specify start times for each tracer (before which the tracers would just not be evolved in time) it would solve all the of the problems above.

The downsides that I can think of are:

  1. Simulations using this feature would possibly waste space on disk by outputting "frozen" tracer fields before they start evolving.
  2. One more thing in the models to test and maintain.

Maybe the biggest issue is that I'm not sure this is something enough people actually want to do with their simulations. This has been a relatively common thing for my research, and I know @whitleyv does this too, but maybe we're the exception?

cc @whitleyv

navidcy commented 1 year ago

What about something like this?

...

model = MyFavModel(..., tracers = :c, ...)

# Don't initialize the tracer, i.e., let c be zero

spinup_time = 10minutes
simulation = Simulation(model, Δt=10.0, stop_time=spinup_time)
run!(simulation)

the_rest_of_the_time = 10days

set!(model, c=initial_c)

simulation.stop_time = spinup_time + the_rest_of_the_time
run!(simulation)

...

You don't need to be outputting c for the spinup period. Only add an output writer for c after the spinup?

navidcy commented 1 year ago

Btw, when I am doing small exploratory runs I only need to "pay compilation time" once per model. Constructing another model in the same REPL session doesn't have any extra compilation costs unless I change something in the source code (Oceananigans source code; node my scripts).

I think my gut feeling would be against implementing such a feature, mostly because of the test and maintain part you mentioned. Also because I can see sort-of-easy ways around it. Perhaps I'm missing something? E.g. I don't really quite understand what do you mean by "safe to pick-up" and "due to the way pick-ups work" in

It makes for more complex code, especially when figuring out when it's safe to pick-up a simulation or not, due to the way pick-ups work.

simone-silvestri commented 1 year ago

I agree with Navid, it is better to have a more complicated script for cases this specific than a complicated source code.

glwagner commented 1 year ago

I agree with @navidcy's points. Of course, it doesn't even matter whether the tracer is initialized or not (referring to the comment in @navidcy's script). The main point is that you can "re-initialize" a state whenever you like. I'd also like to make the extra point that @navidcy's pattern is interpretable and readable. I'm not sure we would achieve the same if we hide such a feature inside the source code. Does @navidcy's suggestion work for you @tomchor ?

I also think it is preferred to use separate files for a situation like this (though it may not be necessary to save intermediate times in the spin-up at all --- so a second file might not be necessary). If the spin up is recorded, I think it's better to use a separate file for its data.

If the spin up is expensive, the following performance optimization could be used (we can leverage the velocities kwarg to save a bit of allocation):

# Spin up with no tracers
model = NonhydrostaticModel(; tracers=nothing, kwargs...)
simulation = Simulation(model, ...) # etc
run!(simulation)

# Run for real, re-using the old `velocities` fields but overwriting the old `model`
model = NonhydrostaticModel(; tracers=:c, velocities=model.velocities, kwargs...)
simulation = Simulation(model, ...) # etc

There is some additional memory allocation for tendencies in this case, however, so it may not work for simulations that push GPU memory.

(I'm not aware of any way to "remove" the old Simulation from the GPU memory, but if there is, then this downside can be negated.)

This occurs automatically with garbage collection, provided that there's no reference to the old simulation in the name space. CUDA may have a way to manually call the garbage collector, after doing something like model=nothing; simulation=nothing. If you figure that out, it'd be nice to know.

It makes for more complex code, especially when figuring out when it's safe to pick-up a simulation or not, due to the way pick-ups work.

I don't follow, but if there's something to improve about picking simulations up we should pursue that.

PS @milankl may be interested in this discussion, because it illustrates the importance of being able to modify tracer fields mid-run rather than requiring that they are initialized when the model is built.

tomchor commented 1 year ago

Thanks, everyone. I agree with the major points here. To answer some specific comments:

Btw, when I am doing small exploratory runs I only need to "pay compilation time" once per model. Constructing another model in the same REPL session doesn't have any extra compilation costs unless I change something in the source code (Oceananigans source code; node my scripts).

This is mostly because the majority of my exploratory runs are run in the GPU, and since I have limited GPU time I try to not leave interactive GPU sessions open. If I unlimited access to a GPU (or in the cases where I can explore on the CPU), then I agree with your point.

I agree with Navid, it is better to have a more complicated script for cases this specific than a complicated source code.

Again, agree. I posted this more because, if this was something a lot of other people were doing, it might be worth to maintain the infrastructure. But since it sounds like that's not the case, then I agree it's best to have complex user scripts and keep the source code simple.

Does @navidcy's suggestion work for you @tomchor ?

Yes, thanks for the suggestion @navidcy. I think this is the next best thing. The one disadvantage for me is that is "wastes" computation advecting tracers in the spin-up, but it has the huge advantage of keeping the source code simple, with also a readable user script.

tomchor commented 1 year ago

Just for future reference, I ended up modifying @navidcy's approach and using a Callback to set the concentrations mid-simulation. It proved to be easier.

glwagner commented 1 year ago

Huh, why is that easier? This judgement might be specific to your use case, so I wouldn't recommend taking that approach in general (for future readers of this issue). There is an important downside: the script is harder to read and interpret. So I'd regard that as less-than-best practice unless there's some specific reason to do it. (One reason could be: many similar simulations are being created using a function.)

glwagner commented 1 year ago

I thought of another way to achieve this performance optimization without any source-code-specific feature. For a simulation without buoyancy, for example, this might work:

# Construct the full model with all desired tracers
model = NonHydrostaticModel(; grid, tracers, ...)

model_properties = []
for name in propertynames(model)
    if name == :tracers
        push!(model_properties, nothing)
    else
        push!(model_properties, getproperty(model, name))
    end
end

# Build a "spin-up" model using the inner constructor for NonhydrostaticModel, with tracers=nothing
spin_up_model = NonhydrostaticModel(model_properties...)

I believe that spin_up_model will run without evolving tracers.

If you require some tracers (ie active tracers like buoyancy) then things are slightly more complicated. You have to replace tracers=nothing with the appropriate NamedTuple. Also, I think you need to ensure that the active tracers are "first" in the list of tracers when the full model is built. Something like

# Construct the full model with all desired tracers
model = NonHydrostaticModel(; grid, tracers=(T, S, c1, c2, c3, c4), ...)

active_tracers = (T = model.tracers.T, S = model.tracers.S)
model_properties = []
for name in propertynames(model)
    if name == :tracers
        push!(model_properties, active_tracers)
    else
        push!(model_properties, getproperty(model, name))
    end
end

# Build a "spin-up" model using the inner constructor for NonhydrostaticModel, with tracers=(; T, S)
spin_up_model = NonhydrostaticModel(model_properties...)
glwagner commented 1 year ago

@tomchor I think it's best to convert this to a discussion. Is that ok with you?

tomchor commented 1 year ago

@tomchor I think it's best to convert this to a discussion. Is that ok with you?

Yup. Go ahead