Handling active and passive tracers in `ocean_simulation`

simone-silvestri commented 2 months ago

PR #123 introduces a keyword argument tracers to pass to ocean_simulation. There are some nuances in passing this keyword argument that would be nice to discuss. In particular, the decision is how are we handling active and passive tracers, or in general handling tracers that are required for different modeling choices. These include:

:T and :S for density calculation
:e for CATKE vertical parameterization
biogeochemical tracers for biogeochemistry
other active tracers
other passive tracers

For the moment, active tracers :T and :S are assumed to be required, and, they are "silently" added if missing from the tracers keyword argument. The same happens if using CATKE: the :e tracer is "silently" added. The main issue brought up using this method is that we want to be "explicit" in what tracers are being passed to the simulation and that tracers should include all tracers. We could have other options:

Option one

add a tracers kwargs in ocean_simulation that defaults to (:T, :S, :e) if no tracers are specified. This has the advantage of being explicit (most of the times) but the disadvantage of leading to "awkward" scripting outcomes. An example is writing an ocean simulation without biogeochemistry

ocean = ocean_simulation(grid)

and having to modify the script as

ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, :e, biogeochemical_tracers...))

when adding biogeochemistry. i.e. adding biogeochemistry requires a seemingly unrelated change to the arguments of ocean_simulation: adding T, S, and e as tracers that were not required before. The deficiencies here are that the tracers are sometimes explicit and sometimes implied. In my opinion this leaves us in a weird "in-between" situation that probably requires users to hit an argument error before understanding that :T and :S have to be explicitly included when adding seemingly unrelated arguments.

Option two

add a tracers kwarg in ocean_simulation without any defaults. This is the most "explicit" option for users and also probably the cleanest solution. The downside here is that boundary conditions are specified automatically for :T and :S and not for other tracers, but that is what we also do with :e in Oceananigans.

Option three

Add a user_defined_tracers kwarg where users add all the passive tracers they want and automatically add the required tracers inside the ocean_simulation (i.e. check the buoyancy model and biogeochemistry model and make sure all required tracers are added). The main downside of this option is that users could specify reserved tracer names :T and :S or :e as arguments and this option is also less explicit.

Let me know what you think. I would rank the options as 3 -> 2 -> 1 but I m open to different solutions

simone-silvestri commented 2 months ago

If we want to make ocean_simulations as close as oceananigans as possible probably option 2 is the best

navidcy commented 2 months ago

I'd rank 1 -> 2 -> 3

navidcy commented 2 months ago

If we want to make ocean_simulations as close as oceananigans as possible probably option 2 is the best

1 or 2; depending how you measure "close". Close having the same kwarg name or also having the same defaults and what needs to be explicitly denoted.

glwagner commented 2 months ago

I prefer the first option.

Just to be a little more explicit, the differences are illustrated by considering syntax two cases: a) a simulation with T, S and b) a simulation with biogeochemistry and additional tracers. Here's how it shakes out:

Option 1:

ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))

Option 2:

ocean = ocean_simulation(grid, tracers=(:T, :S))
ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))

Option 3:

ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry, user_defined_tracers = biogeochemical_tracers)

I would prefer "additional" or "extra" rather than "user_defined". I find it strange when the word "user" appears in an interface, because the "user" is the one writing the code, so it's like they are talking to themselves.

glwagner commented 2 months ago

Here is my take on the pros and cons:

Option 1: Pros: simple and explicit for complicated cases with more cases. Cons: implicit default. Option 2: Pros: most explicit. Cons: adds the boilerplate tracers = (:T, :S) in the case that extra tracers / biogeochemistry are relatively rare additions to a model. Option 3: Pros: has the least boilerplate, in the case that adding tracers is very common / typical. Cons: its the most implicit.

Option 1 strikes a balance because it eliminates boilerplate for the "no additional tracers" case (which I suspect will be the most common case), while retaining a nicely explicit interface for "non-standard" simulations like those which add tracers / have biogeochemistry. I believe this is a well-considered balance and that's why I prefer option 1.

vtamsitt commented 2 months ago

I prefer the first option.

Just to be a little more explicit, the differences are illustrated by considering syntax two cases: a) a simulation with T, S and b) a simulation with biogeochemistry and additional tracers. Here's how it shakes out:

Option 1:
ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))
Option 2:
ocean = ocean_simulation(grid, tracers=(:T, :S))
ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))
Option 3:
ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry, user_defined_tracers = biogeochemical_tracers)
I would prefer "additional" or "extra" rather than "user_defined". I find it strange when the word "user" appears in an interface, because the "user" is the one writing the code, so it's like they are talking to themselves.

I think @simone-silvestri's Option 3 would add the required biogeochemical tracers from the biogeochemical model inside the function, so then the syntax would be just this unless extra passive tracers are added?

 ocean = ocean_simulation(grid)
 ocean = ocean_simulation(grid; biogeochemistry)

This had been my initial intuition for what would be simplest when adding biogeochemistry, because it saves a line of code for extracting the biogeochemical tracers and one kwarg, but I see the downside of it being implicit and if biogeochemistry/added tracers is a 'non=standard' use case then it doesn't make sense to optimise for that. I defer to you all as developers anyway!

simone-silvestri commented 2 months ago

Ok, so option one seems to be prevalent. Last issue is :e should that be in the default tracers given that CATKE is the default closure?

In the latter case, should we warn people that want to use a different closure that they are carrying with them and additional tracer?

options: 1.) e is a default tracer

ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; tracers = (:T, :S), closure = other_closure) # If not passing tracers warn here!

2.) e is not a default tracer

ocean = ocean_simulation(grid; tracers = (:T, :S, :e))
ocean = ocean_simulation(grid; closure = other_closure)

There is always the option to add :e automatically with a warning but at that point we can do the same for :T and :S

glwagner commented 2 months ago

I vote to make e a default tracer for the time being. This will promote understanding and awareness of the tracer e.

I think we can also consider taking the stance both here and in Oceananigans that the "closure tracers" have a different status than biogeochemical tracers, because unlike other tracers, typical users will not be changing initial conditions and/or seting boundary conditions on closure tracers. (One might argue that e is not a "true" tracer as long as we are neglecting advection, too... despite that it is categorized under model.tracers.) With that policy, we could motivate a tracers = default_tracers(closure) which depends on the closure.

Separately, a "warning" is not appropriate in this case, because warnings should be reserved for situations where incorrect behavior may occur, or something can "go wrong". Merely having more tracers than needed is not such a case. Also, if we think that "info" is not urgent enough for performance issues, we can also take advantage of Oceananigans' custom logger to create a new log level like @perf to use for possible performance pitfalls, that should be elevated above @info but don't rise to the status of a @warn.

Also separately, we should in fact be printing status messages that show the progress of the construction of the simulation. A similar approach has been prototyped over in the LESbrary:

https://github.com/CliMA/LESbrary.jl/blob/145b9598eeb5e2c1df1e324e40f4b46130d5b96f/src/IdealizedExperiments/three_layer_constant_fluxes.jl#L33

where a user might see something like

julia> include("run_simulation.jl")
[ Info: Mapping grid...
[ Info: Enforcing boundary conditions...
[ Info: Forcing and sponging tracers...
[ Info: Whipping up the Stokes drift...
[ Info: Air u★: 0.5636, water u★: 0.0195, λᵖ: 243.3854, La: 0.481, Surface Stokes drift: 0.0843 m s⁻¹
[ Info: Framing the model...
[ Info: Built model:
┌ Info: NonhydrostaticModel{CPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
│ ├── grid: 32×32×32 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 5×5×5 halo
│ ├── timestepper: RungeKutta3TimeStepper
│ ├── advection scheme: WENO reconstruction order 9
│ ├── tracers: (b, c)
│ ├── closure: Nothing
│ ├── buoyancy: BuoyancyTracer with ĝ = NegativeZDirection()
└ └── coriolis: FPlane{Float64}(f=0.0001)
[ Info: Setting initial conditions...
[ Info: Conjuring the simulation...
[ Info: Squeezing out statistics...
[ Info: Garnishing output writers...
[ Info:     - with fields: (:u, :v, :w, :b, :c, :e)
[ Info:     - with statistics: (:u, :v, :b, :c)

Note that info messages and warnings are not complete solutions to an "information issue" alone, because many people will encounter code only by reading it rather than running it.

CliMA / ClimaOcean.jl