Open simone-silvestri opened 2 months ago
If we want to make ocean_simulations
as close as oceananigans as possible probably option 2 is the best
I'd rank 1 -> 2 -> 3
If we want to make
ocean_simulations
as close as oceananigans as possible probably option 2 is the best
1 or 2; depending how you measure "close". Close having the same kwarg name or also having the same defaults and what needs to be explicitly denoted.
I prefer the first option.
Just to be a little more explicit, the differences are illustrated by considering syntax two cases: a) a simulation with T, S and b) a simulation with biogeochemistry and additional tracers. Here's how it shakes out:
Option 1:
ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))
Option 2:
ocean = ocean_simulation(grid, tracers=(:T, :S))
ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))
Option 3:
ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry, user_defined_tracers = biogeochemical_tracers)
I would prefer "additional" or "extra" rather than "user_defined". I find it strange when the word "user" appears in an interface, because the "user" is the one writing the code, so it's like they are talking to themselves.
Here is my take on the pros and cons:
Option 1: Pros: simple and explicit for complicated cases with more cases. Cons: implicit default.
Option 2: Pros: most explicit. Cons: adds the boilerplate tracers = (:T, :S)
in the case that extra tracers / biogeochemistry are relatively rare additions to a model.
Option 3: Pros: has the least boilerplate, in the case that adding tracers is very common / typical. Cons: its the most implicit.
Option 1 strikes a balance because it eliminates boilerplate for the "no additional tracers" case (which I suspect will be the most common case), while retaining a nicely explicit interface for "non-standard" simulations like those which add tracers / have biogeochemistry. I believe this is a well-considered balance and that's why I prefer option 1.
I prefer the first option.
Just to be a little more explicit, the differences are illustrated by considering syntax two cases: a) a simulation with T, S and b) a simulation with biogeochemistry and additional tracers. Here's how it shakes out:
Option 1:
ocean = ocean_simulation(grid) ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))
Option 2:
ocean = ocean_simulation(grid, tracers=(:T, :S)) ocean = ocean_simulation(grid; biogeochemistry, tracers = (:T, :S, biogeochemical_tracers...))
Option 3:
ocean = ocean_simulation(grid) ocean = ocean_simulation(grid; biogeochemistry, user_defined_tracers = biogeochemical_tracers)
I would prefer "additional" or "extra" rather than "user_defined". I find it strange when the word "user" appears in an interface, because the "user" is the one writing the code, so it's like they are talking to themselves.
I think @simone-silvestri's Option 3 would add the required biogeochemical tracers from the biogeochemical model inside the function, so then the syntax would be just this unless extra passive tracers are added?
ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; biogeochemistry)
This had been my initial intuition for what would be simplest when adding biogeochemistry, because it saves a line of code for extracting the biogeochemical tracers and one kwarg, but I see the downside of it being implicit and if biogeochemistry/added tracers is a 'non=standard' use case then it doesn't make sense to optimise for that. I defer to you all as developers anyway!
Ok, so option one seems to be prevalent. Last issue is :e
should that be in the default tracers given that CATKE is the default closure?
In the latter case, should we warn people that want to use a different closure that they are carrying with them and additional tracer?
options: 1.) e is a default tracer
ocean = ocean_simulation(grid)
ocean = ocean_simulation(grid; tracers = (:T, :S), closure = other_closure) # If not passing tracers warn here!
2.) e is not a default tracer
ocean = ocean_simulation(grid; tracers = (:T, :S, :e))
ocean = ocean_simulation(grid; closure = other_closure)
There is always the option to add :e
automatically with a warning but at that point we can do the same for :T
and :S
I vote to make e
a default tracer for the time being. This will promote understanding and awareness of the tracer e
.
I think we can also consider taking the stance both here and in Oceananigans
that the "closure tracers" have a different status than biogeochemical tracers, because unlike other tracers, typical users will not be changing initial conditions and/or seting boundary conditions on closure tracers. (One might argue that e
is not a "true" tracer as long as we are neglecting advection, too... despite that it is categorized under model.tracers
.) With that policy, we could motivate a tracers = default_tracers(closure)
which depends on the closure
.
Separately, a "warning" is not appropriate in this case, because warnings should be reserved for situations where incorrect behavior may occur, or something can "go wrong". Merely having more tracers than needed is not such a case. Also, if we think that "info" is not urgent enough for performance issues, we can also take advantage of Oceananigans' custom logger to create a new log level like @perf
to use for possible performance pitfalls, that should be elevated above @info
but don't rise to the status of a @warn
.
Also separately, we should in fact be printing status messages that show the progress of the construction of the simulation. A similar approach has been prototyped over in the LESbrary:
where a user might see something like
julia> include("run_simulation.jl")
[ Info: Mapping grid...
[ Info: Enforcing boundary conditions...
[ Info: Forcing and sponging tracers...
[ Info: Whipping up the Stokes drift...
[ Info: Air uβ
: 0.5636, water uβ
: 0.0195, Ξ»α΅: 243.3854, La: 0.481, Surface Stokes drift: 0.0843 m sβ»ΒΉ
[ Info: Framing the model...
[ Info: Built model:
β Info: NonhydrostaticModel{CPU, RectilinearGrid}(time = 0 seconds, iteration = 0)
β βββ grid: 32Γ32Γ32 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 5Γ5Γ5 halo
β βββ timestepper: RungeKutta3TimeStepper
β βββ advection scheme: WENO reconstruction order 9
β βββ tracers: (b, c)
β βββ closure: Nothing
β βββ buoyancy: BuoyancyTracer with gΜ = NegativeZDirection()
β βββ coriolis: FPlane{Float64}(f=0.0001)
[ Info: Setting initial conditions...
[ Info: Conjuring the simulation...
[ Info: Squeezing out statistics...
[ Info: Garnishing output writers...
[ Info: - with fields: (:u, :v, :w, :b, :c, :e)
[ Info: - with statistics: (:u, :v, :b, :c)
Note that info messages and warnings are not complete solutions to an "information issue" alone, because many people will encounter code only by reading it rather than running it.
PR #123 introduces a keyword argument
tracers
to pass toocean_simulation
. There are some nuances in passing this keyword argument that would be nice to discuss. In particular, the decision is how are we handling active and passive tracers, or in general handling tracers that are required for different modeling choices. These include::T
and:S
for density calculation:e
for CATKE vertical parameterizationFor the moment, active tracers
:T
and:S
are assumed to be required, and, they are "silently" added if missing from thetracers
keyword argument. The same happens if using CATKE: the:e
tracer is "silently" added. The main issue brought up using this method is that we want to be "explicit" in what tracers are being passed to the simulation and thattracers
should include all tracers. We could have other options:Option one
add a
tracers
kwargs inocean_simulation
that defaults to(:T, :S, :e)
if no tracers are specified. This has the advantage of being explicit (most of the times) but the disadvantage of leading to "awkward" scripting outcomes. An example is writing an ocean simulation without biogeochemistryand having to modify the script as
when adding biogeochemistry. i.e. adding biogeochemistry requires a seemingly unrelated change to the arguments of
ocean_simulation
: adding T, S, and e as tracers that were not required before. The deficiencies here are that the tracers are sometimes explicit and sometimes implied. In my opinion this leaves us in a weird "in-between" situation that probably requires users to hit an argument error before understanding that:T
and:S
have to be explicitly included when adding seemingly unrelated arguments.Option two
add a
tracers
kwarg inocean_simulation
without any defaults. This is the most "explicit" option for users and also probably the cleanest solution. The downside here is that boundary conditions are specified automatically for:T
and:S
and not for other tracers, but that is what we also do with:e
in Oceananigans.Option three
Add a
user_defined_tracers
kwarg where users add all the passive tracers they want and automatically add the required tracers inside theocean_simulation
(i.e. check the buoyancy model and biogeochemistry model and make sure all required tracers are added). The main downside of this option is that users could specify reserved tracer names:T
and:S
or:e
as arguments and this option is also less explicit.Let me know what you think. I would rank the options as
3 -> 2 -> 1
but I m open to different solutions