Open simonbyrne opened 2 years ago
We definitely do not want to trigger a recompile for every parameter change. Once you have a full ESM, this will become unwieldy. We want parameters to be runtime constants of some form.
In 3. The "extraction" can be automated by tagging parameters? Then do we still need duplicate code, or can we not write the boilerplate in ClimaParameters? e.g. ClimaParameters.create_parameter_struct(component, tags)
You are right, this leads to many structs, but they are organized well: one for each "component" and that struct contains only the parameters used within the component's source code. Once again in totatlity, are we stll not talking (ignoring NN parameters) at 10-100 floats per component and maybe 10-20 components, this doesn't seem too crazy.
In 3. The "extraction" can be automated by tagging parameters? Then do we still need duplicate code, or can we not write the boilerplate in ClimaParameters?
We could probably figure something out, e.g. a macro
@parameterstruct(paramdict, ComponentX)
which expands to
struct ComponentXParameters
param1::Float64 # the set of parameters tagged "Component1"
...
end
A couple of things worth noting:
RE the tag specificity. I think this is why I keep mentioning about the organization of code, for us what do we mean by component
(does this relate to a git repo?) or subcomponent
(does this relate to a module?) etc.
The tag specifics are a trade-off. broad tags (e.g. a git repo) will produce fewer, larger structs, narrow tags (e.g. a module) will produce more, smaller structs.
Modules are namespaces: i.e. a way to encapsulate things so that names are unique within a module. Packages are modules with a bit of extra metadata to resolve dependencies (and typically match 1-1 with a repo).
After some further discussion with @odunbar, we agreed on the following approach (based on option 3, and similar to what is done in OceanTurbulenceParameterEstimation.jl:
A parameters set is basically a dictionary, which contains the values and associated metadata.
Model components which use any parameters would need to access them when they are constructed (i.e. not while the model is running), and load them into their own struct (or similar object), or subcomponens: e.g. suppose we define a model such as:
struct AtmosModel
# parameters required directly by AtmosModel
gravitational_acceleration
coriolis
...
# subcomponents
thermodynamics
...
end
Then the constructor
AtmosModel(paramset; thermodynamics=MoistEquilibrium())
would do something like
# extract parameters required by model
gravitational_acceleration = get_parameter(paramset, :gravitational_acceleration, "AtmosModel")
coriolis = get_parameter(paramset, :coriolis, "AtmosModel")
# recursively extract subcomponents
thermodynamics = update(MoistEquilibrium(), paramset) # constructs a new MoistEquilibrium object with the relevant parameters
Once the components are constructed, they would refer to their own object to get the parameter value (i.e. we shouldn't be looking up parameter sets once the model is running). For example, Thermodynamics.gas_constant_air
:
would now refer to the specific MoistEquilibrium
object instead of the parameter set:
function gas_constant_air(thermo::MoistEquilibrium, q::PhasePartition{FT}) where {FT}
return thermo.R_d *
(1 + (thermo.molmass_ratio - 1) * q.tot - thermo.molmass_ratio * (q.liq + q.ice))
end
function get_parameter(paramset, name, component)
# log that the parameter was used
@info(paramset, name, component)
# get the value
return paramset.dict[name]
end
This is significant change from our current approach: unfortunately, there isn't an easy way to gradually switch over, but it is probably worth prototyping this in a single package (e.g. TurbulenceConvection or Thermodynamics?).
I love this.
Sorry I'm late to the party.
I don't mean to open a can of worms, but @kpamnany and I have already voiced concerns about the distributed approach (option 3), as it does not guarantee the "single parameter" requirement that we were given in the very early stages of developing a design solution to CliMA's parameters.
If we go down this road of distributed parameters, then it sounds like we've relaxed this requirement. The existing approach (using methods, and not types) ensures this by construction. The distributed approach, where each package / module has its own parameters via separate types, is subject to having different parameters and the responsibility of ensuring a single value is used for all parameters is pushed to the outer-most layer (i.g., tests/experiments/drivers for all other repos, i.e, the user). Here's a simple example:
atmos = AtmosModel(
thermo = ThermoParams(; grav = 1), # override for grav in Thermodynamics.jl
microphys = MicrophysicsParams(; grav = 2), # override for grav in CloudMicrophysics.jl
)
Arguably, this is something that obviously looks wrong and "won't typically happen", but the point is that it can happen. Let's consider a slightly more complex situation:
atmos = AtmosModel(
thermo = ThermoParams(), # I can use Thermo's defaults, right? ¯\_(ツ)_/¯
microphys = MicrophysicsParams(; grav = 2), # override for grav in CloudMicrophysics.jl
)
And now we have inconsistent parameters.
Now, I'm not saying that we shouldn't take the distributed approach--our original design choice was sketched out based on the given requirements at the time. This distributed approach may look fine / easy / sensible when looking at a single package, like Thermodynamics or Oceananigans. But, our existing method-based design satisfies the single parameter requirement across all packages.
My preference would be to open two issues: 1) regarding the hard-aches with our existing design and 2) regarding additional needs from our existing design. For 2), #58 looks great, and that can work with our existing design. For 1), I imagine that we will be able to alleviate hard-aches with our existing design if they are laid out. I think I recall a preference for α = param_set.α
over α = CP.α(param_set)
, but I don't see why this is problematic and I've yet to see a concrete technical reason to change our existing design.
Perhaps it would be good to start by documenting the requirements of the parameters feature?
Some distinction here may help w.r.t. to when we do and don't need consistent parameters:
All "planet" parameters should be the same across model components. This includes thermodynamic constants, planetary rotation rate, mean radius etc., and crucially, e.g., the latent heat of vaporization at the triple point (land, ocean, atmosphere, ice should use the same to ensure energy conservation for the coupled model). These should all live in some "planet" file shared by all model components.
Many other parameters can plausibly be different for different model components. For example, the Smagorinsky coefficient for an atmosphere and ocean model may well be different. We do not need to enforce consistency here.
- Many other parameters can plausibly be different for different model components. For example, the Smagorinsky coefficient for an atmosphere and ocean model may well be different. We do not need to enforce consistency here.
Wouldn't it still be safer in that case to have Smagorinsky_ocean
and Smagorinsky_atmos
coefficient, so that we keep the names unique across the codebase?
I think a good usage example was when for debugging purposes we wanted to set the freezing temperature to something very low to test the TurbulenceConvection.jl
package. This required overwriting T_freeze
for Thermodynamics.jl
, CloudMicrophysics.jl
and TurbulenceConvection.jl
at the same time. It would be nice to still be able to do this easily and in one place, without having to check on individual packages.
3. Many other parameters can plausibly be different for different model components. For example, the Smagorinsky coefficient for an atmosphere and ocean model may well be different. We do not need to enforce consistency here.
Wouldn't it still be safer in that case to have
Smagorinsky_ocean
andSmagorinsky_atmos
coefficient, so that we keep the names unique across the codebase?
Maybe. But if (one day) atmosphere and ocean use the same CFD code base, you couldn't easily do that. The broader point is: there are parameters that are not "universal" constants, and at least for experimentation, there may be situation where we want them to be different in different model components.
Sorry I just saw this thread, and thanks for voicing these concerns, @charleskawczynski ! I think to summarize, what our current setup does is remove the word "enforce" in the requirements and replace it with "make it difficult to change". As @kpamnany and @tapios note,I think this results in a softening of the need for universal parameters; however in practice, it seems to be the use case that we would rather have such common sense principles over enforcement, as it is likely we will require such flexibility down the road.
On @trontrytel and @tapios conversation. Yet another option is (if relationships are known between constants)
smagorinsky
only in the parameter file with a description "atmospheric smagorinsky" ocean=OceanComponent
you create a "derived parameter" with smagorinsky = smagorinsky + epsilon
ocean.smagorinsky
The difference here to previous suggestions is that only one constant is defined in the parameter file, and, changes to that parameter occur at the component level of the use case (and no further). You can also protect from confusion by renaming the local smagorinsky (e.g. ocean_smagorinsky
to indicate it is different). If not then you know where the parameter is defined because it is called as ocean.parameter
and so you know it is defined in the OceanComponent
constructor.
Rather than building support for "parameter sets" into the model constructors, it might be less intrusive to require individual components (eg mutable struct AtmosModel
to define a function
set!(model, parameter_set)
that mutates model components according to the contents of parameter_set
. We have prototype support for this functionality in Oceananigans
/ OceanTurbulenceParameterEstimation
for the purpose of calibrating turbulence closures, and it's useful for running a few different models without recompiling or reallocating memory for a new simulation.
The main benefit of this pattern is that it simplifies the model constructors, which seem to end up being complicated pieces of code.
Given the discussion here: https://github.com/CliMA/TurbulenceConvection.jl/pull/766#discussion_r795078598 Options 1 and 2 aren't really going to be feasible if you get into arrays of hundreds or thousands values.
As distinct from the the TOML format, it might be worth rethinking how we represent parameters internally. There are pros and cons to each but it is worth explaining them a bit. As far as I can see, there are 3 broad options:
1. A package of constants
Conceptually this is the simplest: we get rid of the notion of a "parameter set" completely, and you simply have a package with a bunch of constants:
(note: this code would be generated by the toml file)
This is very easy for the user to make use of:
The original reason we didn't use this is that well, they're constant: you can't modify them. However now with the new Preferences system, we would change the definition above to
and then the user could modify it by the
LocalPreferences.toml
file in their current experiment directoryThis would then correctly trigger recompilation when the value is modified, and the changes would only apply to the current experiment.
However there are a couple of downsides:
We might be able to work around these by defining them as
Ref
s, but then user code would look likewhich isn't particularly nice.
2. Constant functions
This is the approach we use now: a function which returns a constant value:
At the user level, this involves:
In cases where you might be doing multiple runs with different values of a given parameter, the intention was that you would create a new parameter set type with fields for the modifiable parameter values:
This is roughly what @charleskawczynski did in TurbulenceConvection.jl: https://github.com/CliMA/TurbulenceConvection.jl/blob/2a0cea8c184b5f8ffe13f9862aaf3197aa8eca56/driver/parameter_set.jl#L43
As a possible fix to the interface issue, in #57 I proposed a slight modification.
value
functiongetproperty
overloading of the parameter sets to lookup by symbol.Roughly, the (generated) module now becomes
and then the user code looks like:
After some consideration, I think something along these lines would be my personal preference.
3. A dictionary
This is at the other end of the extreme: basically CLIMAParameters just gives a dictionary of values, and it is up to the model components to extract the necessary values at construction (to avoid the cost incurred of looking up dictionaries), e.g.
This would mean that components could be run completely independent of CLIMAParameters. However it could mean that we end up duplicating a lot of code and constants and end up just pushing the problem down into individual components, e.g. Thermodynamics.jl would probably end up defining its own
ThermodynamicsParameters
object to capture all the parameters needed for thermodynamics.