CliMA / ClimateMachine.jl

Climate Machine: an Earth System Model that automatically learns from data
https://clima.github.io/ClimateMachine.jl/latest/

Namelist for simplified user experience #947

Open smarras79 opened 4 years ago

smarras79 commented 4 years ago

Description

Terminology: in the following, "driver" indicates any setup file contained in experiments and not the Driver.jl file.

I'd like to go back to the open discussion from last summer and fall about using a namelist file that contains the only information needed to execute CLIMA.

As of now, although the drivers have been greatly cleaned up and minimized in the past few months, they still require a user to be very familiar with Julia; moreover, the user must also be familiar with the guts of CLIMA to modify a driver without breaking the code or removing relevant information from it.

To avoid this issue, it is standard practice in the atmospheric (WRF, CM1, Pycles, NUMA, etc.) and CFD (OpenFOAM, Nektar++, etc.) communities to use a namelist file (often simply called an input file) that does not contain any code. A namelist is nothing more than a list of parameters that the code reads and acts on. These namelists can be as simple as a sequential list of parameters, or more hierarchical, such as the one used by Pycles, which is a JSON file.

For example, here is the one used by Pycles:

{
    "conditional_stats": {
        "classes": [
            "Spectra"
        ],
        "frequency": 600.0,
        "stats_dir": "cond_stats"
    },
    "damping": {
        "Rayleigh": {
            "gamma_r": 0.002,
            "z_d": 500.0
        },
        "scheme": "Rayleigh"
    },
    "diffusion": {
        "qt_entropy_source": false
    },
    "fields_io": {
        "diagnostic_fields": [
            "ql",
            "temperature",
            "buoyancy_frequency",
            "viscosity"
        ],
        "fields_dir": "fields",
        "frequency": 3600.0
    },
    "grid": {
        "dims": 3,
        "dx": 35.0,
        "dy": 35.0,
        "dz": 5.0,
        "gw": 5,
        "nx": 96,
        "ny": 96,
        "nz": 300
    },
    "microphysics": {
        "ccn": 100000000.0,
        "cloud_sedimentation": false,
        "phase_partitioning": "liquid_only",
        "scheme": "None_SA"
    },
    "momentum_transport": {
        "order": 7
    },

    "output": {
        "output_root": "./"
    },
    "restart": {
        "frequency": 600.0,
        "init_from": false,
        "input_path": "./",
        "output": true
    },
    "sgs": {
        "scheme": "Smagorinsky"
    },

    "thermodynamics": {
        "latentheat": "constant"
    },
    "time_stepping": {
        "cfl_limit": 0.7,
        "dt_initial": 1.0,
        "dt_max": 4.0,
        "t_max": 14400.0,
        "ts_type": 3
    },
    "visualization": {
        "frequency": 1000000.0
    }
}
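As a rough sketch of how such a hierarchical file could be consumed, the snippet below (illustrative Python, not ClimateMachine code) parses a trimmed-down version of the JSON above and falls back to a default when an entry is missing; the `namelist_get` helper is hypothetical:

```python
import json

# Hypothetical helper: look up a dotted "section.key" path in a parsed
# namelist, returning a default when the entry is absent.
def namelist_get(namelist, path, default=None):
    node = namelist
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

# Trimmed-down version of the Pycles-style namelist above.
namelist = json.loads("""{
    "grid": {"dims": 3, "dx": 35.0, "nx": 96},
    "time_stepping": {"cfl_limit": 0.7, "dt_max": 4.0}
}""")

dx = namelist_get(namelist, "grid.dx", 50.0)  # present in the file -> 35.0
gw = namelist_get(namelist, "grid.gw", 5)     # absent -> falls back to 5
```

The point is that the user only touches the JSON text; the lookup logic lives once in the code.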

Additional context

To put this in perspective: don't put yourself in the shoes of the PhD student or post-doc who works 27 hours/day, 8 days/week and could possibly figure it out, but in those of a 70-year-old retired Caltech full professor who is bored of being home during the lockdown and decides to download and run CLIMA to write a paper on climate change. The poor retired professor will not try to figure out how to read Julia code but, rather, will expect a text file with self-explanatory variable names, each with an assigned value or keyword that can be changed without worrying about whether it is a source or a flux.

For CLIMA Developers

akshaysridhar commented 4 years ago

I suppose the namelist idea implies that each experiment be accompanied by its own namelist or config file, such that the <experiment> file contains the initialization, whereas the <experiment config> file basically lists the AtmosModel components. I'll walk through the functionality that the above namelist brings in (of course the actual form varies among models) and see how that compares with CLIMA right now.

Agreed on diagnostic variable specification; this currently doesn't exist at the experiment level. Currently these are specified in the diagnostics source code files themselves, with the intent to upgrade this and allow users to design diagnostic variable groupings.

    "conditional_stats": {
        "classes": [
            "Spectra"
        ],
        "frequency": 600.0,
        "stats_dir": "cond_stats"
    },

This exists as a source term prescription; perhaps a way to clarify it would be to include SpongeRelaxation as an explicit subcomponent of AtmosModel instead of passing it as a component of the Source tuple.

    "damping": {
        "Rayleigh": {
            "gamma_r": 0.002,
            "z_d": 500.0
        },
        "scheme": "Rayleigh"
    },

This already exists as a subcomponent of AtmosModel, and I don't see how using a namelist makes this any clearer than using SmagorinskyLilly(coefficient) or ConstantViscosityWithDivergence(coefficient) in the AtmosModel specification. Besides, CLIMA has a pretty clear distinction between a q source and a q flux term (equivalent here; we don't work in entropy), and right now I think it makes sense to maintain that distinction.

    "diffusion": {
        "qt_entropy_source": false
    },

The variable choice component is missing from experiments, but the field type (dump-field vs horizontal averages) is available, selectable from the Diagnostics configuration struct at the experiment level. [missing docs]

    "fields_io": {
        "diagnostic_fields": [
            "ql",
            "temperature",
            "buoyancy_frequency",
            "viscosity"
        ],
        "fields_dir": "fields",
        "frequency": 3600.0
    },

The grid specification exists at the experiment level through the config_setup() function hook.

    "grid": {
        "dims": 3,
        "dx": 35.0,
        "dy": 35.0,
        "dz": 5.0,
        "gw": 5,
        "nx": 96,
        "ny": 96,
        "nz": 300
    },

This would presumably become an AtmosModel component similar to the turbulence = component. The Phase-Partitioning is currently tied to the moisture.jl MoistureModel component.

    "microphysics": {
        "ccn": 100000000.0,
        "cloud_sedimentation": false,
        "phase_partitioning": "liquid_only",
        "scheme": "None_SA"
    },

The spatial and temporal discretisations are explicitly specified through the polynomial order N and the time stepper choice in the solver_setup level via the experiment file.

    "momentum_transport": {
        "order": 7
    },

Currently in the diagnostic configuration struct

    "output": {
        "output_root": "./"
    },

Pending PR, presumably applied through a callback of some sort.

    "restart": {
        "frequency": 600.0,
        "init_from": false,
        "input_path": "./",
        "output": true
    },

See AtmosModel turbulence= component.

    "sgs": {
        "scheme": "Smagorinsky"
    },

This is currently part of the moisture subcomponent in AtmosModel

    "thermodynamics": {
        "latentheat": "constant"
    },
    "time_stepping": {
        "cfl_limit": 0.7,
        "dt_initial": 1.0,
        "dt_max": 4.0,
        "t_max": 14400.0,
        "ts_type": 3
    },

Here I see disconnected components (assuming the copied code is a direct, line-by-line representation of an existing namelist): why are output-dir, visualization frequency, and output variable lists not grouped (in contrast with CLIMA's DiagnosticConfiguration)? Same with the time-stepping component above: ts_type = 3 is not nearly as clear as ode_solver = LSRK144NiegemannDiehlBusch.

    "visualization": {
        "frequency": 1000000.0
    }
}

One thing not apparent from the namelist file above is the set of boundary conditions used in the problem. My interpretation of the current driver_config and solver_config is that we're trying to decouple the physics (equations, diffusion, sources, init_cond) from the configuration-related material (diagnostic list, diagnostics interval, problem domain and resolution). Describing things like entropy sources and diffusion models in a namelist file but the boundary conditions elsewhere seems inconsistent to me; perhaps people who have used a variety of atmospheric models / GCMs can help me see the reasoning behind such a design choice. In the end we need to make it simple enough for a new user to know what they just ran by simply downloading and running CLIMA, and how they can make basic modifications to it (namelist JSONs, unique experiment files, or however we choose to do it).

I suspect a strong effort at documentation within CLIMA will fix a lot of these namelist concerns.

akshaysridhar commented 4 years ago

I think what I see mostly here is that we don't have clarity in all the options available through the mutable struct configurations and model subcomponents at the experiment level without knowing which source files to check for...

szy21 commented 4 years ago

I agree that functionally, everything enabled by a namelist could be done by Julia code. The problem, I think, is which way is easier from a model user perspective.

When it comes to the namelist/diag_table/any other user interface problem, I think it is more challenging for a GCM than for an LES (so don't focus too much on Pycles, but it provides an example). But let's start with LES. Take bomex.jl as an example: the file is 513 lines long, yet what a model user would like to change may only be a few lines (Δh, Δv, w_sub, etc.). Of course, the user could read the entire code and modify it, but is it important for them to read how the sponge is implemented? Also, as Simone points out, it would be nice for the user to change w_sub without worrying about whether it is a source or a flux. Another nice thing about a namelist is that there are always default values, so you don't need to include the entire configuration of an experiment.
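The "defaults" behavior described here is easy to sketch: a user namelist supplies only the entries that differ from the model defaults, and the two are merged recursively. The snippet below is illustrative Python; the names are made up, not ClimateMachine API:

```python
# Recursively overlay user-provided namelist entries on top of defaults,
# so a user file only needs the handful of values being changed.
def merge_namelist(defaults, overrides):
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_namelist(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"grid": {"nx": 96, "dx": 35.0}, "sgs": {"scheme": "Smagorinsky"}}
user = {"grid": {"dx": 20.0}}  # user only changes horizontal resolution

config = merge_namelist(defaults, user)
# config keeps nx and the SGS scheme from the defaults, with dx overridden
```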

It's definitely more complicated for a GCM, because we can easily have thousands of choices for model configuration, which may be hard to anticipate right now. Here is a very simple example that hopefully gives you some idea: imagine someone would like to increase the level of CO2 by a certain factor, but currently CO2 is read from a file and there is no option to change it. One could add a namelist option called co2_factor and provide it in the namelist, even without much knowledge of Julia. Is it as straightforward to do so if everything is in .jl?
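As a hypothetical illustration of the co2_factor idea (all names here are made up), such a namelist option could scale values read from a data file, with the factor defaulting to 1.0 when the option is omitted:

```python
# Stand-in for CO2 concentrations read from a data file.
co2_from_file = [280.0, 285.0, 290.0]

# User-provided namelist entry; omitting it leaves CO2 unchanged.
namelist = {"radiation": {"co2_factor": 2.0}}

factor = namelist.get("radiation", {}).get("co2_factor", 1.0)
co2 = [c * factor for c in co2_from_file]  # -> [560.0, 570.0, 580.0]
```

The user edits one line of text; no Julia (or Python) knowledge is required.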

I hope we will keep discussing as we learn more when developing CliMA. I assume this is something easy to change in the future?

akshaysridhar commented 4 years ago

Fair point @szy21 - I think listing the defaults used in CLIMA clearly might make this distinction between individual experiments clearer from a user perspective.

0) Comments and references ~70 lines
1) Boilerplate / module loads ~20 lines

BOMEX specific:

2) Physics - e.g. sources/sinks ~180 lines
3) Physics - initial condition ~100 lines
4) Physics - boundary condition ~20 lines

Generic configs:

5) Configuration - domain size, diagnostic frequency ~60 lines

A lot of the physics configuration choices collapse into the model specification. Right now, for the specific case you suggest, AtmosModel does carry some default values, and a user would need to edit values within the config_bomex() function to change things like w_sub and resolution, but maybe there are ways to simplify this even more. We definitely need top-level pointers/docs on which function hooks in an experiment a user can expect to edit to make minor changes.

Also the formatter blows up the line-count a bit by assigning function args on newlines :)

szy21 commented 4 years ago

I think what I see mostly here is that we don't have clarity in all the options available through the mutable struct configurations and model subcomponents at the experiment level without knowing which source files to check for...

Agreed. The namelist is not meant to replace the documentation/user guide. I think we should consider whether, given good documentation, a namelist or experiment.jl is better.

smarras79 commented 4 years ago

I suspect a strong effort at documentation within CLIMA will fix a lot of these namelist concerns.

Hi @akshaysridhar, thanks for your input. Consider that this is a distilled example of a Pycles namelist, meant to provide an idea of what the atmospheric community means by a namelist. What the namelist contains is purely up to us.

Documentation is a necessary component of a large code, but it is not sufficient to allow a user to successfully modify an experiment. If the documentation explains code, and hence requires knowledge of a specific programming language, then in my experience new users tend to be discouraged rather than encouraged to push forward.

smarras79 commented 4 years ago


This already exists as a subcomponent of AtmosModel , and I don't see how using a namelist makes this any clearer than using SmagorinskyLilly(coefficient) or ConstantViscosityWithDivergence(coefficient) in the AtmosModel specification. Besides, CLIMA has a pretty clear distinction between an (equivalent, we don't work in entropy here) q source vs q flux term, and right now I think it makes sense to maintain that distinction.

The problem is not which keyword is used, but that we expect a user to know how to program in some programming language when all they need to know is which SGS model they want to use.

    "diffusion": {
        "qt_entropy_source": false
    },

The variable choice component is missing from experiments, but the field type ( dump-field vs horizontal averages) is available , selectable from the Diagnostics configuration struct at the experiment level. [missing docs]

    "fields_io": {
        "diagnostic_fields": [
            "ql",
            "temperature",
            "buoyancy_frequency",
            "viscosity"
        ],
        "fields_dir": "fields",
        "frequency": 3600.0
    },

The grid specification exists at the experiment level through the config_setup() function hook.

    "grid": {
        "dims": 3,
        "dx": 35.0,
        "dy": 35.0,
        "dz": 5.0,
        "gw": 5,
        "nx": 96,
        "ny": 96,
        "nz": 300
    },

This would presumably become an AtmosModel component similar to the turbulence = component. The Phase-Partitioning is currently tied to the moisture.jl MoistureModel component.

    "microphysics": {
        "ccn": 100000000.0,
        "cloud_sedimentation": false,
        "phase_partitioning": "liquid_only",
        "scheme": "None_SA"
    },

The spatial and temporal discretisations are explicitly specified through the polynomial order N and the time stepper choice in the solver_setup level via the experiment file.

    "momentum_transport": {
        "order": 7
    },

Currently in the diagnostic configuration struct

    "output": {
        "output_root": "./"
    },

Pending PR, presumably applied through a callback of some sort.

    "restart": {
        "frequency": 600.0,
        "init_from": false,
        "input_path": "./",
        "output": true
    },

See AtmosModel turbulence= component.

    "sgs": {
        "scheme": "Smagorinsky"
    },

This is currently part of the moisture subcomponent in AtmosModel

    "thermodynamics": {
        "latentheat": "constant"
    },
    "time_stepping": {
        "cfl_limit": 0.7,
        "dt_initial": 1.0,
        "dt_max": 4.0,
        "t_max": 14400.0,
        "ts_type": 3
    },

Here, I see disconnected components (assuming the copied code is a direct representation of an existing namelist line-by-line) why are output-dir, visualization frequency and output variable lists not grouped (in contrast with CLIMA DiagnosticConfiguration) ? Same with the time-stepping component above, ts_type = 3 is not at all as clear as ode_solver = LSRK144NiegemannDiehlBusch

    "visualization": {
        "frequency": 1000000.0
    }
}

One thing not apparent from the namelist file above is the set of boundary conditions used in the problem. My interpretation of the current driver_config and solver_config is that we're trying to decouple the physics (equations, diffusion, sources, init_cond) from the configuration related material (diagnostic list, diagnostics interval, problem domain and resolution). Describing things like entropy sources and diffusion models in a namelist file, but the boundary conditions elsewhere seems inconsistent to me, but perhaps people who have used a variety of different atmospheric models / GCMs can help me see the reasoning behind such a design choice. In the end we need to make it simple enough for a new user to know what they just ran by simply downloading and running CLIMA, and how they can make basic modifications to it (namelist JSONs or unique experiment files or however we choose to do it)

I suspect a strong effort at documentation within CLIMA will fix a lot of these namelist concerns.

claresinger commented 4 years ago

I think the real advantage of a namelist, rather than directly having the user modify experiment.jl, is to simplify everything down to the bare bones. In the end, when a user wants to run an LES experiment, they can make a number of choices, for example for the domain size or turbulence model. The documentation is necessary because it specifies what is a valid choice. The namelist is a simple plaintext file that records these choices. Yes, there will be some redundancy, because that file will in large part just be read into experiment.jl, but it makes the user's experience cleaner.

Other advantages are:

  1. The namelist can be saved with the output so you know exactly which settings you used for each run.
  2. We eliminate the need for command line options. Currently there are only a few, but having more than a few is very cumbersome.
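Advantage 1 could be as simple as copying the namelist into the run's output directory, stamped with the run time. A minimal sketch (paths and file names are illustrative, not ClimateMachine behavior):

```python
import json
import os
import time

# Archive the namelist alongside the run's output so every result records
# the exact settings that produced it.
def archive_namelist(namelist, output_dir):
    os.makedirs(output_dir, exist_ok=True)
    record = dict(namelist)
    record["_run_timestamp"] = time.strftime("%Y-%m-%dT%H:%M:%S")
    path = os.path.join(output_dir, "namelist_used.json")
    with open(path, "w") as f:
        json.dump(record, f, indent=4, sort_keys=True)
    return path
```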
szy21 commented 4 years ago

CliMA developers (@charleskawczynski ?): Is it urgent to make a decision now? Would it be easy to change this in the future?

akshaysridhar commented 4 years ago

I think the real advantage of a namelist rather than directly having the user modify experiment.jl is just to simplify everything down to barebones.

This is fair. I think, based on current CLIMA functionality, the key ideas of the namelist discussed here amount to (1) allowing simpler access to the config_<problem>() function within the current experiment structure, and (2) decoupling the source setup, which is the bulk of the code for an experiment (e.g. CLIMA dycoms / bomex / RTB); compare with a single file that contains tendencies/forcings/components for all experiments (using PYCLES as an example). The point of my going through the namelist line by line was to ensure that all the expected functionality is currently included in CLIMA (even if through Julia constructors rather than string arguments), i.e. providing a cleaner interface with https://docs.julialang.org/en/v1/manual/constructors/ to users of other atmospheric models; @kpamnany may have more input on this.

akshaysridhar commented 4 years ago

Other advantages are:

  1. The namelist can be saved with the output so you know exactly which settings you used for each run.

I added a function, print_model_info(), to do this with the current AtmosModel configuration: it would dump the configuration (i.e. the main elements of config_problem()) with each simulation output file, but it appears to have been accidentally removed at some point :(. It can trivially be added back in. (@kpamnany had a cleaner suggestion for handling the info block.)

smarras79 commented 4 years ago

I think that the major advantage is to get users to use CLIMA. If they see that they need to learn Julia to modify an experiment, most potential users won't even try.

charleskawczynski commented 4 years ago

CliMA developers (@charleskawczynski ?): Is it urgent to make a decision now? Would it be easy to change this in the future?

I think constructing a namelist requires a relatively stable top interface and, since our top interface is changing rapidly, I don't think it's practical to create one yet. Right now, a namelist will probably break on every other PR or, worse yet, will result in inconsistent input values. This will lead to confusion, opened issues based on misuse of the codebase, and attention drawn from other important tasks that need to be addressed.

We will definitely make a namelist, but I think it makes sense to do so once our top-level interface is a bit more stable because it will be much easier. We cannot afford to compile all of CLIMA on every run, which means we need to be able to run the code given some sort of input script (a namelist).

Fundamentally speaking, software is built in layers, and (IMO) we're just not done with our current top layer, so it's premature to put the namelist layer on top of it.

kpamnany commented 4 years ago

What Charlie said. It will be easy enough to add this; simply a question of when.

glwagner commented 4 years ago

I think this is a really important discussion to have. I want to point out that people can and should explore patterns and possibilities for "translating ideas to code" in their personal scripts and in the course of their personal research, without submitting PRs to CLIMA (yet). We can and should experiment with patterns that make our lives easier, and then gradually build them in as core functionality as our package becomes more mature.

I'd like to point out a few patterns that are worth exploring, some of which may satisfy immediate needs for easier experiment specification.

Breaking up long scripts

One easy way to allow oneself to easily change parameters inside a long script is to split the script into a few parts:

initial_temperature = 273.0 # Kelvin
heat_flux = 10.0 # Watts / meter^2

include("script_that_runs_experiment.jl")

To make this more namelist-like, all that's needed is to define a function that takes keyword arguments and then runs the experiment. This enables a script that looks like

include("script_defining_powerful_functionality.jl")

parameters = Dict(
    :initial_temperature => 273.0, # Kelvin; Symbol keys are required for keyword splatting
    :heat_flux => 10.0, # Watts / meter^2
)

run_big_experiment(; parameters...)

Thus, dictionaries and splatting enable behavior that is essentially identical to namelists in other languages (as was mentioned above).

Writing stand-alone packages

A step further than simply breaking functionality into scripts is to write packages dedicated to certain purposes, such as large eddy simulation of scenarios observed in field campaigns. We have a few such packages being developed around Oceananigans. One example is a package I started last week for a paper I'm currently writing (this is a work in progress; I made it public just for this issue, so view things with a grain of salt):

https://github.com/glwagner/WaveTransmittedTurbulence

This is a good exercise for a "power user": someone who is not a core developer, but who nevertheless is adept at writing scripts and wants to make other people's lives easier.

In this package, some common tools (output specification; functionality specific to surface waves, the subject of the paper; and minor, application-specific modifications to the underlying LES model) live in /src/. Then, there are scripts outside of src that set up and run Oceananigans.jl simulations. The functionality in src means that the scripts can be built with a small(er) number of functions than would otherwise be necessary, which makes them shorter and easier to read.

Some of this functionality may eventually make its way into core Oceananigans.jl (but certainly not all). Even if some of it is generic, I think it's nice to develop and test things in a simple, clean environment like the above package before committing to a full-fledged PR with a suite of tests, as would be necessary for inclusion in core Oceananigans.jl or CLIMA.jl. I think the same logic can apply to scripting patterns and functionality for CLIMA experiments. Using a package, or more generally an environment, means that your code is easily reproducible via a short sequence of commands by anyone with computing resources.

Specifying parameters on the command line

In addition to the core functionality in WaveTransmittedTurbulence/src, I've implemented the specification of a small number of parameters of the various scenarios that I consider on the command line using ArgParse.jl. This makes my life easier, and also makes reproducing results a little easier (see the README). For example, one experiment is run via

julia --project simulations/run_free_convection.jl --buoyancy_flux 1e-9 --Nh 256 --Nz 256

Because my script is written to run on a GPU if one is available, but on a CPU if one is not, one can specify a low resolution run with this command and test functionality on the CPU. This is useful for developing the script.
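For readers more familiar with Python, the same command-line pattern looks like this with the standard argparse module (a sketch only; the actual scripts use ArgParse.jl):

```python
import argparse

# Expose a few experiment parameters on the command line, each with a default
# so the script runs unmodified out of the box.
parser = argparse.ArgumentParser(description="Run free-convection experiment")
parser.add_argument("--buoyancy_flux", type=float, default=1e-9)
parser.add_argument("--Nh", type=int, default=256)
parser.add_argument("--Nz", type=int, default=256)

# Parse an explicit argument list here for demonstration; a real script
# would call parser.parse_args() to read sys.argv.
args = parser.parse_args(["--buoyancy_flux", "1e-8", "--Nh", "128"])
# args.Nz falls back to its default of 256
```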

I hope this helps and I look forward to continuing this conversation. I think we are in an exciting time in which old methods of "experiment specification" for compiled codes are being replaced, and in which the distinction between "source" and "user scripts" is becoming blurred. I think that CLIMA has the opportunity to contribute not only to more accurate climate and fluids simulations, but to the development of new paradigms for the use of scientific software that will ultimately make scientists much more productive. We will have low-level building blocks, which give users great power to implement an idea, and high-level wrappers with highly constrained yet simple interfaces for, say, setting a single parameter and performing a run. We should all experiment with different solutions for user interaction at each level in the "usage hierarchy" as we use CLIMA for our own research.