Figure out what to do with volume and surface area

xaviernogueira commented 1 year ago

In the previous version of the modules volume and surface area were analogous to state variables.

However, nowhere in the TSM module specifically are they altered.

I can change them to be static variables, but state might be a better fit as other models this can be linked with could alter their values.

For now there are mock functions in tsm/state_variables.py that just return their previous value. This is probably fine, and maybe we can move them to the proper processes file?

aufdenkampe commented 1 year ago

@xaviernogueira, thanks for creating this issue.

Cell volume and surface area need to available to all modules, so should probably be moved to shared. They will often change with time, but that will always be done by the external water transport model. However, we need to keep track of those changes for every tilmestep, to convert concentrations to masses and fluxes/loads. Therefore, they should be considered state variables.

I will use this issue as an opportunity to again suggest that we migrate away from from the current use='static' or 'state' or 'dynamic'.

I much prefer the variable intent as defined by xarray-simlab, where the three options are:

intent='in' if needed as input to a process, but not modified (read-only)
intent='inout' if updated by a process (read & write)
intent='out' if computed by a process (write only)

This is much less ambiguous, and more clearly aligns with modeling language describing inputs and outputs (and variables that are both).

As more background, the term "state variable" means something very specific to chemists. Remember that these are water quality modules, so I would really like to move back toward using that term in a way that doesn't conflict with how it is used by chemists. See:

xaviernogueira commented 1 year ago

Hm the issue with the in, in/out and out is that we have no "out" variables since our dynamic variables by default are not output (although they can be optionally). So we have "in" (currently static), and "in/out" (currently state).

Maybe we can do "in", "in/out", and "intermediate"?

I see why we may move away from "state", but I don't think "in/out" is descriptive enough to capture that our "state" variables is what the module is modelling. They are the main result we are tracking. A smaller side issue is that we have functions like "get_state_variable_names()". And "get_in_out_variable_names()" is a bit confusing without being able to use a "/".

Additionally, our "in" variables are only input once upon init and do not contain a time dimension, which is behavior that is not captured by the proposed naming. It's kind of confusing to call them "in", but you can't actual input new values for them for a given time step calculation. Maybe "boundary" although I know that has other meanings.

Simlab's naming is purposely not very descriptive as they need to be general for all modelling needs. In our use case we have additional context for what these variables are in relation to the purpose of the modules (ie state) as well as the behavior of the code (ie intermediate, since they are not output).

Thoughts? I am bad at naming and can't really think of a good replacement for "state" that captures that it is the main result.

aufdenkampe commented 1 year ago

@xaviernogueira, let's discuss this all a bit more.

I indeed think that we have in, inout, and out variables, but that they aren't simple translations from your current static, dynamic, and state.

For example, while some input (in) variables are a single scalar (such as the Arrhenius coefficients) , whereas others (such as volume and surface area) are full time-changing grid of values that you get from an external program. The key point is that these values are not changed by anything internal to these models. A chemist would argue that volume and surface area are state variables

Other variables, like Temperature or a nutrient concentration, are read in from an external source, modified by these models, and sent back to the external transport model. These state variables would be inout.

Many other "state" variables, however, will never be read from an external program, but are completely calculated within these modules. These variables are out whether we write them to a file or not (because of course we could write them to a file). Calling these intermediate values doesn't make sense, because many of them could be important for post-modeling interpretation, even though we aren't going to send them back to the transporter model for advection/diffusion calculations.

Let's talk.

EcohydrologyTeam / ClearWater-modules

Figure out what to do with volume and surface area #50