Open forsyth2 opened 5 months ago
Since the `e3sm_diags` package has a thorough derivations section (https://github.com/E3SM-Project/e3sm_diags/blob/main/e3sm_diags/derivations/acme.py), we could potentially just move that out into a package that can be called by others.
https://github.com/E3SM-Project/e3sm_diags/blob/main/e3sm_diags/derivations/acme.py seems to be composed of more or less the following sections:
L19-619: Functions to convert between variables and/or units, which may be called by multiple other functions. Generally, but not always, the arguments to these functions are variables (as type `cdms.TransientVariable`, which will of course be replaced in the CDAT migration effort). L2163-2550 is similar, but many of those functions make updates to the derived variables dict.
L619-2161 (the derived variables dict) is a dictionary mapping each derived variable name (a string) to an ordered dictionary, which in turn maps tuples of input variable names (strings) to functions. I assume the ordered dictionaries are there so that the code tries the possible substitutions in that prescribed order.
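To make the shape of that mapping concrete, here is a minimal sketch. The two helper functions (`convert_pr`, `sum_precip`) and their arithmetic are simplified placeholders, not the actual contents of acme.py; only the nesting (derived name, then ordered mapping of input-name tuples to functions) follows the description above.

```python
from collections import OrderedDict


# Hypothetical stand-ins for the conversion functions in acme.py.
def convert_pr(pr):
    # Placeholder arithmetic: scale a per-second flux to a per-day value.
    return pr * 86400.0


def sum_precip(precc, precl):
    # Placeholder arithmetic: combine two components, then rescale.
    return (precc + precl) * 86400.0 * 1000.0


# Derived variable name -> ordered mapping of (input names) -> function.
derived_variables = {
    "PRECT": OrderedDict(
        [
            (("pr",), convert_pr),
            (("PRECC", "PRECL"), sum_precip),
        ]
    ),
}

# Iterating the inner mapping yields candidates in insertion order,
# which is what gives the substitutions a priority.
candidate_inputs = list(derived_variables["PRECT"].keys())
print(candidate_inputs)
```

Because the inner mapping is ordered, putting `("pr",)` first means it would be preferred whenever `pr` is available, with `("PRECC", "PRECL")` as the fallback.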
The logic of deriving variables actually extends further into https://github.com/E3SM-Project/e3sm_diags/blob/main/e3sm_diags/e3sm_diags_vars.py, in `check_for_derived_vars`.
This block almost makes it look like we'd need all possible base variables present in the user's file (i.e., there's no filtering on `possible_vars`):
    if var in derived_variables:
        # Ex: {('PRECC', 'PRECL'): func, ('pr',): func1, ...}.
        vars_to_func_dict = derived_variables[var]
        # Ex: [('pr',), ('PRECC', 'PRECL')].
        possible_vars = vars_to_func_dict.keys()  # type: ignore
        var_added = False
        for list_of_vars in possible_vars:
            if not var_added and vars_in_user_file.issuperset(list_of_vars):
                # All of the variables (list_of_vars) are in the input file.
                # These are needed.
                vars_used.extend(list_of_vars)
                var_added = True
        # If none of the original vars are in the file, just keep this var.
        # This means that it isn't a derived variable in E3SM.
        if not var_added:
            vars_used.append(var)
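Reading the loop closely, it seems that only one candidate tuple has to be fully present in the file, since `issuperset` is checked per tuple and `var_added` stops further matches. Here is a minimal sketch to check that reading. `select_input_vars` is a hypothetical standalone re-expression of the quoted logic, not a function in e3sm_diags, and the handling of names that are absent from `derived_variables` is an assumption.

```python
def select_input_vars(var, derived_variables, vars_in_user_file):
    """Return the input variable names needed to obtain `var`."""
    if var not in derived_variables:
        # Assumption: a name with no derivation entry is read directly.
        return [var]
    for list_of_vars in derived_variables[var]:
        # The first candidate whose inputs are all in the file wins.
        if set(vars_in_user_file).issuperset(list_of_vars):
            return list(list_of_vars)
    # No candidate matched: keep the variable itself.
    return [var]


# The functions are irrelevant for selection, so None is used as a placeholder.
derived_variables = {
    "PRECT": {("pr",): None, ("PRECC", "PRECL"): None},
}

# Only the second candidate is satisfied by this file.
print(select_input_vars("PRECT", derived_variables, {"PRECC", "PRECL", "TS"}))
# Both candidates are satisfied; the first one in order is chosen.
print(select_input_vars("PRECT", derived_variables, {"pr", "PRECC", "PRECL"}))
# No candidate is satisfied; the name itself is kept.
print(select_input_vars("PRECT", derived_variables, {"TS"}))
```

If this reading is right, a file does not need every possible base variable, only one complete set of inputs for some candidate.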
I feel like a recursive approach as in https://github.com/E3SM-Project/zppy/blob/main/zppy/templates/readTS.py would be the cleanest. It would be easier to follow than the derived variable dictionary. However, short of re-implementing the entire derivation code to check, I'm not sure it would fully cover everything.
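    from typing import Any, Dict, Optional

    # qflxconvert_units and prect are the existing functions in acme.py.
    def get_var(var_name: str, defined_vars: Dict[str, Any]) -> Optional[Any]:
        # Base case: the variable is already present in the input data.
        if var_name in defined_vars:
            return defined_vars[var_name]
        elif var_name == "PRECT":
            # Try first derivation method
            pr = get_var("pr", defined_vars)
            if pr is not None:
                return qflxconvert_units(pr)
            # Try second derivation method
            precc = get_var("PRECC", defined_vars)
            precl = get_var("PRECL", defined_vars)
            # Compare against None: truth-testing array-like data is ambiguous.
            if precc is not None and precl is not None:
                return prect(precc, precl)
            # Try third derivation method
            ...
        # Could not define the variable
        return None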
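The recursion and the derived variable dictionary could also be combined, so that each new variable is a table entry rather than another `elif` branch. The following is only a sketch under stated assumptions: `DERIVATIONS`, `resolve`, and the placeholder arithmetic are all hypothetical, and the cycle guard is there because two variables may each be derivable from the other.

```python
from typing import Any, Callable, Dict, FrozenSet, List, Optional, Tuple

Derivation = Tuple[Tuple[str, ...], Callable[..., Any]]

# Hypothetical derivation table; candidates are tried in list order.
# The arithmetic is placeholder only, not the real unit conversions.
DERIVATIONS: Dict[str, List[Derivation]] = {
    "PRECT": [
        (("pr",), lambda pr: pr * 86400.0),
        (("PRECC", "PRECL"), lambda precc, precl: (precc + precl) * 86400.0 * 1000.0),
    ],
    "pr": [
        (("PRECT",), lambda prect: prect / 86400.0),
    ],
}


def resolve(
    name: str,
    defined_vars: Dict[str, Any],
    in_progress: FrozenSet[str] = frozenset(),
) -> Optional[Any]:
    """Return `name` from the data, or derive it recursively, else None."""
    if name in defined_vars:
        return defined_vars[name]
    if name in in_progress:
        # Already trying to derive this name further up the call stack.
        return None
    in_progress = in_progress | {name}
    for inputs, func in DERIVATIONS.get(name, []):
        values = [resolve(i, defined_vars, in_progress) for i in inputs]
        if all(v is not None for v in values):
            return func(*values)
    # Could not define the variable.
    return None


print(resolve("PRECT", {"pr": 2.0}))
print(resolve("pr", {"PRECC": 1.0, "PRECL": 1.0}))
print(resolve("PRECT", {"TS": 1.0}))
```

In the second call, `pr` is not in the data, so it is derived from `PRECT`, which is itself derived from `PRECC` and `PRECL`; that chaining is the part the flat dictionary lookup does not provide on its own.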
It's possible that a third-party symbolic algebra package would be the cleanest solution. I suppose we could try to define the variables as symbols in SymPy and work from there, but we may have too much going on here: the names of the variables, and also their values and units.
@xylar Do you know of any packages or algorithms that would handle something like this well? (This is a lower-priority item; it's just something that has come up a few times now as being potentially useful).
Or maybe option (1)/(2) below would be the better path forward?
1. Have the model itself derive variables, listing derived variables along with original values in output.
2. Do the above, but as a separate step before the rest of the post-processing workflow rather than in the model.
3. Create a package to derive variables as needed. E.g., if someone requests a derived variable, the `e3sm_diags` package and the `global_time_series` zppy task would both call this new package to derive it from the given data.
@forsyth2, thanks for pinging me on this. I don't have any experience with this myself. I haven't tried to allow users to define their own new products and such.
Issue description
Currently, variable derivations are handled on a per-package basis. For example, in the `global_time_series` task, the derivations are handled in https://github.com/E3SM-Project/zppy/blob/main/zppy/templates/readTS.py, and in the `e3sm_diags` package, the derivations are handled in https://github.com/E3SM-Project/e3sm_diags/blob/main/e3sm_diags/derivations/acme.py.

It would make more sense for derivations to be handled uniformly. The possible options are the three listed above, (1) through (3).

It's possible a generic package (e.g., a symbolic/computer algebra library) could accomplish (3) without much extra work from us.