Aerosol zonal plots - Githubissues

justin-richling commented 8 months ago

This PR will add calculations for zonal aerosol plots, and add new functionality for colorbar options.

Files changed:

adf_variable_defaults.yaml

Add more aerosol variables and colorbar options
Add constants section

adf_diag.py

Add automatic time series file generation for PMID and T if certain aerosol variables are requested
Change how derived quantities are created; add flexibility in number of constituents a variable needs

plotting_functions.py

Add compatibility for non-linear increments for color maps/ranges
Specify difference plot colorbar options from case colorbar options

regrid_and_vert_interp.py:

Fix small variable name typo

brianpm commented 8 months ago

@justin-richling and @nusbaume

Following up on an email I sent you about this... I wonder if we should provide an improved approach for derived variables. If we think bringing in numexpr as a new dependency is a reasonable approach, I think I've got a potential drop-in replacement, as follows:

provide a new "variable_default" entry called formula that would define the derived variable in terms of the derivable_from list.

Modify the vars_to_derive variable like this:

        # Loop over CAM history variables:
        list_of_commands = []
        vars_to_derive = {}
        # create copy of var list that can be modified for derivable variables
        diag_var_list = self.diag_var_list
        for var in diag_var_list:
            if var not in hist_file_var_list:
                vres = res.get(var, {})
                if ("derivable_from" in vres) and ("formula" in vres):
                    constit_list = vres["derivable_from"]
                    expression = vres["formula"]
                    for constit in constit_list:
                        if constit not in diag_var_list:
                            diag_var_list.append(constit)
                    vars_to_derive[var] = [constit_list, expression]
                    continue
                else:
                    msg = f"WARNING: {var} is not in the file {hist_files[0]}."
                    msg += " No time series will be generated."
                    print(msg)
                    continue

replace derive_variables with the following (UNTESTED):
```
import numexpr as nex
```

def derive(expression, **inputs): """Derive output using expression, with input data suppled in inputs.

expression : str
    A string describing the expression to evaluate, can contain
    numbers, variables, and basic operators. 

inputs : dict
    A dictionary with keys being the variables used in expression
    and the values being the data (either xr.DataArray or numpy arrays)

return
    The result of expression

NOTES
-----
    Example: expression = "PRECC + PRECL" 
             inputs = {"PRECC": arr1, "PRECL": arr2}
"""
return nex.evaluate(expression, inputs)

def derive_xrWrap(expression, *args, **kwargs):

# maybe this isn't kosher, but this allows the user
# to either provide named arguments 
# or a dictionary without using **dict
if (len(args) == 1) and (isinstance(args[0], dict)):
    kwargs = args[0]

template = kwargs[[*kwargs][0]] # whatever is the first input
# pop the things that are not part of expression inputs
dims = kwargs.pop("dims", (template.dims if hasattr(template, 'dims') else None))
coords = kwargs.pop("coords", (template.coords if hasattr(template, 'coords') else None))
attrs = kwargs.pop("attrs", template.attrs)
ans = derive(expression, kwargs)
return xr.DataArray(ans, dims=dims, coords=coords, attrs=attrs)

def derive_variables(self, vars_to_derive=None, ts_dir=None, overwrite=None): """ Derive variables according to expression and inputs provided in vars_to_derive dict.

Caution: this method _may still_ assume that there will be one time series file per variable

If the file for the derived variable exists, the kwarg `overwrite` determines
whether to overwrite the file (true) or exit with a warning message.
"""

for var in vars_to_derive:
    constit_list = vars_to_derive[var][0]
    expression = vars_to_derive[var][1]
    # get constituent files:
    constit_files = {}
    for c in constit_list:
        if glob.glob(os.path.join(ts_dir, f"*.{c}.*")):
            constit_files[c] = sorted(glob.glob(os.path.join(ts_dir, f"*.{c}.*")))
        else:
            ermsg = f"{c} files were not present; {var} cannot be calculated."
            ermsg += f" Please remove {var} from diag_var_list or find the relevant CAM files."
            raise FileNotFoundError(ermsg)
    # from the constit_files, get the data
    constit_data = {}
    for c in constit_files:
        if len(constit_files[c] > 1):
            constit_data[c] = xr.open_mfdataset(constit_files[c])[c]
        else:
            constit_data[c] = xr.open_dataset(constit_files[c][0])[c]
    # output file specification:
    # TODO: this might not work for multi-file cases
    derived_file = constit_files[[*constit_files][0]].replace(constit_list[0], var)
    if Path(derived_file).is_file():
        if overwrite:
            Path(derived_file).unlink()
        else:
            print(
                f"[{__name__}] Warning: {var} file was found and overwrite is False. Will use existing file."
            )
            continue
    # derive the variable using expression:
    result = derive_xrWrap(expression, **constit_data)  # defaults to copying metadata from 1st constituent
    # TODO: provide a way to send updated metadata to derive_xrWrap, maybe from additional variable_defaults info
    # Save output:
    if 'time' in result.dims:
        udim = 'time'
    else:
        udim = None
    result.to_netcdf(derived_file, unlimited_dims=udim, mode='w')

brianpm commented 8 months ago

I thought about this PR some more. I think given my previous suggestions, it should be acceptable to have derived variables that could be derived from derived variables themselves. For example, say PRECT is derived from PRECC + PRECL. Then the convective fraction of precipitation could be derived as PRECCFRAC = PRECC / PRECT. This seems fine, but requires doing the derived variables in a correct order. I believe that we could just reorder the vars_to_derive using the following code:

def get_derive_order(vres):
    """provides a valid order for deriving variables
       such that derived quantities that are dependencies
       for other quantities are earlier in the output list

        parameters
        ----------
        vres : dict
            dict of each derivable variable with values that are the dependencies (as list)

        return
        ------
        list
            The list of derived variables in a valid order for derivations to succeed
    """
    # use deque for efficient inserting/appending
    from collections import deque
    # Get the full set of values
    derivable = set(vres.keys())  # This also serves as our definition of having dependencies

    q = deque()

    while len(q) < len(derivable):  # every derivable needs to be in q
        for v in derivable:  # Scan through all derivables
            if v not in q:
                q.append(v)
                vindex = q.index(v)  # we might need to know this position
            if v in vres:
                vdepends = vres[v]
                if not isinstance(vdepends, list):
                    vdepends = [vdepends] # in case it's just a single dependency given as a str
                vdtest = [vd for vd in vdepends if vd in vres]  # list of dependencies with dependencies
                if not vdtest:
                    continue # v in q and no dependencies have dependencies
                else:
                    vdd = [vd for vd in enumerate(vdtest) if vd in q] # true if already in q
                    if any(vdd):
                        ndx = vdd.index(True)  # index of first True value
                        q.insert(vindex, vdd[ndx]) # put ahead of v in q
                        break  # leave the for-loop, to re-start the scan
            else:
                print(f"ERROR: encountered {v} which is not listed as a derived quantity")
    return list(q)  # convert deque to list

And that could be used right before calling derive_variables:

if vars_to_derive:
    var_order = get_derive_order(vars_to_derive)  # already a dict as per my other suggestion
    vars_to_derive_ordered = {k: vars_to_derive[k] for k in var_order}
    self.derive_variables(vars_to_derive=vars_to_derive_ordered, ts_dir=ts_dir[case_idx])

brianpm commented 8 months ago

@justin-richling -- Sorry for all this code, I think I got carried away! If you and/or @nusbaume would prefer to move ahead with your PR, and then I can put my suggested changes in separately, that would be fine with me. Let me know.

justin-richling commented 7 months ago

@nusbaume I believe I've addressed your and @brianpm's suggestions/changes. Let me know if anything else needs attention, thanks!

NCAR / ADF

Aerosol zonal plots #291

Files changed: