Question: should an aerosol process validate its input?

jeff-cohere commented 3 years ago

Almost all parameterizations have a "region of validity" inside which they return trustworthy results. For example, MAM4's nucleation process is valid at temperatures between 230.15 K and 305.15 K, relative humidities between 0.01% and 99%, and H2SO4 "number concentrations" (I say "number densities") between 10^4 and 10^11 per cc.
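As a sketch of how a region of validity might be represented, the C++ fragment below encodes the nucleation bounds quoted above as closed intervals. The names `Interval`, `NucleationValidity`, and `contains` are hypothetical and not part of MAM4 or haero. Relative humidity is assumed to be stored as a fraction, so 0.01% becomes 1e-4 and 99% becomes 0.99.

```cpp
// Hypothetical representation of a process's region of validity.
// Names are illustrative only; they are not MAM4 or haero APIs.

// A closed interval [lo, hi].
struct Interval {
  double lo, hi;
  // Written as lo <= x && x <= hi so that NaN is never "contained".
  bool contains(double x) const { return lo <= x && x <= hi; }
};

// Region of validity quoted for MAM4's nucleation process.
struct NucleationValidity {
  Interval temperature{230.15, 305.15};        // [K]
  Interval rel_humidity{1e-4, 0.99};           // fraction (0.01% to 99%)
  Interval h2so4_number_density{1e4, 1e11};    // [#/cm^3]

  // True only if every input lies inside its interval.
  bool contains(double T, double RH, double c_h2so4) const {
    return temperature.contains(T) && rel_humidity.contains(RH) &&
           h2so4_number_density.contains(c_h2so4);
  }
};
```

For example, `NucleationValidity{}.contains(280.0, 0.5, 1e6)` is true, while a temperature of 220 K or an H2SO4 number density of 1e3 per cc falls outside the region.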

What happens when a process is given input whose parameters fall outside of this region of validity? There are two philosophical approaches for handling this situation.

1. The input is projected to the nearest valid values within the region of validity. This is the method used by MAM4, and it explains many of the myriad MINs and MAXs we see everywhere.
2. The input is checked before the aerosol process runs, and an exception (or error condition) is raised if any value falls outside the region of validity.
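For a single scalar input with valid range [lo, hi], the two approaches can be contrasted in a few lines. This is a minimal sketch: `project_to_range` and `require_in_range` are hypothetical helper names, and `std::out_of_range` is just one possible choice of exception.

```cpp
#include <algorithm>
#include <sstream>
#include <stdexcept>
#include <string>

// Method 1: project the input onto [lo, hi] (MAM4-style MIN/MAX).
double project_to_range(double x, double lo, double hi) {
  return std::min(std::max(x, lo), hi);
}

// Method 2: reject the input, naming the offending variable and its bounds.
double require_in_range(const std::string& name, double x, double lo,
                        double hi) {
  if (!(lo <= x && x <= hi)) {  // negated form also rejects NaN
    std::ostringstream msg;
    msg << name << " = " << x << " is outside [" << lo << ", " << hi << "]";
    throw std::out_of_range(msg.str());
  }
  return x;
}
```

With this min/max ordering, a NaN input passes through method 1 unchanged, whereas method 2 rejects it. That is one concrete way silent projection can hide a pathological value.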

Personally, I'm a much bigger fan of method 2 than method 1. Method 2 is more in line with modern software practices: it makes it easier to trace pathological values to their source instead of blindly continuing with dubious alterations, especially when altering the input affects mass and/or energy balance. In this philosophy, it's the host model's job to make sure the input is valid and/or to "apply fixes" if it's not.
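To make the mass-balance concern concrete: if an H2SO4 number density of 5×10^3 per cc is projected up to the lower bound of 10^4 per cc, the process sees an extra 5×10^3 per cc that no other part of the model accounts for. The sketch below shows one way a host model could "apply fixes" while keeping a record of them, so the change can be traced instead of hidden inside the process. `Correction` and `fix_and_log` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One host-side correction of an out-of-range input.
struct Correction {
  std::string variable;
  double original;
  double corrected;
  // Amount added (> 0) or removed (< 0) by the correction. For a conserved
  // quantity, this is an error in the mass or energy budget.
  double delta() const { return corrected - original; }
};

// Projects x onto [lo, hi], like a MIN/MAX inside a process would, but
// appends an entry to `log` whenever the value actually changes.
double fix_and_log(const std::string& name, double x, double lo, double hi,
                   std::vector<Correction>& log) {
  const double fixed = std::min(std::max(x, lo), hi);
  if (fixed != x) log.push_back({name, x, fixed});
  return fixed;
}
```

After a step, the host model can sum the `delta()` values per variable to see how much material its own fixes created or destroyed.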

Do others have thoughts on this? Adopting method 2 would take us away from a strict port of MAM4, but it would help us understand what's actually going on instead of treating the code as a "black box". I am working on a feature branch containing code that illustrates a possible implementation of method 2, in case we're interested in going for it.

@huiwanpnnl @singhbalwinder @overfelt @pbosler

pbosler commented 3 years ago

I also strongly prefer option 2. Perhaps there's a compromise, though (at the cost of some CMake bloat). Water uptake, for example, has the same issue: we could define a CMake variable (default: OFF) that allows users to adopt option 1 for that specific error.

jeff-cohere commented 3 years ago

Perhaps so. On the other hand, we can allow each aerosol process to specify its own region of validity. So option 2's mechanism actually includes supporting option 1 selectively if we want to do that.
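One way to read this: if each process declares its own region of validity, it can also declare, variable by variable, whether out-of-range values are rejected (option 2) or projected (option 1). The C++ sketch below uses hypothetical names (`Policy`, `Bound`, `ValidityRegion`) that are not part of haero. It makes rejection the default, so projection happens only where it has been explicitly opted into, and it treats NaN as invalid under either policy.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>

// Require = option 2 (reject), Project = option 1 (MIN/MAX projection).
enum class Policy { Require, Project };

struct Bound {
  double lo, hi;
  Policy policy;
};

// A per-process region of validity with a policy for each variable.
class ValidityRegion {
 public:
  // Registers [lo, hi] for `var`; the policy defaults to Require.
  void set(const std::string& var, double lo, double hi,
           Policy policy = Policy::Require) {
    bounds_[var] = Bound{lo, hi, policy};
  }

  // Returns the value the process should use, or throws if it is invalid.
  double apply(const std::string& var, double x) const {
    const auto it = bounds_.find(var);
    if (it == bounds_.end()) return x;  // no constraint registered
    const Bound& b = it->second;
    if (std::isnan(x))  // NaN is never valid, whatever the policy
      throw std::out_of_range(var + " is NaN");
    if (b.lo <= x && x <= b.hi) return x;
    if (b.policy == Policy::Project) return std::min(std::max(x, b.lo), b.hi);
    throw std::out_of_range(var + " is outside the region of validity");
  }

 private:
  std::map<std::string, Bound> bounds_;
};
```

A nucleation process might then register `set("temperature", 230.15, 305.15)` (strict) and `set("relative_humidity", 1e-4, 0.99, Policy::Project)` (loosened on purpose), making any departure from option 2 visible in the code.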

pbosler commented 3 years ago

In the .in config file: #cmakedefine WATER_UPTAKE_INPUT_ERROR_USE_DRY_RADIUS

In the water uptake diagnostic, currently:

constructor {
  EKAT_ASSERT(valid_input());  // checked only in debug builds
}

Possible upgrade:

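constructor {
#ifndef WATER_UPTAKE_INPUT_ERROR_USE_DRY_RADIUS
  // Default (option 2): invalid input is an error in every build type.
  EKAT_REQUIRE(valid_input());
#else
  // Opt-in (option 1): fall back to the dry radius on invalid input.
  if (!valid_input()) {
    result = dry_radius;
  }
#endif
}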

jeff-cohere commented 3 years ago

Since we're supporting settable parameters nowadays, we could also make this a run-time-configurable option for processes and diagnostic functions.
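A run-time counterpart of the `WATER_UPTAKE_INPUT_ERROR_USE_DRY_RADIUS` switch might look like the sketch below. The struct `WaterUptakeOptions`, the flag `use_dry_radius_on_invalid_input`, and the function `reported_radius` are all hypothetical; how haero's settable parameters would actually populate such a flag is left open. The `input_is_valid` argument stands in for the diagnostic's own `valid_input()` check.

```cpp
#include <stdexcept>

// Run-time options for the water uptake diagnostic. The flag defaults to
// false, so invalid input is an error unless a user explicitly enables
// the dry-radius fallback.
struct WaterUptakeOptions {
  bool use_dry_radius_on_invalid_input = false;
};

// Chooses the radius the diagnostic reports.
double reported_radius(const WaterUptakeOptions& opts, bool input_is_valid,
                       double wet_radius, double dry_radius) {
  if (input_is_valid) return wet_radius;
  if (opts.use_dry_radius_on_invalid_input) return dry_radius;  // option 1
  // Option 2: refuse to produce a result from invalid input.
  throw std::invalid_argument(
      "water uptake: input outside region of validity");
}
```

Compared with the `#ifdef` version, a single build can then run in either mode, and the choice can be recorded alongside the other settable parameters of a run.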

pbosler commented 3 years ago

> option 2's mechanism actually includes supporting option 1 selectively if we want to do that.

I'm in favor of this idea. Our default behavior should require valid input. Then we can selectively allow looser conditions but only if the user explicitly turns them on.

jeff-cohere commented 3 years ago

Also, for those concerned that we're drifting further from MAM4: it seems to me that what MAM4 does now is a mixture of aerosol processes and host model logic. Specifically:

- processes are assumed to be ordered in a specific way, which violates our assumption of process independence
- inputs are altered in a crude and simplistic fashion (method 1 above) whose "validity" can only be examined for a specific set of processes under specific conditions

In the past, @huiwanpnnl has raised the possibility that we develop a "high-level" set of "clear-sky", "cloudy", and "radiative" aerosol "super-processes" that package our lower-level aerosol processes into pre-validated bundles. I think these bundles are ultimately what we should compare to the existing MAM4 model. This allows us to be more stringent about how we implement aerosol processes at the low level. Then the higher-level interfaces can implement all of MAM4's assumptions (process ordering, input alterations), just like a host model would if it used the low-level interface.
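The layering might be sketched as follows. Every name here (`State`, `Process`, `SuperProcess`, `example_strict_process`) is hypothetical, and the humidity bounds are borrowed from the nucleation example purely for illustration. The point is structural: the low-level process rejects out-of-range input, while the bundle owns both the input alteration and the process ordering, as a host model would.

```cpp
#include <algorithm>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

struct State {
  double temperature;        // [K]
  double relative_humidity;  // fraction
};

// A low-level aerosol process operating on the state.
using Process = std::function<void(State&)>;

// Example low-level process: rejects out-of-range humidity (method 2)
// instead of fixing it.
void example_strict_process(State& s) {
  if (!(s.relative_humidity >= 1e-4 && s.relative_humidity <= 0.99))
    throw std::out_of_range("relative humidity outside region of validity");
  // ... process physics would go here ...
}

// A high-level "super-process" that reproduces host-model assumptions.
class SuperProcess {
 public:
  explicit SuperProcess(std::vector<Process> ordered)
      : procs_(std::move(ordered)) {}

  void run(State& s) const {
    // MAM4-style input alteration (method 1) lives here, not in the
    // low-level processes.
    s.relative_humidity = std::min(std::max(s.relative_humidity, 1e-4), 0.99);
    for (const auto& p : procs_) p(s);  // processes run in the given order
  }

 private:
  std::vector<Process> procs_;
};
```

Comparing a bundle like this against MAM4 tests the assumptions in one place, while each low-level process remains strict and can be validated on its own.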

singhbalwinder commented 3 years ago

I also like this idea. Generally, it is left up to the parameterization developer to decide whether or not to enforce a "validity range". It would be nice to have infrastructure in place that parameterization developers can use uniformly. Option 2 gives enough flexibility and can be a great way to debug issues.

I know I am stating the obvious, but even if we enforce a range of validity, some combinations of valid inputs may be unphysical. If those cases are known, they can be treated within the parameterizations. Some MAM processes have internal checks to ensure that various quantities are in range (otherwise the run quits), but these checks are generally on process-specific variables, which may or may not include raw inputs.

In the current E3SM driver, there is a routine that checks for valid inputs after some processes, so it would be nice if the host model (SCREAM) also had these sorts of checks. Basically, more checks are always good.
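A host-side check of this kind could be as simple as the sketch below, run after each process. It is not the E3SM routine mentioned above: `Violation` and `check_mixing_ratios` are hypothetical names, and "finite and non-negative" is only an assumed example of what such a check might test. Recording the name of the process that just ran is what makes a bad value traceable to its source.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// One suspicious value found after a process ran.
struct Violation {
  std::string process;  // process that ran just before the check
  std::size_t index;    // position of the value in the checked array
  double value;
};

// Reports every mixing ratio that is not finite or is negative.
std::vector<Violation> check_mixing_ratios(const std::string& process,
                                           const std::vector<double>& q) {
  std::vector<Violation> bad;
  for (std::size_t i = 0; i < q.size(); ++i) {
    if (!std::isfinite(q[i]) || q[i] < 0.0) bad.push_back({process, i, q[i]});
  }
  return bad;
}
```

The host model can decide what to do with a non-empty result (abort, warn, or apply a logged fix), keeping that policy outside the aerosol processes themselves.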

jeff-cohere commented 3 years ago

Thanks, Balwinder. I agree completely that parameterizations should handle the trickier cases that arise from input within the region of validity, and also that more checks are better!

eagles-project / haero

Question: should an aerosol process validate its input? #230