Not having the boolean (`True`/`False`) arguments you have above has been an intentional design decision. Boolean arguments are often considered an anti-pattern or code smell. The short of it is that functions should really only do one thing, which keeps them easy to reason about, easy to test, and hard to use incorrectly. Instead of having one massive function that includes every possible behavior the user might want, we make small tools that are easier to compose together, or appropriately named functions that build on each other. Every branch within a function makes it more complex and harder to reason about. Having functions change return type based on a boolean is especially frowned upon.
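To illustrate the concern with a toy (non-MetPy) example: a flag that changes the return type forces every caller to know which shape comes back, whereas two named functions keep each signature predictable.

```python
import numpy as np

# Anti-pattern: the return type depends on a boolean flag.
def profile_flag(pressure, with_lcl=False):
    prof = np.linspace(300.0, 220.0, pressure.size)
    if with_lcl:
        return 850.0, prof  # tuple: (LCL pressure, profile)
    return prof             # bare array

# Composed alternative: two small functions, each with one fixed return type.
def profile(pressure):
    return np.linspace(300.0, 220.0, pressure.size)

def profile_with_lcl(pressure):
    return 850.0, profile(pressure)

p = np.linspace(1000.0, 100.0, 10)
assert isinstance(profile_flag(p), np.ndarray)
assert isinstance(profile_flag(p, with_lcl=True), tuple)  # type changed!
```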
As an example: your last example isn't needed because you don't need a dataset when you're not interpolating to the LCL level. The user already has the coordinate values at all the points and is only getting an array of temperatures representing the profile.
What is the use case for a user passing in pressures they don't want a parcel profile for? We don't calculate the parcel profile "down" from the starting point. Is passing in `start_index` really that much better than `parcel_profile(pressure[3:], T[3], Td[3])`?
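For concreteness, the slicing version runs against the current API as-is (the sounding values below are made up):

```python
import numpy as np
import metpy.calc as mpcalc
from metpy.units import units

# Made-up sounding, ordered by decreasing pressure
pressure = np.array([1000., 975., 950., 925., 900., 850.]) * units.hPa
T = np.array([25., 23., 21., 20., 18., 15.]) * units.degC
Td = np.array([20., 19., 18., 17., 15., 12.]) * units.degC

# Lift the parcel from the fourth level by slicing; no start_index needed
profile = mpcalc.parcel_profile(pressure[3:], T[3], Td[3])
```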
`start_pressure` would mean the starting value is outside the values in `pressure`, which means that the output from `parcel_profile` no longer represents a mathematical function mapping `pressure` to temperature, i.e. `T = f(pressure)`. You can no longer plot the profile with `plot(pressure, profile_temperature)`.
That would be nice to have. We've resisted because it's not trivial to do in all cases unless you sort, at which point all users have to pay the penalty for that on every call. In the vein of composable functions, I'm wondering if it'd be better to make an `ensure_ordered_profile()` tool that users could use if they're not sure, while power users can avoid the speed penalty.
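A rough sketch of what such a tool might look like (the name follows the suggestion above; nothing like it exists in MetPy yet):

```python
import numpy as np

def ensure_ordered_profile(pressure, *profiles):
    """Return pressure (and any companion profile arrays) sorted
    together into decreasing-pressure order."""
    inds = np.argsort(pressure)[::-1]
    return (pressure[inds], *(prof[inds] for prof in profiles))
```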
The vibe I'm getting here is that you wish `parcel_profile_with_lcl` were called `parcel_profile`, and that we didn't have anything that had the current `parcel_profile` behavior. 😉
> Not having the boolean (`True`/`False`) arguments you have above has been an intentional design decision. Boolean arguments are often considered an anti-pattern or code smell. The short of it is that functions should really only do one thing, which keeps them easy to reason about, easy to test, and hard to use incorrectly. Instead of having one massive function that includes every possible behavior the user might want, we make small tools that are easier to compose together, or appropriately named functions that build on each other. Every branch within a function makes it more complex and harder to reason about. Having functions change return type based on a boolean is especially frowned upon.
Yikes! Maybe this explains my love/hate relationship with xarray :laughing:. I do notice these considerations are consistent with strings having both `index` and `rindex` methods (and many other `...`/`r...` method pairs). On the other hand, the `sort` method of lists does have a `reverse` boolean argument, but at least the result is a list regardless.
Speaking of reversing things (and going out of order)...
> That would be nice to have. We've resisted because it's not trivial to do in all cases unless you sort, at which point all users have to pay the penalty for that on every call. In the vein of composable functions, I'm wondering if it'd be better to make an `ensure_ordered_profile()` tool that users could use if they're not sure, while power users can avoid the speed penalty.
I think that would work, or maybe another way to avoid the speed penalty would be, gasp, a boolean argument (`increasing`?) that allows the user to inform/guarantee to the various parcel functions that the data is in increasing order of pressure. This is the order in which the pressures appear when obtained from Siphon's `TDSCatalog`, and it would be nice to work directly with the output of other Unidata tools. It isn't too hard to have a line like:

```python
nam = nam.reindex(isobaric2=nam.isobaric2[::-1])
```

but sometimes it is `isobaric3` or whatever, and I'm not sure how or if this can be done with the MetPy accessors.
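One possibility with the accessors, sketched under the assumption that the dataset is CF-parseable and has a variable like `Temperature_isobaric` (the variable name is just an example):

```python
import metpy.xarray  # noqa: F401 -- registers the .metpy accessor on xarray objects

def sort_decreasing_pressure(ds, var='Temperature_isobaric'):
    """Sort a dataset along var's vertical coordinate, whatever its name."""
    ds = ds.metpy.parse_cf()
    vert = ds[var].metpy.vertical  # found via CF metadata, not by name
    return ds.sortby(vert.name, ascending=False)
```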
> What is the use case for a user passing in pressures they don't want a parcel profile for? We don't calculate the parcel profile "down" from the starting point. Is passing in `start_index` really that much better than `parcel_profile(pressure[3:], T[3], Td[3])`?
This is really just a simpler example of 4, and only avoids typing 3 (or whatever) multiple times.
> `start_pressure` would mean the starting value is outside the values in `pressure`, which means that the output from `parcel_profile` no longer represents a mathematical function mapping `pressure` to temperature, i.e. `T = f(pressure)`. You can no longer plot the profile with `plot(pressure, profile_temperature)`.
I had a bit of a eureka moment, as this comment clarified that I was thinking of the parcel profile as an object (i.e., class) in its own right; that is, the (infinite) locus of points consisting of a particular dry adiabat below the LCL and a particular moist adiabat above the LCL (and yes, of all the points in the locus, the LCL is probably the most important). But you are right that the MetPy functions are indeed functions. However, I think you can separate the process into two parts (conceptually at least): defining the parcel path, and realizing that parcel path as a discrete set of points at various pressures. In other words, the existing `parcel_profile` could be thought of / implemented as a method of a `ParcelProfile` class. If the only purpose of computing the parcel profile is to immediately compute something like CAPE rather than plot the parcel trajectory, the output of `parcel_profile` no longer needs to "conserve" the number of points in the vertical as it does now.

I'm not sure how useful implementing a `ParcelProfile` class would be; I'd have to experiment some, and I'm already shuddering at the performance (or lack thereof) such an implementation would entail. But it would be a way to decouple the definition of a parcel profile from any particular discrete realization of that profile.
I'm coming around to thinking an object-based approach may allow for handling #461 and #1735 in a different way than #2437.
My proposed interface is exemplified below in some theoretical code:
```python
import numpy as np

from metpy.profile import EnvironmentProfile, ParcelProfile  # hypothetical module
from metpy.units import units

# Obtain data (here assumed to be 1D arrays of the same size: pSound, TSound, and TdSound)
environ = EnvironmentProfile(pSound, TSound, dewpoint=TdSound)
# The 'constructor' also has rh, specific_hum, and mix_ratio arguments.
# No more than one moisture variable is supplied.
# If no moisture variable is supplied, certain EnvironmentProfile methods will fail.

# Now you can use methods like:
kindex = environ.k_index()
hwbz = environ.height_of_wet_bulb_zero()
freeze = environ.freezing_level()
environ.plot()  # Quick-and-dirty skew-T, analogous to xarray plot methods
# etc.

# Now a parcel can be created as follows:
parcel = ParcelProfile(environ, type='sb')  # type='ml' and type='mu' also available

# Various methods exist...
lclp, lclT = parcel.lcl()
lfcp, lfcT = parcel.lfc()  # By default uses virtual temperature correction
sbcape = parcel.cape()  # By default uses virtual temperature correction
p_parcel = parcel.pressure()  # Includes the LCL
p_parcel2 = parcel.pressure_no_lcl()
T_parcel = parcel.temperature()  # By default returns parcel temperature at input levels + LCL
# This version returns the parcel profile at the requested pressures
T_parcel2 = parcel.temperature(p=np.arange(100, 1051, 10) * units.hPa)
Tv_parcel = parcel.virtual_temperature()
# etc.
```
I think that approach is pretty much how SHARPpy is implemented internally? IIRC, that's had its own positives and negatives.
What I don't like about the above is: why are sounding calculations different from the entire rest of the calculation module in being an OO interface rather than the functions that power everything else?
Just spit-balling here, but what does that offer over:
```python
import metpy.calc as mpcalc

# Make sure we have an xarray Dataset that conforms to proper ordering, no duplicate levels, etc.
environ = mpcalc.preprocess_sounding(pSound, TSound, dewpoint=TdSound)

# Now you can use functions like:
kindex = mpcalc.k_index(environ.T, environ.Td)
hwbz = mpcalc.height_of_wet_bulb_zero(environ.T, environ.Td)
freeze = mpcalc.freezing_level(environ.T)

parcel_opts = mpcalc.ParcelProfileOptions(type='sb', virt_temp=True)
lclp, lclT = mpcalc.lcl(environ.T, environ.Td)
lfcp, lfcT = mpcalc.lfc(environ.T, environ.Td, parcel_opts)
sbcape = mpcalc.cape(environ.T, environ.Td, parcel_opts)
Tv_parcel = mpcalc.virtual_temperature(environ.T, environ.Td)
```
I'm not sure why having `environ` or `parcel_opts` on the left side of the dot is an improvement; maybe a bit of succinctness? The lesson I've finally learned from doing this long enough is that objects aren't somehow magically better. I'm not a huge fan of "every new thing we want means adding a new method"-type designs; I've watched this grow completely unwieldy for e.g. Matplotlib's `Axes` class.
IMO, good object-oriented design isn't that there's an object that maps to every real-world concept. It's that we create an object when we need to encapsulate access to a group of state and we provide methods that act on this batch of state (sometimes maintaining one or more invariants that need to be enforced).
I like functions because they're easier to compose. It's not clear to me how the interface above works when we want to calculate CAPE on a grid, or when we want to mix with...some other input-data type. Objects don't compose unless you create very careful interfaces.
That's not to say we can't make the sounding functions better. I'd just REALLY like to try to stick to arrays in -> arrays out, or maybe, if we're making a new API, Dataset in -> Dataset out. But I am willing to be proven wrong, though that will take concrete, well-thought-out proposals or test implementations so that we can compare/contrast.
I wouldn't get rid of the functions; I see an OO interface being a higher-level wrapper over the existing functions, just as the declarative interface is a wrapper over Matplotlib. I do like the structure you have proposed above with the `preprocess_sounding` function and `ParcelProfileOptions` object, which would go a long way I think.
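For instance, `ParcelProfileOptions` could be little more than a frozen dataclass (a sketch; neither the class nor these field names exist in MetPy today):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParcelProfileOptions:
    """Bundle of parcel settings that plain functions can accept."""
    type: str = 'sb'        # 'sb', 'ml', or 'mu'
    virt_temp: bool = True  # apply the virtual temperature correction
```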
An OO interface would still be helpful in my mind for the more complicated diagnostics, like the significant tornado parameter. That parameter is fundamentally a function of the temperature, dewpoint, and wind profiles along with the vertical coordinates, but to write it as such, I think the code would look something like:
```python
stp = mpcalc.significant_tornado(mpcalc.cape(environ.T, environ.Td, parcel_opts),
                                 mpcalc.lcl(environ.T, environ.Td),
                                 mpcalc.storm_relative_helicity(environ.z, environ.u, environ.v,
                                                                depth=1 * units.km),
                                 mpcalc.bulk_shear(environ.p,  # Why?
                                                   environ.u, environ.v, environ.z,
                                                   depth=6 * units.km))
```
Now if I cared about those intermediate results like the CAPE, I could calculate those first rather than combine everything into one giant function call, but if all I want is the STP, wouldn't it be nice to write:
```python
stp = parcel.significant_tornado()
```
and be done?
And wouldn't it be even nicer to write:
```python
stp = parcel.significant_tornado()
sbcape = parcel.cape()
```
and only have the CAPE calculated one time because that intermediate result was cached?
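A sketch of how that caching could fall out almost for free from an OO wrapper, layering `functools.cached_property` over the existing functions (the wrapper class is hypothetical; the `mpcalc` calls are real):

```python
from functools import cached_property

import metpy.calc as mpcalc


class ParcelProfile:
    """Hypothetical wrapper caching intermediate sounding results."""

    def __init__(self, pressure, temperature, dewpoint):
        self.pressure = pressure
        self.temperature = temperature
        self.dewpoint = dewpoint

    @cached_property
    def profile(self):
        # Computed on first access, then reused by every diagnostic below
        return mpcalc.parcel_profile(self.pressure, self.temperature[0],
                                     self.dewpoint[0])

    @cached_property
    def _cape_cin(self):
        return mpcalc.cape_cin(self.pressure, self.temperature,
                               self.dewpoint, self.profile)

    def cape(self):
        return self._cape_cin[0]

    def cin(self):
        return self._cape_cin[1]
```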
So yes, succinctness (and inoculation against inefficient code) would be the primary benefits.
But I agree a prototype with actual working code would be essential. It may be that the pain outweighs the gain.
I forgot to address the question of what makes sounding calculations fundamentally different from everything else. I think the answer to that is that sounding calculations are reductions. Just about every other MetPy function preserves dimensions, but for sounding diagnostics, the vertical dimension is eliminated. Why exactly that matters I can't say for sure, but it is the key difference in any case.
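The distinction in a nutshell, using existing MetPy calls with made-up values:

```python
import numpy as np
import metpy.calc as mpcalc
from metpy.units import units

p = np.linspace(1000., 100., 20) * units.hPa
T = np.linspace(25., -55., 20) * units.degC
Td = T - 10. * units.delta_degC

# Dimension-preserving: one output value per input level
theta = mpcalc.potential_temperature(p, T)   # shape (20,)

# Reduction: the vertical dimension collapses to scalars
prof = mpcalc.parcel_profile(p, T[0], Td[0])
cape, cin = mpcalc.cape_cin(p, T, Td, prof)  # two scalars
```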
What should we add?
I wasn't sure if the overloading proposed below was Pythonic, but I see xarray does it frequently (e.g. `date_range`).
Here's my specification:
- An additional optional argument `start_index` is allowed. It defaults to 0, but allows the user to lift a parcel not at the first index. ~~(It is an error to specify anything other than 0 if the input temperature and dewpoint profiles are not arrays as in the current implementation of `parcel_profile`.)~~ Not an error, but means to generate the profile with the given temperature and dewpoint, but starting at `pressure[start_index]`. The general rule is the returned array(s) contain(s) `np.nan`s at the levels below the starting point.
- An additional optional argument `start_pressure` is allowed. It defaults to `pressure[0]`, but allows the user to lift the parcel at pressure `start_pressure`. (The `showalter_index` implements a special case of this functionality.) An error is raised if `start_pressure` is not in the interval [min(pressure), max(pressure)]. It is an error to specify both `start_pressure` and `start_index`.
- (Bonus) The restriction that the array(s) are in decreasing order of pressure is relaxed to a restriction that the array(s) are monotonic with respect to pressure. If the inputs are in increasing order, the default `start_index` is -1, and the default `start_pressure` is `pressure[-1]`.
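A minimal sketch of the proposed `start_index` semantics (hypothetical; built on the existing `mpcalc.parcel_profile`, with NaN fill below the starting level):

```python
import numpy as np
import metpy.calc as mpcalc
from metpy.units import units

def parcel_profile_start(pressure, temperature, dewpoint, start_index=0):
    """Hypothetical start_index behavior: NaN below the starting level."""
    profile = mpcalc.parcel_profile(pressure[start_index:],
                                    temperature[start_index],
                                    dewpoint[start_index])
    out = np.full(pressure.size, np.nan) * profile.units
    out[start_index:] = profile
    return out

# Made-up sounding to exercise the sketch
p = np.array([1000., 950., 900., 850., 800.]) * units.hPa
T = np.array([25., 22., 20., 18., 15.]) * units.degC
Td = np.array([20., 18., 15., 14., 12.]) * units.degC
prof = parcel_profile_start(p, T, Td, start_index=2)  # NaN at the first two levels
```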