Open susburrows opened 9 years ago
@painter1 commented: "the current code uses area-weighted averaging for computing diagnostic plots. The only unweighted average is the mean you see printed at the top of the plot."
We believe that for the vertical zonal mean plots, the plotted values are also always being calculated using area-weighted averaging. If that is correct, this will need to be updated also. The plot calculations should also use mass-weighted averaging in some cases (the same cases where it is used in calculating the mean and RMSE).
Following are some excerpts from an email discussion where we identified a strategy for determining which fields should be area-weighted vs. mass-weighted.
Here¹s my advice about how to identify the appropriate weighting to use for different variables in the uvcmetrics code, which I have also just discussed with Phil:
The default should still be area-weighting, but mass-weighting should be used if the variable¹s units match one of the above list. There might also be others that I am not thinking of. Would it be possible to write a small script that categorizes the variables that we are currently using in any of the sets/collections according to these rules and writes out the variable name, its units, and whether it was assigned to be area-weighted or mass-weighted? Then we could check whether the categorization appears to make sense, or if there are any cases we have overlooked.
Could you let us know whether the above is doable in the near future (or if you have any questions of comments on this)? Getting the numbers on the plots correct is a very high priority for us.
Below are the special cases that are currently coded in reconcile_units(). One class of cases deals with percents/fractions, unit less variables, etc.. In general I believe these will all be area-weighted (unless they are actually [kg]/[kg] or something like that!). The conversion of “kg/m2/s” to “mm/day” relates to precipitation and is not a true unit conversion. Precipitation will also be area-weighted. Another set of cases deals with different spellings of units (e.g. mb vs. mbar); the cases listed here will also be area-weighted. So, I am not seeing any "special cases” (that udunits can’t understand) that will need to be mass-weighted.
I’d suggest we just try using udunits to check for equivalence with one of the categories of units I originally listed and have the subroutine print out some information about what it does to a log file so we can determine whether it is working…
if mv1.units=='kg/m2/s' and mv2.units=='mm/day':
preferred_units="mm/day"
mv1.units="mm/s" # if 1 kg = 10^6 mm^3 as for water
if mv1.units=='mb':
mv1.units = 'mbar' # udunits uses mb for something else
if mv2.units=='mb':
mv2.units = 'mbar' # udunits uses mb for something else
if mv1.units=='mb/day':
mv1.units = 'mbar/day' # udunits uses mb for something else
if mv2.units=='mb/day':
mv2.units = 'mbar/day' # udunits uses mb for something else
if mv1.units == '(0 - 1)': # as in ERAI obs
mv1.units = '1'
if mv1.units == '(0-1)': # as in ERAI obs
mv1.units = '1'
if mv2.units == '(0 - 1)': # as in ERAI obs
mv2.units = '1'
if mv2.units == '(0-1)': # as in ERAI obs
mv2.units = '1'
if mv1.units == 'fraction':
mv1.units = '1'
if mv2.units == 'fraction':
mv2.units = '1'
if mv1.units == 'mixed': # could mean anything...
mv1.units = '1' #... maybe this will work
if mv2.units == 'mixed': # could mean anything...
mv2.units = '1' #... maybe this will work
if mv1.units == 'unitless': # could mean anything...
mv1.units = '1' #... maybe this will work
if mv2.units == 'unitless': # could mean anything...
mv2.units = '1' #... maybe this will work
if mv1.units == 'W/m~S~2~N~' or mv1.units == 'W/m~S~2~N':
mv1.units = 'W/m^2'
if mv2.units == 'W/m~S~2~N~' or mv2.units == 'W/m~S~2~N':
mv2.units = 'W/m^2'
if mv1.units =='fraction' and (mv2.units == 'percent' or mv2.units == '%'):
mv1 = 100*mv1
mv1.units=mv2.units
return mv1, mv2
if mv2.units =='fraction' and (mv1.units == 'percent' or mv1.units == '%'):
mv2 = 100*mv2
mv2.units=mv1.units
return mv1, mv2
@painter1 commented: "As of the upcoming release 2.2 of UV-CDAT, the mean will default to an area-weighted average. Mass-weighted averages for some quantities are still waiting for me."
Susannah, could you comment on whether your units-based scheme for identifying what variables need mass weighting, was conceived in terms of atmosphere only? Or do you think it is equally applicable to other realms?
Jeff, I was only thinking about the atmosphere. You should probably get feedback from the other groups on what their conventions for averaging are. My guess is that this is likely to be much less of an issue for the other groups, since water, land, and ice are essentially incompressible compared to air, although in the ocean I think there might be similar issues related to the vertical coordinate (e.g., averaging across isolevels that are different from the model levels might be an issue in creating zonal means with a vertical axis).
Phil R. commented: "Hi All, please note that Charlie Zender is suggesting (at https://acme-climate.atlassian.net/wiki/display/ATM/Regridding+-+v0.3+AMIP+runs?focusedCommentId=25231699#comment-25231699) that our scripts use an "area" variable for all area weighting, rather than deriving the area from "lat_bnds", "lon_bnds". Please be sure to comment on this if you have an opinion"
sorry, pressed the "close issue" button by accident.
Adding to the comment from Phil R. I assume that an Area variable will almost certainly be needed when the diagnostics move to non-lat-lon grids, so it will need to be implented sooner of later.
Hi Jeff,
Sorry, I’ve been on travel for two weeks and am having a really hard time keeping up with email. I’ll follow back when I am in the office next week.
-- Susannah
From: Jeffrey Painter notifications@github.com<mailto:notifications@github.com> Reply-To: UV-CDAT/uvcmetrics reply@reply.github.com<mailto:reply@reply.github.com> Date: Wednesday, July 1, 2015 at 11:58 PM To: UV-CDAT/uvcmetrics uvcmetrics@noreply.github.com<mailto:uvcmetrics@noreply.github.com> Cc: Susannah Burrows susannah.burrows@pnnl.gov<mailto:susannah.burrows@pnnl.gov> Subject: Re: [uvcmetrics] Showstopper: Using correct (area-weighted vs. mass-weighted) averaging weights in calculating means and RMSEs (#64)
— Reply to this email directly or view it on GitHubhttps://github.com/UV-CDAT/uvcmetrics/issues/64#issuecomment-117845655.
Hi Jeff,
Sorry it has taken me a while to respond to this, I have had a somewhat hectic week. The air mass density has to be calculated from other fields in the model output. Here’s the method we would prefer be used for that:
First, calculate the pressure at the model level interfaces: pressure_int = hyai_P0 + hybi_PS
Then calculate the pressure difference between the interfaces: assume I,J,K are lon, lat, level indices (K ordered from top(1) to bottom(number of layers+1)
dp(I,J,K) = difference between pressure interfaces = pint(I,J,K+1) = pint(I,J,K) # [Pa] = [kg m-1 s-2]
Finally, calculate the air mass (column density) in each layer: rhodz(I,J,K) = dp(I,J,K)/gravity # air mass column density in layer K —> [kg / m2]
rhodz is the quantity that should be used for weighting. It is 3-dimensional, and this weighting method only makes sense for 3D variables.
Regarding the area variable, I am pretty sure that UV-CDAT (when using the cdat averager) automatically calculates this from the model grid, right? And I thought that this is now the default behavior, so we now only need to implement different behavior for the mass-weighted cases? Or are there some nuances that I am missing?
—Susannah
Susannah Burrows susannah.burrows@pnnl.gov +1 (509) 372-6183
Pacific Northwest National Laboratory Atmospheric Sciences and Global Change Division
From: Jeffrey Painter notifications@github.com<mailto:notifications@github.com> Reply-To: UV-CDAT/uvcmetrics reply@reply.github.com<mailto:reply@reply.github.com> Date: Wednesday, July 1, 2015 at 3:58 PM To: UV-CDAT/uvcmetrics uvcmetrics@noreply.github.com<mailto:uvcmetrics@noreply.github.com> Cc: Susannah Burrows susannah.burrows@pnnl.gov<mailto:susannah.burrows@pnnl.gov> Subject: Re: [uvcmetrics] Showstopper: Using correct (area-weighted vs. mass-weighted) averaging weights in calculating means and RMSEs (#64)
— Reply to this email directly or view it on GitHubhttps://github.com/UV-CDAT/uvcmetrics/issues/64#issuecomment-117845655.
Just to be very careful, I will note there is a tiny error in one of the equations Susannah sent on. The equals sign before the last “pint” string should be a minus sign, e.g. Variable dp is the difference between pint at interface K+1 and interface K.
Phil
Phil Rasch Pacific Northwest National Laboratory 902 Battelle Boulevard P. O. Box 999, MSIN K9-34 Richland, WA, 99352 Phone (509)372-4464, Fax (509)372-6153
From: Susannah Burrows Susannah.Burrows@pnnl.gov<mailto:Susannah.Burrows@pnnl.gov> Date: Friday, July 17, 2015 at 4:08 PM To: UV-CDAT/uvcmetrics reply@reply.github.com<mailto:reply@reply.github.com>, UV-CDAT/uvcmetrics uvcmetrics@noreply.github.com<mailto:uvcmetrics@noreply.github.com> Cc: "Rasch, Philip J" philip.rasch@pnnl.gov<mailto:philip.rasch@pnnl.gov> Subject: Re: [uvcmetrics] Showstopper: Using correct (area-weighted vs. mass-weighted) averaging weights in calculating means and RMSEs (#64)
Hi Jeff,
Sorry it has taken me a while to respond to this, I have had a somewhat hectic week. The air mass density has to be calculated from other fields in the model output. Here’s the method we would prefer be used for that:
First, calculate the pressure at the model level interfaces: pressure_int = hyai_P0 + hybi_PS
Then calculate the pressure difference between the interfaces: assume I,J,K are lon, lat, level indices (K ordered from top(1) to bottom(number of layers+1)
dp(I,J,K) = difference between pressure interfaces = pint(I,J,K+1) = pint(I,J,K) # [Pa] = [kg m-1 s-2]
Finally, calculate the air mass (column density) in each layer: rhodz(I,J,K) = dp(I,J,K)/gravity # air mass column density in layer K —> [kg / m2]
rhodz is the quantity that should be used for weighting. It is 3-dimensional, and this weighting method only makes sense for 3D variables.
Regarding the area variable, I am pretty sure that UV-CDAT (when using the cdat averager) automatically calculates this from the model grid, right? And I thought that this is now the default behavior, so we now only need to implement different behavior for the mass-weighted cases? Or are there some nuances that I am missing?
—Susannah
Susannah Burrows susannah.burrows@pnnl.govmailto:susannah.burrows@pnnl.gov +1 (509) 372-6183
Pacific Northwest National Laboratory Atmospheric Sciences and Global Change Division
From: Jeffrey Painter notifications@github.com<mailto:notifications@github.com> Reply-To: UV-CDAT/uvcmetrics reply@reply.github.com<mailto:reply@reply.github.com> Date: Wednesday, July 1, 2015 at 3:58 PM To: UV-CDAT/uvcmetrics uvcmetrics@noreply.github.com<mailto:uvcmetrics@noreply.github.com> Cc: Susannah Burrows susannah.burrows@pnnl.gov<mailto:susannah.burrows@pnnl.gov> Subject: Re: [uvcmetrics] Showstopper: Using correct (area-weighted vs. mass-weighted) averaging weights in calculating means and RMSEs (#64)
— Reply to this email directly or view it on GitHubhttps://github.com/UV-CDAT/uvcmetrics/issues/64#issuecomment-117845655.
Susannah, Your rhodz works well for weighting a variable which lives in the layers, i.e. between the levels where pressure levels are defined. Logically that's a good place for a variable like temperature which belongs to matter. What if the variable lives at the levels? I see that in obs data, for temperature, e.g. I can see arbitrarily assigning mass to the nearest level, or extrapolating to get dp/dz; but neither is clearly good.
Hi Jeff, I think it should work for model data, but I take your point that it might not be generalizable to obs data. Another complication is that obs data might also have different kinds of vertical levels, e.g., pressure or altitude levels instead of hybrid levels, or hybrid levels that are defined according to a different scheme.
Do you have a good example of obs data where a variable of interest is defined at layers? Also, what is the code doing currently to deal with different types of obs data vertical axes? It must be grabbing appropriate vertical axis information from the netCDF files somehow in order to be able to produce plots with a vertical dimension?
I think most of the obs datasets that have a vertical dimension will be reanalysis datasets (i.e. from models), so they should be relatively well-behaved. And there is probably only a handful of different potential "special" cases (e.g., ERA-I, NCEP, ...). Have you looked to see how many different data sources we need to deal with?
Of course different data sources have different kinds of level axes. Part of what the diagnostics package is about, is converting axes, changing units, regridding, etc., so that different data sets may be compared. To plot a comparison of CAM model output with obs data, we convert hybrid levels to pressure levels, interpolate levels to the obs grid (which is normally coarser), and plot with pressure as the vertical dimension.
I often test with NCEP obs, so the case I first noticed was 3-D temperature in NCEP obs data: that is defined at the levels, which are pressures.
Thanks! Let me think about this and get back to you.
Hi Jeff, I've looked at more of the obs that have a vertical dimension, and it seems that most if not all of them have been interpolated to pressure levels. (Hopefully they are really all on pressure levels, then there are fewer cases to deal with..)
After talking this over with Phil, our recommendation for the time being is to simply assume that the interfaces for the pressure levels are located halfway between each pair of levels (at the average pressure of the two levels), and then apply the same algorithm as before. There are some possible issues with this approach, but it has the benefit of simplicity and consistency, and we are not sure we can identify a "best" approach for all cases without knowing more about the individual datasets, e.g., what algorithm was used to interpolate each of them to the pressure levels.
The edge cases (top and bottom interfaces) need special treatment. We suggest:
So that's our recommendation, but if you think there is a better approach let us know and we can discuss.
On a related topic, thinking about this got me wondering how you are doing the vertical interpolation of CAM model to the pressure levels of the observations: is it linear, or log-linear? Ideally we would interpolate the model in the same way that the observations were interpolated, but I assume this is probably not known most of the time.
That's almost the simplest possibility, and may well be good enough for the horizontal averages this is used for. I'll try it.
Thanks Jeff.
This interpolation/extrapolation runs and looks reasonable, but I plan to look more closely. I don't know how to get the model top from ACME data, but the bottom is usually available as PS.
The answer to your last question, Susannah, is log-linear. Here's the relevant line, from vertical.py (in the diagnostics; the interpolation function is in cdutil.vertical.py): newT = cdutil.vertical.logLinearInterpolation( T, levels_orig, level_src )
Thanks!
Hi @painter1 , the model top has to be calculated using the vertical grid formula, which is easy but inconvenient.
What is the "vertical grid formula"?
I think Philip means this equation: pressure_int = hyai * P0 + hybi * PS
Yes, that is the one (thanks @susburrows ). At the top of the model, it is often the case that either hyai or hybi is zero, which can simplify the process, but in general one needs to calculate the whole thing. Note that there are ways to do this using NCO that are pretty easy, and of course it can be done in CDAT.
Now I'm wondering whether the specified "model top" is really the right thing to use. I misled everyone, I think, when writing "ACME data"; but maybe not because "model top" isn't my term. The issue here is obs files, and the hybrid levels come from the model data. Is it really ok to use the model top to find the right mass for averaging the obs data?
Ok, sorry about the confusion. This is a good question. I think that it doesn't make sense to compare the model and obs above the model top, so that should be the maximum upper limit for the top level (lowest pressure) to be compared. But if the obs data's vertical grid ends below the model data's vertical grid, then we need a different rule. I think in this case we should just use the rule "extrapolate the layer thickness from the thickness of the layer below" . Not sure if I am expressing myself clearly, does that make sense?
OK, here's my interpretation. Recall that we are averaging a quantity, say T, defined at obs levels. To get mass for horizontal averages, we want to bracket each obs level with a pair of new levels. And the issue is how to choose the new level above the top obs level (i.e., lower pressure than the top obs level). There are three cases:
And now I have started to think about implementation, and it's a bit messy. In order to get this into February's UV-CDAT release 2.4.1, I may have to skip this; we'll see.
I merged my massweighting branch into the devel branch, and installed it on Rhea, in uvcdat/latest_Mesa. This installation has a few last-minute bug fixes not actually in the repository yet. @susburrows , I would like to know how it works for you.
I've made some changes to the function that determines whether a variable is to be mass- or area- weighted. The attached files list each variable from a sample ACME output file and whether it is mass- or area- weighted. Most of the discussion on this subject is on a Confluence page.
Hi Jeff, thanks for this! I've been operating at less than normal efficiency / capacity this week due to illness. I may not be able to try out your new version until next week, but I look forward to trying it out.
I checked out the files you posted with the weighting choices and they look good to me! Thanks! I'll try to test out the code later today.
Just adding a link to the Confluence page with the rest of the discussion https://acme-climate.atlassian.net/wiki/pages/viewpage.action?pageId=35586630
Susannah, any luck with finding time to try the new code? We have a code freeze scheduled for tomorrow, for final testing prior to the 2.4.1 release next week.
Hi Jeff, Sorry, tuning activities have been keeping me very busy. I may be able to try it out before the end of the week, but I know this won’t help you with your deadline. It is still on my list though.
From: Jeffrey Painter notifications@github.com<mailto:notifications@github.com> Reply-To: UV-CDAT/uvcmetrics reply@reply.github.com<mailto:reply@reply.github.com> Date: Wednesday, February 17, 2016 at 9:47 AM To: UV-CDAT/uvcmetrics uvcmetrics@noreply.github.com<mailto:uvcmetrics@noreply.github.com> Cc: Burrows Susannah susannah.burrows@pnnl.gov<mailto:susannah.burrows@pnnl.gov> Subject: Re: [uvcmetrics] Showstopper: Using correct (area-weighted vs. mass-weighted) averaging weights in calculating means and RMSEs (#64)
Susannah, any luck with finding time to try the new code? We have a code freeze scheduled for tomorrow, for final testing prior to the 2.4.1 release next week.
— Reply to this email directly or view it on GitHubhttps://github.com/UV-CDAT/uvcmetrics/issues/64#issuecomment-185322587.
RMSE is just being added today. It is not mass-weighted. I'll create another issue for that, and close this issue when the 2.4.1 release is officially out.
Correct weightings should be used in calculating averages, sums, RMSE, etc. For instance, mass mixing ratios should be weighted by air density before averaging operations. There should be a mechanism to distinguish intensive variables (such as temperature and mass mixing ratios), extensive variables (such as air mass), and 2D fields with area averaging (such as cloud fraction), so that the appropriate calculations can be applied in each case and we know which algorithm was used for any particular field. This is an issue in all remapping and averaging operations in the climatology script and the diagnostic plotting scripts.