PCMDI / cmor

Climate Model Output Rewriter
BSD 3-Clause "New" or "Revised" License

River model grid area ... and optional values of cell_measures #220

Closed martinjuckes closed 6 years ago

martinjuckes commented 6 years ago

This relates to https://github.com/WCRP-CMIP/CMIP6_CVs/issues/312 and https://github.com/cmip6dr/CMIP6_DataRequest_VariableDefinitions/issues/85. People want to provide river flow information on the grid of the river model, which may be different from that of the atmosphere model. I've taken the first step towards supporting this by adding an areacellr variable to the request. It follows that there are a number of variables which may have either cell_measures: "area: areacella" or cell_measures: "area: areacellr". How should this be indicated in the data request? For example, could I put area: areacella --OR areacellr in the cell_measures entries for these variables? Would CMOR and PrePARE be able to deal with that?

taylor13 commented 6 years ago

Denis and I will discuss and get back to you. I think river flow is a bit like horizontal fluxes (or velocity), where the area of the grid cells is rather irrelevant. At the very least, I don't think areacell should be required. Will discuss with Denis.

durack1 commented 6 years ago

@martinjuckes regarding WCRP-CMIP/CMIP6_CVs#312 are there many groups that are considering the path forward that you describe above - having a unique (non-atmosphere) river grid? If yes, then we may need to reconsider the river realm question (WCRP-CMIP/CMIP6_CVs#312)

martinjuckes commented 6 years ago

@durack1 : I don't know how many groups are considering this. On Karl's point, the variables this might affect are:

The rivi, rivo variables are not entirely like fluxes .. they are requested at cell centres and have the dimensions of a flux convergence or time rate of change of water volume in the cell. Other variables are more clearly representative of the grid area.

taylor13 commented 6 years ago

thanks, Martin. cell areas will then be useful. (waterDepth should be waterDpth)

martinjuckes commented 6 years ago

OK .. it turns out that water table depth was duplicated, as wtd (LS3MIP, daily) and waterDpth (C4MIP, monthly), so I was thinking of getting rid of waterDpth in favour of wtd, but I can do it the other way around if you have a preference.

On cell areas: do you have a preference for the syntax I use to indicate that there is a choice of two different cell measures strings for these variables? (I guess this question is for Denis : @dnadeau4 )

taylor13 commented 6 years ago

I think wtd is much more consistent with other variable names defined for CMIP5/6, so I would favor it too.

will discuss second question with Denis.

taylor13 commented 6 years ago

On further reflection, I think we shouldn't define areacellr. For Antarctica and Greenland (with special grids for ice sheets that were different from the atmosphere grid), we defined new tables. I think that was a mistake even though it was one way of avoiding certain problems.

I suggest we do the following for river data: Ask groups with a river routing grid different from the grid on which they are reporting atmospheric and land data to treat the data as if it had been "regridded" from the atmospheric grid and therefore use the grid_label to distinguish it from the other atmospheric and land data. For example, if most land variables were reported on a model's native atmospheric grid, those variables would be labeled with grid_label = "gn", but the variables reported on the river routing grid could be labeled with (say) grid_label = "gr1" (i.e., regridded to a grid the group has chosen to be labeled "1", which would in fact be somewhere in the documentation identified as the "river routing native grid".) Different groups might, of course, choose a different label for their river-routing grid (e.g., "gr2" or "gr3" or "gr4").

Note that this approach would allow a group to report such things as change in river storage ("drivw") on both their atmospheric grid ("gn") and on their river routing grid (e.g., "gr2").

Note that under this scheme there would be no need for a special variable for grid cell area (i.e., we wouldn't need to define "areacellr"); it would be treated as any other variable, which could be reported on multiple grids. If it is reported on a different grid from the native atmospheric grid, its name doesn't change, but the file name changes by use of a different grid_label. So the river routing grid areas would be stored in a file under the name "areacella", but with grid_label="gr2", for example.
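A minimal sketch of how the two areacella files would stay distinct under this scheme. It assumes the CMIP6 file name pattern of underscore-separated facets ending in the grid_label (with no time range for fixed fields); the model and experiment names are hypothetical placeholders, and `cmip6_filename` is an illustrative helper, not part of CMOR.

```python
def cmip6_filename(variable_id, table_id, source_id, experiment_id,
                   member_id, grid_label, time_range=None):
    """Assemble a CMIP6-style file name; time_range is omitted for fixed fields."""
    parts = [variable_id, table_id, source_id, experiment_id, member_id, grid_label]
    if time_range:
        parts.append(time_range)
    return "_".join(parts) + ".nc"


# Hypothetical facets, for illustration only.
common = dict(table_id="fx", source_id="MODEL-X",
              experiment_id="historical", member_id="r1i1p1f1")

# Same variable name, different grid_label: one file per grid.
atmos_area = cmip6_filename("areacella", grid_label="gn", **common)
river_area = cmip6_filename("areacella", grid_label="gr2", **common)

print(atmos_area)
print(river_area)
```

The only facet that differs between the two names is the grid_label, which is the distinction the proposal relies on.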

---------- The following should perhaps be transferred to a different issue ---------

For CMIP7 we should consider defining a more complete, compound grid label. One approach would be to construct the labels consistent with a template like:

g<realm[vert.id]>--[pt] (for example ga-n)

vert. i.d. would only be included for variables reported on one or more locations in the vertical, and the "pt" suffix would only be included for synoptic data (i.e., time "point" data, as opposed to time-mean data)

realm = ["A", "O", "I", "R", "L"], where "A" is for "atmosphere", "O" for "ocean", "I" for "ice sheet", "R" for "rivers", and "L" for "land"; this indicates which modeling realm's grid was used to generate the output. [need to check there is a need for all of these; or if we need additional model grids for say "sea ice"]

vert.id = ["P", "Z", "L", "S"] with "P" for "pressure level data", "Z" for "vertical distance above a defined datum", "L" for "model level data", and "S" for the layer nearest the surface (e.g., CO2 concentration reported at the surface would be labeled with "S", but CO2 reported on one or more pressure levels would be labeled "P").

= ["m", "n", "r0", "r1", "r2", ..., "nz", "r0z", "r1z", ..., "ma", "mg", "na", "ng", "r0a", "r0g", "r1a", "r1g", ...], where "m" is for area mean over entire domain, "n" is for "native", "r" is for regridded, "z" is for "zonal mean", "a" is for a grid limited to the "Antarctica" region, and "g" is for a grid limited to the "Greenland" region.

Examples:

"gAP-n-pt": 3-d pressure-level data on native atmospheric grid sampled synoptically.

"gOS-r0": 2-d ocean grid for the surface ocean layer regridded to target grid 0.

"gR-n": 2-d data reported on the river-routing native grid.

"gA-nz": data originating on the atmospheric grid that has been zonally averaged, reported at the native latitude positions.

"gAL-m": global mean vertical profiles of atmospheric data reported on model levels.

"gI-na": 2-d data reported on an ice-sheet model's grid over Antarctica.

If we adopt this system in CMIP7, we could replace table_id in the file names and directory structure with . We could also eliminate the table_id as a search facet, since the user would have access to , , and to sub-select data of interest.
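As a rough illustration of the compound label idea, the sketch below parses labels of the shape shown in the examples ("gAP-n-pt", "gOS-r0", and so on). The regular expression is inferred from those examples only; the group names `grid` and `modifier` and the function `parse_grid_label` are hypothetical, and nothing here reflects an agreed CMIP7 specification.

```python
import re

# Pattern inferred from the examples in this comment; not an agreed standard.
LABEL = re.compile(
    r"^g(?P<realm>[AOIRL])"    # realm whose grid produced the output
    r"(?P<vert>[PZLS])?"       # optional vertical identifier
    r"-(?P<grid>m|n|r\d+)"     # area mean, native, or numbered regridding target
    r"(?P<modifier>[zag])?"    # optional zonal mean / Antarctica / Greenland
    r"(?P<pt>-pt)?$"           # optional synoptic (time point) suffix
)


def parse_grid_label(label):
    """Split a compound grid label into its components, or raise ValueError."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a recognised compound grid label: {label!r}")
    parts = match.groupdict()
    parts["pt"] = parts["pt"] is not None  # turn the suffix into a flag
    return parts


for example in ["gAP-n-pt", "gOS-r0", "gR-n", "gA-nz", "gAL-m", "gI-na"]:
    print(example, parse_grid_label(example))
```

Because the realm is always exactly one character, "L" is read as "land" in "gL-n" and as "model level" in "gAL-m" without ambiguity.
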
martinjuckes commented 6 years ago

That won't work. We currently have a different areacell* variable (areacello, areacella etc) for each model grid. If you interpolate the areacello variable onto the atmosphere grid it does not change the variable name, just as tos will retain the same name when it is interpolated to a new grid. We need to treat the river grid the same way in which we treat other grids. The area of river grid is a distinct quantity from the area of the atmosphere grid.

taylor13 commented 6 years ago

I know it's not perhaps aesthetically pleasing, but my proposed solution above will certainly work. It's just like when a model reports output on both its native grid and on a 1x1 deg grid. In this case areacella will be saved twice (with the same variable name and the same table name). The files will be distinguishable in the archive because they will be found in different directories and will have different file names. The file names will differ because one (on the native grid) will include the "gn" grid_label while the other (1x1 deg) will have a different grid_label (probably "gr", but in some cases "gr1" or "gr2" or the like).

This seems perfectly acceptable to me.

I realize the data haven't in fact been "regridded" to the river-routing grid, but that can be made clear in the documentation. The virtue in this is that a user looking for "dgw", for example, can look for it by the same variable name and same table_id. The user will find all available output, and some of it may be on a river-routing grid. The user will find the corresponding grid areas by searching on areacella and then sub-selecting the grid of interest (just as they would in the case discussed in the 1st paragraph above).
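The lookup described here can be sketched as a filter on grid_label. The catalogue below is a hypothetical in-memory list standing in for an archive search, and `matching_area` is an illustrative helper rather than any real search API.

```python
# Hypothetical catalogue entries; a real search would query the archive index.
catalogue = [
    {"variable_id": "areacella", "grid_label": "gn"},
    {"variable_id": "areacella", "grid_label": "gr1"},
    {"variable_id": "tas", "grid_label": "gn"},
    {"variable_id": "dgw", "grid_label": "gr1"},
]


def matching_area(entries, data_entry, area_variable="areacella"):
    """Return the cell-area entries that share the data entry's grid_label."""
    return [
        entry for entry in entries
        if entry["variable_id"] == area_variable
        and entry["grid_label"] == data_entry["grid_label"]
    ]


dgw = next(entry for entry in catalogue if entry["variable_id"] == "dgw")
print(matching_area(catalogue, dgw))
```

Searching on the variable name alone returns both areacella entries; the grid_label sub-selection narrows it to the one that belongs with the river data.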

martinjuckes commented 6 years ago

I don't find it acceptable. We should have different names for different variables. If areacella is regridded onto a different grid, it gets a different grid_label, that is clear, but the name of the variable should not change. Similarly, if we re-grid the areacellr to the atmosphere grid, it should have the grid_label appropriate to the atmosphere grid and the variable name, areacellr, to define what the parameter is.

Your approach might work for some use cases, but there are others where it doesn't, e.g., if atmosphere data is regridded to 2x2 degree and river data is also regridded to 2x2 degree.

taylor13 commented 6 years ago

The area of a cell for a given grid is defined by the grid and is independent of any other grid.

Each field originates on a single native grid. So, in the case that you say won't work (above), I don't understand the problem. The cell areas for a 2x2 degree grid are the same no matter how you got to that grid. I agree that if you start with a native grid for river routing and regrid some variable to a different grid before regridding to the 2x2 degree grid, you'll get a different answer than what you would get if you went directly from the river grid to the 2x2 deg. target grid, but we have no way to distinguish in the archive between these two routes to the 2x2 deg. grid (other than documentation). Why would someone want to follow both routes to get to 2x2 deg? If they do, we can't host both of these fields in the archive anyway (with the current DRS).
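To illustrate the point that cell areas depend only on the target grid, the sketch below computes the areas of a regular latitude-longitude grid from its cell bounds alone. It assumes a spherical Earth with an assumed radius and a grid starting at 90S and 0E; the function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed spherical Earth radius, in metres


def latlon_cell_area(lat_south, lat_north, lon_west, lon_east,
                     radius=EARTH_RADIUS_M):
    """Area (m^2) of a cell bounded by two parallels and two meridians."""
    dlon = math.radians(lon_east - lon_west)
    dsin = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return radius ** 2 * dlon * dsin


def regular_grid_areas(dlat, dlon):
    """Cell areas for a global regular grid, indexed as [latitude][longitude]."""
    nlat = round(180 / dlat)
    nlon = round(360 / dlon)
    return [
        [latlon_cell_area(-90 + j * dlat, -90 + (j + 1) * dlat,
                          i * dlon, (i + 1) * dlon)
         for i in range(nlon)]
        for j in range(nlat)
    ]


areas = regular_grid_areas(2.0, 2.0)
total = sum(sum(row) for row in areas)
print(total / (4 * math.pi * EARTH_RADIUS_M ** 2))  # fraction of the sphere covered
```

Nothing about the source grid enters the calculation, so any field regridded to this 2x2 degree grid shares the same areas.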

If we had been confronted with this problem much earlier in our planning, we could have renamed "areacella" as simply "areacell" (and we could have even included areacello in this new variable). I don't see the problem. It is the same quantity, independent of the "realm" and grid: it's the cell area. The cell_measures is supposed to point to the variable containing areas (or volumes) of grid cells for the variable it is attached to. It doesn't have to have a unique name for different grids in our database.

A change like the one you suggest (adding areacellr) will break CMOR3 (because CMOR automatically generates "cell_measures"), and unless there is very strong justification, we can't devote any more resources to extending CMOR3 for any newly-proposed CMIP6 requirements.

taylor13 commented 6 years ago

Do you think anyone will misinterpret what is being requested for areacella, areacello, and areacelli? These are the actual areas of the grid on which the data is reported, not the cell areas you get by interpolating the area from the original grid. I thought this was already clear, but perhaps it's not. thanks, Karl

martinjuckes commented 6 years ago

I think you are misinterpreting it ... we discussed it at some length last year and decided that areacelli should be the area of the ice model grid, not the grid that the data is requested on. I believe the same should apply to areacello and areacella. We need a name for the area of the ice model grid, and areacelli is the obvious choice as far as I can see. I don't recall any alternative being suggested.

taylor13 commented 6 years ago

O.K. what I didn't realize is that you request both areacelli and areacellg for ice sheets, and these are both on the same grid. areacellg is the actual area of the grid cells (for the grid on which the data are reported), and areacelli is (I presume) the area of the grid cells you would get from interpolating from the native ice-sheet model grid to the grid on which the data are reported. If the data are reported on the native grid, I guess areacelli=areacellg. Right? areacellg rightfully appears in cell_measures (of the data request) because it is the actual area of the grid cells.

So what I said in https://github.com/PCMDI/cmor/issues/220#issuecomment-319749833 should have read "Do you think anyone will misinterpret what is being requested for areacella, areacello, and areacellg" (NOT "areacelli"). I think folks will probably report the correct areas for areacellg because the data request description says "Area of the target grid (not the interpolated area of the source grid)." But when they look at areacelli, they might be a bit confused, especially since the field will be identical to areacellg for models reporting output on the native grid. I don't know who has requested "areacelli" and I don't know why they would need it. If it is needed, it should be made clear why. Otherwise, I would eliminate it from the data request and avoid confusion.

Similarly for the "river data", I think we don't need areacellr. Rather we should ask for the actual cell areas for the grid on which data are reported. As stated earlier, we won't have to modify CMOR if we set cell_measures = "area: areacella", and store the actual grid cell areas there.

I'm a little unsure whether this solves all our problems. What do these "river routing" grids look like? Are they global in extent? Are they irregular and so would be stored in the CF conventions as a vector (or logical array) with coordinates pointed to with the coordinates attribute?

martinjuckes commented 6 years ago

Hello Karl,

To avoid confusion I think that areacella should be interpreted according to its current definition as the cell area of the atmospheric model, areacello should be interpreted as the cell area of the ocean model, areacelli as the cell area of the ice sheet model and areacellr as the cell area of the river model.

areacelli was added after discussion with the ISMIP6 chair, Sophie Nowicki. It is there to provide information about the spatial structure and spatially varying resolution of the ice model grid.

We should not have any data on ice sheet grids, as the data is explicitly requested on regular polar stereographic grids and there is no prospect of data on ice sheet grids being used for inter-comparison purposes. It is of course possible for two variables with different names to have the same data values ... but I'm not sure how that is relevant.

Stephane Senesis ( @sensis ) says that IPSL would like to report the 7 variables mentioned above on their native grid, which is the grid of the river routing model. In their case it is a 1x1 degree global grid.

regards, Martin

taylor13 commented 6 years ago

I may have finally gleaned what areacelli represents and how it differs from areacellg. What you want is some measure of the average area of native ice-sheet grid cells contained in each of the polar stereographic "target" grid cells. I can understand that this might be useful (in conveying a sense of model resolution, for example), but I think there should be a more complete description of this in the data request. Without further explanation I think it could be interpreted as asking for the total area of ice sheet cells contained in each target cell rather than the mean area of ice sheet cells found in each target cell.
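The two readings contrasted here (total versus mean native-cell area per target cell) can be made concrete with a small sketch. It assumes, as a simplification, that each native cell falls entirely inside one target cell; the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def summarize_native_cells(native_areas, target_index):
    """Return (total, mean) native-cell area for each target cell index."""
    groups = defaultdict(list)
    for area, target in zip(native_areas, target_index):
        groups[target].append(area)
    total = {target: sum(cells) for target, cells in groups.items()}
    mean = {target: sum(cells) / len(cells) for target, cells in groups.items()}
    return total, mean


# Hypothetical native cell areas and the target cell each one falls in.
native_areas = [4.0, 4.0, 16.0, 1.0, 1.0, 1.0, 1.0]
target_index = [0, 0, 0, 1, 1, 1, 1]

total, mean = summarize_native_cells(native_areas, target_index)
print(total)
print(mean)
```

The mean conveys native resolution (coarser in target cell 0 than in cell 1), whereas the total mostly tracks how much of the target cell is covered, which is why the description in the data request matters.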

In any case I'm not overly concerned here about areacelli because that won't break CMOR. What we can't have is some groups reporting river data on the grid of a river routing model with cell_measures = "area: areacellr", and other models reporting river data on their normal land/atmosphere grid with cell_measures = "area: areacella". CMOR can't be modified to handle that. The definition of areacella has always referred to a grid used to report data from the atmosphere model or from the land model. What I'm suggesting (so as not to break CMOR) is to interpret it as a grid used to report data from the atmosphere model, the land model, or the river-routing model. Is there a simpler way to do this without breaking CMOR?

Without changes to CMOR or the data request @Sensis should have no problems at all in reporting most of the land surface data as he plans (either on his native grid or some other grid of his choice), and report his 7 river variables on a different grid (1x1 deg, if that's what he wants). The first grid label would normally be labeled "gn" if it is the model's native grid or "gr" if he regrids everything but the river variables, and the second grid label would be "gr1" (or "gr2" or "gr3" or ...) for the river variables that are on a different grid (i.e., the native grid of the river model). The cell_measures in both cases would be cell_measures = "area: areacella". Stephane would, of course, store two different fields of areacella, one on the "gn" grid and the other on the "gr1" grid. The files containing these 2 areacella fields would have different names because they have different grid_labels, so there is no conflict in the data base.

As I said earlier in this thread, there are better ways of doing some of this (that wouldn't break CMOR), but it is too late in the game to substantially change the CMIP6 specs and data request. If we were starting from scratch, I'd simply replace areacella, areacello, and areacellg with areacell, and the grid_label would tell us what grid the cell areas apply to.

taylor13 commented 6 years ago

@martinjuckes @sensis Any problem with the above plan?

martinjuckes commented 6 years ago

Hello Karl,

I still think it would be much simpler to define areacella to be the cell area of the atmospheric model, areacello to be the cell area of the ocean model, areacelli the cell area of the ice sheet model and areacellr the cell area of the river model. I don't understand why you think this is complicated or error prone, and I don't understand what you are proposing as an alternative.

If there have been previous definitions which conflict with this approach, can you tell me where they are?

regards, Martin

taylor13 commented 6 years ago

Hi Martin,

This issue started with a question from you:

"For example, I could put area: areacella --OR areacellr in the cell_measures entries for these variables? Would CMOR and PrePARE be able to deal with that?"

The answer is that "no" CMOR could not deal with that. This would break CMOR and I can't think how we would fix this without weeks of work. This is why cell_measures must have only one option for area.
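For context, the CF conventions define cell_measures as a blank-separated list of "measure: name" pairs, where the measure is "area" or "volume". The sketch below is a hypothetical parser written to that syntax; it is not CMOR's or PrePARE's actual check, but it shows why a string containing "--OR" does not fit the expected form.

```python
def parse_cell_measures(value):
    """Parse a CF-style cell_measures string into a {measure: variable} dict."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: name' pairs, got {value!r}")
    measures = {}
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected a 'measure:' keyword, got {key!r}")
        measure = key[:-1]
        if measure not in ("area", "volume"):
            raise ValueError(f"unknown measure {measure!r}")
        if measure in measures:
            raise ValueError(f"measure {measure!r} given more than once")
        measures[measure] = name
    return measures


print(parse_cell_measures("area: areacella"))

try:
    parse_cell_measures("area: areacella --OR areacellr")
except ValueError as error:
    print("rejected:", error)
```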

The area of the river model's native grid cells will be reported by @Sensis in areacella with an appropriately assigned grid_label. That will distinguish it from the areacella field requested for other land surface and atmospheric variables, which in his case will be on a different grid.

best regards, Karl

martinjuckes commented 6 years ago

Hello Karl,

According to the latest version of the "CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s" document, grid_label=gn should be used when variables are reported on their native grids. This means that, for instance, a variable which represents the cell area of the river model on the river model native grid will have grid_label=gn and a variable which has the cell area of the atmosphere model on the atmosphere grid will also have grid_label=gn.

The same applies to sea ice variables which may have an oceanographic native grid in some models and an atmospheric native grid in other models.

regards, Martin

taylor13 commented 6 years ago

Hi Martin, Thanks for pointing that out. I propose adding to the options for "grid_label" the value "gnr" ("r" for river), similar to "gra" and "grg" (for Antarctica and Greenland). This will need to be agreed to by the WIP. The label would only be used for variables stored on the native river grid. There would be no "grr" option. To maintain consistency across all models, regridded river data would be labeled "gr" (or sometimes "gr1", "gr2" ...).

regards, Karl

martinjuckes commented 6 years ago

I think it would be simpler to keep the variable description in the variable, as I have suggested above and as we agreed last time this was discussed a year or two ago. If you want to change this, we should do it in a consistent way and just use cell_measures = areacell for all variables. I think it is a bad choice, but it would work.

taylor13 commented 6 years ago

The point is we can't have two options for cell_measures (i.e., we can't have either areacella or areacellr specified for river data in the data request) because this would break CMOR. Models that want to report data on the native river grid will need to contribute to the archive both the area of their atmospheric grid cells and the area of their river grid cells. As I see it, our best options (if we want to minimize disruption) are:

1) invariably specify areacella for river grid cell_measures, but distinguish the areacella for atmospheric data from areacella for river data by specifying different grid labels (as proposed above).

2) specify areacellr for river grid cell_measures, and require all groups to contribute to the archive both areacella and areacellr. (nb. For models that have regridded their river data to an atmospheric target grid areacella and areacellr will be identical). In this case "gn" could be used for both the native atmospheric grid and the native river grid (because the files would be distinguishable from the name of the variable itself).

Perhaps 2) is the better option. Do you agree?

best regards, Karl

martinjuckes commented 6 years ago

There are other parts of the request affected by this discussion: (A) Some SIMIP variables may have an atmospheric native grid in some models and an oceanic native grid in others and they should always be provided on their native grid; (B) There is a request for the cell area of dynamic ice sheet models interpolated onto a polar stereographic grid; (C) Some river flux variables may be on an atmospheric native grid for some models and river model native grid for others. In this case the request does not prescribe which grid to use, but IPSL want to use the river model grid for river model variables.

For your option 1. to work in cases (A) and (C) we need to have the same variable name for cell area in river, atmosphere and ocean models, and use grid labels gnr, gna, gno to distinguish. We can't support case (B) with this approach.

Option 2. does not work because the native grid for some variables varies between models.

My suggestion is to have a different variable name for each model grid, with suffixes o, a, r, i for ocean, atmosphere, rivers and ice-sheets respectively. This would imply having two options for some variables. CMOR is not able to support this via a flag in the cell_measures table entry, but already supports multiple options for some variables through multiple CMOR records (e.g. ficeberg can be either a 3d or a vertically integrated field .. specified by two CMOR records with the same output name). For variables which may have one of two native grids, I could provide two request entries with appropriate metadata, and the modelling group would then have to choose the correct one for their model (as they must do for ficeberg). This would support all the cases above. I believe it is the cleaner solution because it allows us to attach meaningful variable names to the cell areas of different model components.

regards, Martin

martinjuckes commented 6 years ago

Hi Karl, After writing the above I realised that we also need to account for the re-gridded data. We currently have a variable areacellg for the cell area of interpolation target grids. If CMOR is not able to enter cell_measures=areacellg when data is interpolated from the native grid, then the options are either to have duplicate CMOR table entries for all variables, one for native grid and one for interpolated data, or to drop the use of areacellg. I had hoped that we would be able to specify some meaningful metadata for the interpolated data to provide users with some indication of what they are dealing with: the current situation in which they can only indicate whether it is interpolated or not and then provide a free text description is unsatisfactory.

regards, Martin

taylor13 commented 6 years ago

Two issues are addressed here: A) How to treat the two kinds of fundamentally different “cell areas” called for in the request, and B) how to handle the French request to allow river data to be reported on a different grid from other variables.

A. Confusion concerning areacellg and areacelli.

The intent of areacella, areacello, and areacellg is to provide the grid cell areas for grids used to report model output. These areas are not interpolated versions of the grid cell area of the native grid. I’ll refer to “areacell” data as “output grid cell areas” (which in some cases could also be the native grid cell areas). These variables can be pointed to by the cell_measures attribute.

When a field is regridded away from the native grid, ISMIP argued that in the case of ice sheet model grids we should also request a variable that indicates something about the original grid’s resolution. Consequently, the current CMIP6 data request requires that for these grids the mean area of native grid cells contained by each output grid cell be saved (on the output grid). In the current data request this variable is called “areacelli”, but this variable cannot be pointed to by the cell_measures attribute and would never be used directly in the analysis of other fields. I think we should change its name so it won’t be confused as being somehow comparable to areacella, areacello and areacellg. Perhaps we could rename it natCellszi – “native (grid) cell size: ice” -- or origIceCellsz – “original ice (grid) cell size”.

B. What to do about the French request to report river data on a different grid from other land and atmospheric data.

Until now, we have generally required that all the data requested in one CMOR table be reported on a common grid. [An exception was that some variables (like transports) could be reported at the same resolution but offset half a grid cell.] Now we are asked (by river modelers) to introduce the complication that within a single table some variables might (or might not) be reported on an entirely different grid from the other variables. Introducing this complication this late is problematic, and perhaps we should simply refuse to generalize things in this way for CMIP6.

If we want to accommodate this request without disrupting things too much, I can think of the following options:

  1. Distinguish the variables reported on the river model’s native grid from those reported on the normal land/atmos. grid by including them in a new set of tables. (not my first choice)
  2. Define new table “entry” names for these variables, but give them the same output name. For example, “dgw” could be accessed in CMOR as “dgwNative” or as "dgw", but on output it would invariably be named “dgw”. The cell_measures attribute for the “dgwNative” entry would be “area: areacellr”, but for the “dgw” entry it would remain as “area: areacella”. Only those groups writing river data on their native grid would be asked to provide the “areacellr” field. I think this might be what you propose in https://github.com/PCMDI/cmor/issues/220#issuecomment-326346734 above.
  3. For the river variables that might be reported on a native grid (listed at https://github.com/PCMDI/cmor/issues/220#issuecomment-319410797 ) we could change the cell_measures to “area: areacellr”. Groups reporting these variables on the native river grid would provide the areas of the native grid in areacellr; for those reporting on the same grid as the other atmos./land variables, areacellr would be identical to areacella. This is what I proposed as option 2 above: https://github.com/PCMDI/cmor/issues/220#issuecomment-326038277
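Option 2 in the list above can be sketched as two table entries that share one output name. The dictionary below only mimics the shape of a CMOR table's variable entries, reduced to the two attributes under discussion; the entry name "dgwNative" comes from the text, while the helper functions are hypothetical and not part of CMOR.

```python
# Reduced, illustrative table entries; real CMOR tables carry many more attributes.
variable_entry = {
    "dgw": {"out_name": "dgw", "cell_measures": "area: areacella"},
    "dgwNative": {"out_name": "dgw", "cell_measures": "area: areacellr"},
}


def select_entry(reports_on_native_river_grid):
    """Pick the table entry a modelling group would request."""
    return "dgwNative" if reports_on_native_river_grid else "dgw"


def required_area_variable(entry_name):
    """Name of the cell-area variable the chosen entry points to."""
    cell_measures = variable_entry[entry_name]["cell_measures"]
    return cell_measures.split(":", 1)[1].strip()


for native in (False, True):
    entry = select_entry(native)
    print(entry, variable_entry[entry]["out_name"], required_area_variable(entry))
```

Each entry still carries exactly one cell_measures value, so the choice between areacella and areacellr is made by the group when selecting the entry rather than inside the attribute.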

A note on grid_labels: Under any of the above options, we will need to provide guidance on how to label the grid. I think that’s easy:

  1. If a variable is reported on its native grid (independent of the model type), the grid label should be “gn”.
  2. If a variable has been regridded to some target grid preferred by the data provider, it should be labeled “gr”.

Examples:

tas, tos, and dgw all reported on their different native grids would be labeled “gn”.

dgw calculated by a model on the same grid as atmospheric variables and reported without regridding would be labeled “gn”, as would the “tas” variable. In this case the native grid is the same for the river and the atmosphere.

dgw is calculated by a model with a native grid that is different from the atmospheric native grid and then regridded to the atmospheric grid. In this case the grid label for dgw would be “gr”, but tas would have a label of “gn” because it is reported on its native grid.
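The labeling rule and the three examples above can be expressed as a short sketch. The grid identifiers ("atmos_native", "river_native", "ocean_native") are hypothetical names used only to say which grid a variable was computed on and which grid it is reported on.

```python
def grid_label(native_grid, reported_grid):
    """Return "gn" when a variable is reported on its own native grid, else "gr"."""
    return "gn" if native_grid == reported_grid else "gr"


# (variable, grid it was computed on, grid it is reported on)
cases = [
    ("tas", "atmos_native", "atmos_native"),
    ("tos", "ocean_native", "ocean_native"),
    ("dgw", "river_native", "river_native"),
    ("dgw", "atmos_native", "atmos_native"),   # river model shares the atmospheric grid
    ("dgw", "river_native", "atmos_native"),   # regridded to the atmospheric grid
]

for variable, native, reported in cases:
    print(variable, grid_label(native, reported))
```

Only the last case yields "gr"; the label records whether regridding happened, not which grid was used.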

The requirements for the grid labels are:

  1. They should tell users whether data has been regridded or not (from its native grid)
  2. They should distinguish between a field that is reported on more than one grid.

Grid labels do not uniquely define grids. Different grids can have the same grid label.

P.S. By the way in https://github.com/PCMDI/cmor/issues/220#issuecomment-326346734 you state that “ficeberg can be either a 3d or a vertically integrated field specified by two CMOR records with the same output name". What's the justification for using the same outname for these two different fields? We normally have considered vertically integrated quantities as being different from vertically resolved quantities. They usually don’t even have the same standard_name, so I think they should have different variable names.

Finally, I don’t understand what you’re saying in https://github.com/PCMDI/cmor/issues/220#issuecomment-326615796 . I don’t think CMOR has a problem with cell_measures = “area: areacellg” for either models reporting on their native grid or on their target grid. As long as there is only one option (i.e., “areacellg”), then there is no problem. The grid_labels in this case would of course be different depending on whether the data is on the native grid or not.

martinjuckes commented 6 years ago

Dear Karl,

(A) As I have said many times before, I believe that areacella and areacello should represent the area of the atmospheric and oceanographic model grid. If they are intended to have the same meaning, i.e. the cell area of the grid being used, they should have the same name.

taylor13 commented 6 years ago

Dear Martin,

I don't disagree that if we were starting anew, that would make sense (and I've already said that before), but I think it is too late to introduce this major change to the data request (affecting nearly all variables). In all previous phases of CMIP, ocean (and usually sea ice) variables were generally stored on a common grid (of the data provider's choosing) and the areas of the grid cells were saved in areacello. Similarly all atmospheric and land variables were stored on a (possibly different) grid, and the areas of those grid cells were saved in areacella. Note that areacello and areacella are invariably the areas of the grids on which the data are reported, which may or may not be areas of the native grids. In CMIP6 I think we must stick with this usage except for a few exceptions (reporting of ice sheet or river model data on grids different from the reported atmospheric and ocean grids). So which of my options above do you prefer, or is there a better option that preserves use of areacello and areacella? Karl

martinjuckes commented 6 years ago

It is not a change, I've never seen your definition before. As far as I'm concerned, you have introduced it here. I have asked above if your definition has been stated before, but I don't see any evidence that it has. My recollection is that we have discussed this a year or two ago and agreed that areacell... should refer to model grid area. Your proposal is a change to the data request, and I am arguing against it because so far you have not provided any justification.

taylor13 commented 6 years ago

Prior to CMIP5, model output was generally reported on latxlon rectilinear grids, so the area of each grid cell could be calculated exactly, based on cell bounds alone. In CMIP5 areacella and areacello were introduced because it is important to be able to exactly calculate global integrals of, for example, outgoing longwave radiation. To do this you need to weight the grid point values by the actual area of the grid cells. This is well understood by modelers producing the data, and was made clear in the "requirements" document: http://cmip-pcmdi.llnl.gov/cmip5/docs/CMIP5_output_metadata_requirements.pdf. See the discussion of associated_files which states: "These cell areas should be defined such that exact global integrals of energy fluxes at the surface and “top of the atmosphere” can be computed."
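A minimal sketch of the area weighting described here, assuming the flux and the cell areas are given as flat lists over the same reporting grid. The function names and the four-cell grid are hypothetical and only illustrate why the actual cell areas are needed; they are not taken from CMOR or any CMIP tool.

```python
def global_integral(flux, cell_area):
    """Integrate a flux (e.g. W m-2) over all cells using their areas (m2)."""
    if len(flux) != len(cell_area):
        raise ValueError("flux and cell_area must be on the same grid")
    return sum(f * a for f, a in zip(flux, cell_area))


def global_mean(flux, cell_area):
    """Area-weighted mean of a flux over all cells."""
    return global_integral(flux, cell_area) / sum(cell_area)


# Hypothetical grid: two large low-latitude cells, two small high-latitude cells.
rlut = [240.0, 240.0, 200.0, 200.0]   # outgoing longwave radiation
areacella = [3.0, 3.0, 1.0, 1.0]      # cell areas in arbitrary units

weighted = global_mean(rlut, areacella)
unweighted = sum(rlut) / len(rlut)
```

With these made-up numbers the area-weighted mean differs from the plain average of the grid-point values, which is the error one would make without the cell areas.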

Furthermore, since areacella and areacello appear in cell_measures, the values stored in areacella and areacello must be consistent with the CF conventions, which clearly state that these represent the actual area of the grid cells on which the data is reported.

Before the ice sheet modelers requested it, no one had expressed any interest in a regridded version of the native grid cell areas. The actual grid cell areas, on the other hand, are essential for analysis, and I would not like to rename them generically to "areacell" because that would indeed be a change affecting most "cell_measures" definitions.

martinjuckes commented 6 years ago

As you know, CMIP5 data was requested on the model grid.

taylor13 commented 6 years ago

There was no requirement in CMIP5 for data to be reported on the native grid. In the spreadsheet where output was requested, we "recommended" that ocean data be reported on the native grid. We made no recommendation concerning atmospheric data. In the "requirements" document we stated that "Oceanic fields that are a function of the vertical coordinate should usually be reported on the native grid", but again said nothing about atmospheric data. "Should usually" doesn't imply a strict requirement, and I know some groups in fact regridded their ocean data to latxlon grids before archiving them. This was both sensible and perfectly acceptable.

What was new in CMIP5 was that CMOR was generalized to be able to accommodate grids other than cartesian lat x lon grids, which made it possible to store data on native grids.

Anyway it is perhaps irrelevant what the requirements were. We accepted regridded data, and the modeling groups invariably provided the actual areas of the grid cells in areacella and areacello (not a regridded version of the native grid cell areas). I don't think any modeler gave even a fleeting thought that we might want anything but the actual grid cell areas. That is perhaps why we didn't feel a need to formally define areacella and areacello in CMIP5. It was well-understood by the modelers what we wanted.

taylor13 commented 6 years ago

As in CMIP5 areacella, areacello, areacellg, volcello, and areacellr (if we go with either suggestion 2 or 3 in https://github.com/PCMDI/cmor/issues/220#issuecomment-326814480 above) shouldn't have cell_methods = "area: mean". Unlike most other variables in the data request, these are extensive quantities and it is optional to specify a cell_methods, since the default cell_methods ("sum") applies.

martinjuckes commented 6 years ago

I suppose they could be seen as extensive .... you are interpreting them as surface_area summed over the grid cell, while I'm taking the standard name cell_area literally and interpreting it as a discrete property of the model grid, neither intensive nor extensive.

For the sea ice variables which are (a) required on their native grid and (b) may be native to atmosphere or ocean in different models I think only option (1) or (2) in #220 (comment) works, and I agree that (1) is not a good option (I think it would be confusing for users and make life difficult for analysis software). Option (3) does not work in this case because areacella ... gn and areacello .... gn will generally be different.

If we want these variables treated as extensive I think we should make this explicit with cell_methods = area: sum .. rather than expecting people and software to understand the distinction.
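To illustrate the distinction software would have to understand, here is a small sketch, assuming a one-dimensional row of cells that is coarsened by merging neighbouring pairs. The function name and the numbers are hypothetical. Cell area behaves as an extensive quantity (it is summed), while a field such as temperature is intensive (it is area-averaged).

```python
def coarsen_pairs(values, areas):
    """Merge neighbouring cells pairwise.

    Returns (coarse_values, coarse_areas): areas are summed ("area: sum"),
    values are area-weighted means ("area: mean").
    """
    if len(values) != len(areas) or len(areas) % 2:
        raise ValueError("need matching lists with an even number of cells")
    coarse_values, coarse_areas = [], []
    for i in range(0, len(areas), 2):
        total = areas[i] + areas[i + 1]
        weighted = values[i] * areas[i] + values[i + 1] * areas[i + 1]
        coarse_areas.append(total)
        coarse_values.append(weighted / total)
    return coarse_values, coarse_areas


temperature = [10.0, 20.0, 30.0, 30.0]   # intensive
cell_area = [1.0, 3.0, 2.0, 2.0]         # extensive

coarse_t, coarse_a = coarsen_pairs(temperature, cell_area)
```

The total area is unchanged by the coarsening, which is the property an explicit cell_methods of "area: sum" would signal.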

A disadvantage is that we will end up with lots of data on a 1x1 grid with different cell_measures attributes pointing to the areacella ... gr, areacello ... gr, etc. Although I think it would be better to have a single cell area variable for the 1x1 grid, this alternative won't break anything.

I don't think the existing definitions (e.g. "Horizontal area of ocean grid cells") are appropriate for the extensive interpretation of the areas. Could we use just "Grid Cell Area" for the long_name and, for the comment, for areacello: "Cell area of the grid used. These cell areas should be defined such that exact global integrals of energy fluxes at the surface and top of the atmosphere can be computed. This variable to be used in the cell_measures attribute of variables which come from the ocean model grid."?

As you noted above, we will have to change the name and description of areacelli.

taylor13 commented 6 years ago

Hi Martin,

O.K., I think I agree with most of what you propose, but I don't understand why you say "Option (3) does not work in this case because areacella ... gn and areacello .... gn will generally be different."

Option 3 (restated) was:

For the river variables under discussion (which might be reported on a
native grid), we would have a single entry for each river variable in the 
tables and for these variables we would invariably set cell_measures =   
“area: areacellr .... ”.  All groups would be required to contribute to the archive 
both areacella and areacellr fields.  For models reporting the river 
variables on a native grid different from the atmospheric grid, the two areacell
fields would be different, but for all other models identical fields would be stored 
in areacella and areacellr.

Why do you think that would be a problem? [I'm probably missing something, but I don't see what it has to do with areacello.]

If we proceeded accordingly, we would then have only a single cmor-table entry for the river variables and the following attributes:

cell_methods = "area: sum ....".
long_name = "Grid Cell Area"

and "comments" included for areacella, areacello, areacellr, areacellg, volcello:

For areacello:
comment = "Cell areas for a grid used to report ocean variables and any other 
variable using that grid (e.g., sea ice).  These cell areas should be defined to 
enable exact calculation of global integrals (e.g., of vertical fluxes of energy at 
the surface and top of the atmosphere)."

For areacella:
comment = "Cell areas for a grid used to report atmospheric variables and any 
other variable using that grid (e.g., soil moisture content).  These cell areas 
should be defined to enable exact calculation of global integrals (e.g., of vertical 
fluxes of energy at the surface and top of the atmosphere)."

For areacellg:
comment = "Cell areas for a grid covering Greenland or Antarctica and used to 
report land ice variables.  These cell areas should be defined to enable exact 
calculation of area integrals (e.g., of vertical fluxes of energy at the surface 
and top of the atmosphere)."

For areacellr:
comment = "Cell area for a grid used to report river flow variables.  These cell 
areas should be reported even if the river grid is the same as the atmospheric 
grid." 

For volcello:
comment = "Cell volumes for a grid used to report ocean variables.  Summing 
over these volumes should yield the correct total ocean volume (as represented by 
the model)."

This approach seems to me to be most straight-forward and won't raise questions about which "entry" label applies to a model under option 2.

regards, Karl

martinjuckes commented 6 years ago

Hello Karl, the problem is with the sea ice variables, e.g. sitemptop, which, like rivi, can have different native grids in different models. sitemptop might be native on either the ocean or atmosphere grids, while rivi can be native on either the river or atmosphere grids. I think we should use the same approach to both, and I don't see how (3) can work for sitemptop. The groups will of course be required to provide both areacella and areacello, but they will generally be different, so sitemptop needs to be archived with the cell_measures value which is appropriate to the model in question.

martinjuckes commented 6 years ago

PS: sea ice should not be an example for the ocean grid because, according to Dirk Notz, some models have sea ice variables on the atmosphere grid.

taylor13 commented 6 years ago

This is news to me. Every model I've encountered that prognostically calculates sea ice changes and movement does this on the same grid as the ocean. They might subsequently regrid some of the information to a different grid (e.g., the atmosphere) and perform additional diagnostics on that grid, but that also would be news to me and would clearly blur the distinction between what we call native and regridded data. I've written to get confirmation and a better understanding of what might be coming in the future in the way of sea ice models and grids. The data request currently includes at least 90 sea ice variables. I think we need to find a way to avoid having two entries for each sea ice variable (as required by option 2). I can think of at least two different approaches that I think would be acceptable, but let's wait to hear from the sea ice experts.

taylor13 commented 6 years ago

Seems like we might be able to handle this fairly easily given that modeling groups traditionally report sea ice on the ocean grid even if some sea ice variables are calculated on a different grid. Just as we expect land variables to be reported on the grid used to report atmospheric variables, we can require sea ice variables be reported on the same grid as the ocean variables. Then we can invariably set for sea ice variables cell_measures = "area: areacello".

For sea ice and ocean fields, "grid_labels" would be determined (in both cases) by the ocean grid. If ocean data is reported on the native grid, then sea ice fields reported on that grid would also be assigned grid_label="gn" [and similarly for land fields and the atmospheric grid]. For the few river flow fields, the grid_label would be determined by the river grid, with "gn" applying if the river fields are reported on the river routing model's native grid (which for most models will be the same as the atmospheric model's native grid) and "gr" applying if the river routing model fields are regridded to some other grid (which could, of course, be the atmospheric model's native grid).
There won't be any grave consequences if the grid_labels are incorrectly defined unless it leads to conflicts in the file names, and I think there is little danger of that.
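The rule proposed in this comment can be sketched as follows. This is only an illustration under stated assumptions: the mapping, the function name and the grid names are hypothetical, and "river" is used as a key purely for the example.

```python
# Which model component's grid decides the grid_label for each kind of field
# (hypothetical mapping reflecting the proposal in this comment).
GRID_OWNER = {
    "ocean": "ocean",
    "seaIce": "ocean",   # sea ice follows the ocean grid
    "atmos": "atmos",
    "land": "atmos",     # land follows the atmospheric grid
    "river": "river",    # river flow follows the river routing grid
}


def grid_label(field_kind, reporting_grid, native_grids):
    """Return "gn" if the field is reported on its owner's native grid, else "gr"."""
    owner = GRID_OWNER[field_kind]
    return "gn" if reporting_grid == native_grids[owner] else "gr"


# Hypothetical model with a separate river routing grid.
native = {"ocean": "tripolar_1deg", "atmos": "gauss_n96", "river": "river_0p5deg"}

labels = {
    "sea ice on ocean grid": grid_label("seaIce", "tripolar_1deg", native),
    "river on river grid": grid_label("river", "river_0p5deg", native),
    "river regridded to atmosphere": grid_label("river", "gauss_n96", native),
}
```

For a model whose river routing grid is the atmospheric grid, native["river"] would simply equal native["atmos"] and river fields on that grid would get "gn".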

taylor13 commented 6 years ago

Here is information about sea ice grids:

Hi all,

As you know, for CMIP6 we're collecting model output from multiple
experiments.  We are trying to work out the details of some of the
metadata that will be included in the output files.  My knowledge of sea
ice models is now somewhat dated, so I hope that you will be able to
answer the following questions for me about the horizontal grids used in
representing sea ice fields:

1) For models that include prognostic equations for changes in sea ice,
      a) Are the sea ice model prognostic equations in all models
currently solved on the same grid as the hosting ocean grid?  If not, do
they have their own "native" grid, or do they get evaluated, for
example, on an atmospheric grid?
      b) Are any variables characterizing the ice evaluated on a grid
different from the one used for the prognostic equations?   If so, can
you please provide an example. (Note, I realize in some models sea ice
variables might be regridded either to couple to the atmosphere or
simply to be reported on a different grid, but I'm interested here in the
"native" grid on which they are generated.)

2)  Are there any purely diagnostic models for sea ice?  If so, on what
grid are diagnostic sea ice variables evaluated?

3)  In the past I know, all (or nearly) all sea ice models shared the
same grid as the ocean.  Do you see that changing in the next 5 years or
so?

thanks very much for your help with this.  I'm afraid hearing from you
about this is rather urgent because folks are beginning to save CMIP6
output.

thanks again and best regards,
Karl

Reply from Cecilia Bitz:

My answers are below. I'll be interested to hear if the others know differently.

> 1) For models that include prognostic equations for changes in sea ice,
>       a) Are the sea ice model prognostic equations in all models 
currently solved on the same grid as the hosting ocean grid?  If not, 
do they have their own "native" grid, or do they get evaluated, for 
example, on an atmospheric grid?

**I have never seen a model where the sea ice output is on a 
different grid than the ocean. I wouldn't rule it out though. A 
possibly relevant odd case I can think of is the Hadley Centre 
models, which has sea ice thermodynamics coupled to the 
atmosphere boundary layer without time splitting. I believe 
they then regrid the thermodynamic variables to the ocean grid 
for transport and deformation steps. I'll see if I can verify this, but 
I'm pretty sure.  I believe all the output has been archived on the 
ocean grid in past CMIPs.**

>       b) Are any variables characterizing the ice evaluated on a 
grid different from the one used for the prognostic equations?   
If so, can you please provide an example. (Note, I realize in some 
models sea ice variables might be regridded either to couple to 
the atmosphere or simply to be reported on a different grid, but 
I'm interested here in the "native" grid on which they are generated.)
>
**In the CICE model the sea ice grid is a B grid so (UVEL,VVEL) 
are staggered from the tracer grid (Tsfc,SIC,SIT,etc). But 
(UVEL,VVEL) are regridded onto the tracer grid when they are sent 
to history or through the coupler to other components. I don't 
know about other sea ice models.**
> 2)  Are there any purely diagnostic models for sea ice?  If so, 
on what grid are diagnostic sea ice variables evaluated?

**Not that I'm aware of.**

>
> 3)  In the past I know, all (or nearly) all sea ice models shared the 
same grid as the ocean.  Do you see that changing in the next 5 
years or so?
I doubt it. In fact, I think the Hadley Centre is soon to conform to 
other models. I'm in the UK and we have a sea ice modelers meeting 
on Monday (Dirk is attending). We can confirm this with the Hadley 
Centre if you wish.

Additional information from Cecilia:

I was slightly off about HadGEM3. Not all of the thermodynamics is done 
in the atmosphere. Hewitt et al (2011) 
https://www.geosci-model-dev.net/4/223/2011/gmd-4-223-2011.pdf  say:

Surface sea ice temperature, atmosphere to ice fluxes and the conductive 
heat flux through the ice are calculated in the atmosphere component (as 
in HadGEM1, McLaren et al., 2006) while the remaining calculations (ice 
growth and melt, dynamics and ridging in thickness categories) are 
carried out by the CICE submodel on the ORCA grid at the same resolution 
used by the ocean component. The ocean and sea ice subcomponents 
always need to run on the same grids to ensure that fluxes of heat and 
freshwater can be accurately maintained between both submodels.

Information from Dirk Notz:

Short answer from my side:

- All models I know of calculate all state variables of sea ice on the
ocean grid. This includes the purely diagnostic measures sea-ice extent
and sea-ice area that we request for CMIP6

- Many models calculate atmospheric fluxes over sea ice only on the
atmosphere grid, using a re-gridded sea-ice cover for these
calculations. The sea-ice model on the ocean grid never sees these
individual fluxes but only obtains some integrated net surface flux.
This flux is then used in the sea-ice model on the ocean grid to update
surface temperature, ice thickness, internal temperatures etc. This is
done for most European models I know of, including our MPI models,
Hadley Center models, EC-Earth. These models will hence report
atmospheric fluxes over sea ice on the atmospheric grid, as far as I know.

- I don't believe that sea-ice models will move to a grid separate from
the ocean very soon. However, given the change in computer architecture,
sea-ice rheology might become a burden for high-resolution simulations
eventually. This would imply that high-resolution atmosphere/ocean
models might have to be coupled to a sea-ice model running on its own
grid at lower resolution. We don't have a good solution yet on how to
truly effectively run sea ice on the new generation of exa-scale computers.

Additional information from Cecilia:

Dirk's email reminds me that I did ask the Hadley Centre contingency yesterday 
about their model, and they said HadGEM3 is their CMIP6 model. So the paper I 
quoted in my previous email is up to date. They mentioned that GFDL is using 
the same grid for atmosphere, sea ice and ocean, which they think is the way of 
the future. 

Information from Ed Blockley

As Dirk says we shall be providing sea ice variables for CMIP6 on 2 grids. 
All our surface exchanges are performed in the atmosphere/land model (at 
each atmospheric time step) and so things like the list below (at end email) 
are output from our atmosphere model on the atmosphere grid.

This means that we will also include sea ice concentration and the 'ice present' 
Heaviside function variable from the atmosphere.

As Dirk also says the ice model at present does not scale well.
Therefore one of the next tasks that we will perform is to decouple the 
sea ice from the ocean model and run it through OASIS at each time step.
This functionality exists in the NEMO model system but, at present, requires 
the models to be on the same grid (i.e., only load balancing is changed to 
speed the ice up). I suspect this is likely to change in the near(ish) future to 
allow the ice model to run on a different grid if needed (then we could have 
high res NH/SH and low-res tropics).
This could lead to the possibility of the sea ice being done entirely on the 
atmosphere grid instead of the ocean grid but we've not really discussed 
anything like that yet.

PS it's perhaps worth noting that - as we use NEMO ocean and CICE sea 
ice - currently our ice is on a different grid from the ocean. The tracer points 
(grid-cell-centres) are identical but the velocities are quite different owing to 
the fact that CICE uses a B-grid and NEMO a C-grid.

Example (not exhaustive) list of atmosphere diags sea ice for HadGEM3:

Downwelling shortwave flux over sea ice
Upward shortwave flux over sea ice
Downwelling longwave flux over sea ice
Upward longwave flux over sea ice
Net sensible heat flux over sea ice
Atmospheric drag coefficient

taylor13 commented 6 years ago

Given the above, it appears that some groups will report sea ice variables on ocean grids, while others will report on atmospheric grids, and some will report a mix.

I would therefore favor omitting the cell_measures for sea ice variables. Users who want to use the cell areas will certainly not have a difficult time finding them, and their omission as metadata shouldn't break anything. If we wanted to include this metadata, we would have to provide two options for these variables and I suspect many data providers would make a mistake and include areacello as the cell measure when they should include areacella (and vice versa).

I hope you agree. (Note that in CMIP5, most of the sea ice variables had "areacella", whereas in the current data request most have "areacello", so already I suspect there will be some confusion.)

martinjuckes commented 6 years ago

Yes Karl, and I don't think leaving out cell_measures is a good solution. Omitting the metadata will break any software which is trying to use it. Why are you refusing to respond to my suggestions? We need to make some progress somehow.

taylor13 commented 6 years ago

At this late hour, no really good solution exists. I responded to your suggestion (see https://github.com/PCMDI/cmor/issues/220#issuecomment-331507665) by saying I didn't want to have duplicate entries for order 10^2 sea ice variables and order 10 river variables. Double entries mean that a modeling group has to develop smarter software in calling CMOR. Moreover, I think many data providers would likely choose the wrong entry of the two options and then CMOR would write incorrect cell_measures metadata into the file. (Or was there another suggestion I missed?)

Rather, since this is not critical metadata, and until CF 1.7 there was not even a way to associate external data like areacell with a variable (so I doubt if any software exists out there to make use of it), I would prefer to either

1) omit cell_measures for sea ice and river data (that way it won't be incorrect), or 2) omit cell_measures for sea ice data and require areacellr from all models as suggested in https://github.com/PCMDI/cmor/issues/220#issuecomment-331312965 .

Users who need areacell for their calculations will easily find it and they can determine whether areacella or areacello should be used based on the coordinate information.
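One way a user might make that determination is sketched below, assuming one-dimensional latitude and longitude coordinate lists (a rectilinear reporting grid). The function names, the tolerance and the coordinate values are hypothetical.

```python
import math


def same_axis(a, b, tol=1e-6):
    """True if two coordinate axes have the same length and values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
    )


def pick_area_variable(lat, lon, candidates):
    """Return the name of the single area variable whose grid matches (lat, lon).

    candidates maps a variable name (e.g. "areacella") to its (lat, lon) axes.
    """
    matches = [
        name
        for name, (c_lat, c_lon) in candidates.items()
        if same_axis(lat, c_lat) and same_axis(lon, c_lon)
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one matching grid, found {matches}")
    return matches[0]


# Hypothetical coordinate axes for the two area variables.
candidates = {
    "areacella": ([-45.0, 45.0], [0.0, 120.0, 240.0]),
    "areacello": ([-60.0, 0.0, 60.0], [0.0, 90.0, 180.0, 270.0]),
}

# A sea ice field reported on the ocean grid in this made-up model.
chosen = pick_area_variable([-60.0, 0.0, 60.0], [0.0, 90.0, 180.0, 270.0], candidates)
```

The lookup raises an error when no grid or more than one grid matches, for example when the atmosphere and ocean happen to share a reporting grid.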

If you want your data request to indicate that cell_measures can be either "areacella or areacello" (depending on a model's formulation), we will find a way for CMOR to interpret that the same way it interprets cases where cell_measures is undefined (set to the empty string) or is set to -OPR (as in variables uo and vo). In those cases, if a data provider wants to include cell_measures, he must add the attribute ("by hand") through a special CMOR call.

If you want to indicate that two options are allowed other than connecting them with " or " (as in "areacella or areacello"), please check with us first.
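The interpretation described here could look roughly like the sketch below. It is not CMOR's implementation: the function names are hypothetical, and it assumes the data request string contains only an "area:" measure with alternatives joined by " or ".

```python
def area_candidates(cell_measures):
    """Return the area variable names allowed by a data-request cell_measures string."""
    text = cell_measures.strip()
    if not text.startswith("area:"):
        return []
    names = text[len("area:"):]
    return [name.strip() for name in names.split(" or ") if name.strip()]


def resolved_cell_measures(cell_measures):
    """Treat an ambiguous entry like an undefined one (empty string).

    A single option is passed through unchanged; with two or more options the
    data provider would have to set the attribute explicitly.
    """
    options = area_candidates(cell_measures)
    if len(options) == 1:
        return "area: " + options[0]
    return ""


ambiguous = resolved_cell_measures("area: areacella or areacello")
unique = resolved_cell_measures("area: areacello")
```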

martinjuckes commented 6 years ago

I have software which uses cell_measures to find external data .. so I'm pretty sure it exists. A lot of the CMIP5 metadata is not explicitly defined in the CF convention, but still gets used. Associating data with appropriate grid information is critical for many calculations. That is one reason why information about masking variables has been added in the cell_methods comments of the form area: mean where sea_ice (comment: mask=siconc), so that it is possible for software to exploit the data without someone having to set the mask for each variable manually. I would prefer the option of two records in order that we could have cell_measures and cell_methods set accordingly, but if you really think the modelling groups are going to have difficulty telling their ocean from their atmosphere grid I'll go with areacella or areacello in cell_measures and (comment: mask=siconc or siconco) in cell_methods.
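As a rough illustration of how software might read these attributes (a sketch only, not the software referred to in this comment), the following parses a cell_measures string into a mapping and extracts the mask variable names from a cell_methods comment. The function names are hypothetical, and the mask syntax is assumed to be exactly the "(comment: mask=...)" form quoted above.

```python
import re

# Matches "(comment: mask=siconc)" or "(comment: mask=siconc or siconco)".
MASK_PATTERN = re.compile(r"\(comment:\s*mask=([^)]*)\)")


def parse_cell_measures(cell_measures):
    """Turn "area: areacello volume: volcello" into {"area": "areacello", ...}."""
    tokens = cell_measures.split()
    if len(tokens) % 2:
        raise ValueError("expected 'measure: variable' pairs")
    pairs = zip(tokens[0::2], tokens[1::2])
    return {key.rstrip(":"): name for key, name in pairs}


def mask_variables(cell_methods):
    """Return the mask variable name(s) given in a cell_methods comment, if any."""
    match = MASK_PATTERN.search(cell_methods)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(" or ") if name.strip()]


measures = parse_cell_measures("area: areacello volume: volcello")
single = mask_variables("area: mean where sea_ice (comment: mask=siconc) time: mean")
either = mask_variables("area: mean where sea_ice (comment: mask=siconc or siconco)")
```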

taylor13 commented 6 years ago

Overnight I became less comfortable leaving off cell_measures (and I know you also are uncomfortable doing that), and after considerable mulling, I came up with one more option that I hope we might agree on. I need to write it down. Then I'll post it here later today.

taylor13 commented 6 years ago

@martinjuckes I have spent too much time trying to write a document to guide groups about their gridding options and find that the only approach I can easily explain is the one that doesn't require us to make any changes to the data request or to CMOR to accommodate the various special variables discussed in this issue thread (river variables and some sea ice variables).

Please read the detailed guidance concerning grids at https://goo.gl/3VSQuK .

I think we should ask for comments from the WIP (and any other interested parties), but if anyone wants to propose a different approach from what I've described in the above document (which has the added virtue of being fully consistent with the data request and global attributes documents in place for about a year), then I think they should be required to write a document with all the details clearly specified (similar to https://goo.gl/3VSQuK). Then the WIP can decide between the alternatives with full information.

martinjuckes commented 6 years ago

(1) What you are suggesting in the google doc ( https://goo.gl/3VSQuK ) is quite a change from what is currently in the data request. Personally, I would like people to stop proposing disruptive changes. (2) We need to resolve the cell_measures issue. What you are proposing is NOT consistent with what is in the data request.

We could go with the following:

For areacello: comment = "Cell areas for any grid used to report ocean variables and variables which are requested as used on the model ocean grid (e.g. hfsso, which is a downward heat flux from the atmosphere interpolated onto the ocean grid). These cell areas should be defined to enable exact calculation of global integrals (e.g., of vertical fluxes of energy at the surface and top of the atmosphere)."

For areacella: comment = "Cell areas for any grid used to report atmospheric variables and any other variable using that grid (e.g., soil moisture content). These cell areas should be defined to enable exact calculation of global integrals (e.g., of vertical fluxes of energy at the surface and top of the atmosphere)."

For areacelli: comment = "Cell areas for any grid used to report ice sheet variables. These cell areas should be defined to enable exact calculation of area integrals (e.g., of vertical fluxes of energy at the surface and top of the atmosphere)."

For areacellr: comment = "Cell area for any grid used to report river flow variables. These cell areas should be reported even if the river grid is the same as the atmospheric grid."

For volcello: comment = "Cell volumes for any grid used to report ocean variables. Summing over these volumes should yield the correct total ocean volume (as represented by the model)."

modelAreaCelli: long_name = "The cell area of the ice sheet model. When interpolated to a regular grid, it should be interpolated (not summed) with a conservative scheme to preserve total area".

I've modified the areacello definition to take into account the atmospheric flux variables which are interpolated onto the ocean grid and then requested as ocean variables. I've removed areacellg -- this was introduced into the data request, after extensive discussion, as a generic grid area for regular grids. It has never been anything specifically to do with "Greenland". I've made the areacelli definition consistent with the other variables, as always intended. I'll also change the cell_methods to area: sum. To me it looks like a messy solution, but I can see that there are CMOR constraints we have to live with.

For the cell_measures of sea-ice variables I would like to at least have areacello or areacella and you can either filter it when creating the CMOR tables, or within CMOR. Will this work? Can we do the same for the 7 river variables (i.e. have areacella or areacellr in the cell_measures)?

taylor13 commented 6 years ago

@martinjuckes The specifications I wrote down in the google doc ( https://goo.gl/3VSQuK )  were meant to avoid changing anything in the data request and also to avoid modifying (or degrading the utility) of CMOR and PrePARE.  Can you please tell me, specifically, what in that document is inconsistent with the current data request?

In late April a single group asked us to modify the data request to report river model output on a native grid.  Since this came so late, I don't think we should be compelled to do anything that might make it inconvenient for most modeling groups (who don't report river output on a special "river model" grid) to use CMOR to write their model output.  If the data request specifies "areacella or areacellr" for cell_measures (rather than “areacella” alone), that will break CMOR.  If CMOR were to filter out cell_measures for these variables, then output from most models will not point to an "areacell" for river variables.  I don't want that to happen, so I've proposed a solution that retains the current specification of cell_measures="areacella" for these variables and enables folks to store their river data on whatever grid they like.  The only drawback (and I think we must live with it for CMIP6) is that the grid_label for river variables stored on a river model’s native grid that is different from its atmospheric native grid will not be labeled “gn”, but rather "grI" (where “I” is some integer selected by the data provider).

Similarly, for sea ice variables I think we shouldn't change things from the current request, which asks that most variables be reported on the same grid as the atmosphere, one variable (one of the variables for sea ice concentration) be reported on the same grid as the ocean, and a few variables on the “model” grid. By “model” grid, I assume what is meant is that they should be reported on the native grid on which they are calculated, but I don’t know if that has been made clear anywhere.

I see no reason why we should change the request to have sea ice variables that are now requested on a specific grid (e.g., atmosphere) be reported on either the atmosphere or ocean grid. A modeling group may, of course, choose to store data on whatever grid they like (we won’t reject it), but we need to make sure they can do this without inconveniencing those who abide by the current request.

By the way, I always interpreted “areacellg” as the area used to report characteristics of glacial ice (the “g” standing for glaciers). I don’t see any compelling reason for changing “areacellg” at this late date, although it could be done without breaking CMOR. [That being said, this would make any model output (including “areacellg”) already written using the current specs incompatible with future variables named “areacelli”, and analysts will have to download “areacellg” from some models and “areacelli” from others. Moreover, if a group has written “areacelli” under the current data request specification, it will not be comparable to the “areacelli” of future output. Given those problems, we should not make a change unless we can provide clear justification for doing so. I, for one, wouldn’t be able to come up with compelling justification.]

Unless I’ve missed something critical, the google doc providing grid guidance is compatible with the data request as it stands. Please let me know if it is incompatible, and let’s try to change the guidance rather than the data request to make them consistent.

martinjuckes commented 6 years ago

areacellg is not defined as you thought it was defined, and never has been. It is defined with long name "Grid Cell Area for Interpolated Grids" and description "Area of the target grid (not the interpolated area of the source grid)." The variable to be used with variables from the ice sheet grid is areacelli (currently "Horizontal area of ice-sheet grid cells" -- but I don't think this adequately captures the meaning you insist on.)

taylor13 commented 6 years ago

Before August 2, I didn't understand the difference between areacelli and areacellg, but that was clarified and I wrote https://github.com/PCMDI/cmor/issues/220#issuecomment-319790565 more than 2 months ago:

 areacellg is the actual area of the grid cells (for the grid on which the [glacier] data 
are reported), and areacelli is (I presume) the area of the grid cells you 
would get from interpolating [cell area] of the native ice-sheet model grid to the 
grid on which the data are reported.

So I think we've been in agreement about the meaning since then.

Under this definition, areacellg could appear in "cell_measures", but areacelli could not. This is in fact the case (i.e., consistent) with the current data request.

Since folks are beginning to write data, I think we must avoid redefining areacellg as you suggested in https://github.com/PCMDI/cmor/issues/220#issuecomment-334915071 (or was there a typo? shouldn't "areacellI" be "areacellg" in the following?):

For areacelli:
comment = "Cell areas for any grid used to report ice sheet variables. These cell areas should be defined to enable exact
calculation of area integrals (e.g., of vertical fluxes of energy at the surface
and top of the atmosphere)."

This is the current definition of "areacellg" and should remain so.

I don't think I have a problem renaming the current "areacelli" as "modelAreaCelli" if you prefer that; that way it might be less easily confused with the cell areas of the "reporting" grid.