cmip6dr / CMIP6_DataRequest_VariableDefinitions

Definitions of variables in the CMIP6 Data Request

cell_methods wrong? #394

Open taylor13 opened 5 years ago

taylor13 commented 5 years ago

@adcroft raised issues some time ago at https://github.com/PCMDI/cmip6-cmor-tables/pull/216 which I think probably need to be addressed. He commented:

  1. Variable volcello should have cell method "area: sum volume: sum". The current value of "area: mean" is simply wrong for an extensive quantity.

  2. Variable volcello should not have a self-referencing cell measure (in the same way areacello does not).

    • It is not clear whether volcello should have any cell measures but if the methods are corrected to "sum" it will do no harm.

  3. Variable masscello is vertically integrated (over 3d cells) and therefore should include "olevel: sum".

  4. Variable thkcello is vertically integrated (over levels) and therefore should include "olevel: sum"
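
The first point (a "mean" method being wrong for an extensive quantity) can be illustrated with a minimal sketch. It assumes a made-up column of four cell volumes and a hypothetical helper `coarsen` that merges neighbouring cells; neither comes from the Data Request or from any library.

```python
# Hypothetical cell volumes in m^3; the numbers are invented for illustration.
volumes = [100.0, 200.0, 300.0, 400.0]


def coarsen(values, factor, method):
    """Merge each run of `factor` neighbouring cells with 'sum' or 'mean'."""
    merged = []
    for start in range(0, len(values), factor):
        block = values[start:start + factor]
        total = sum(block)
        merged.append(total if method == "sum" else total / len(block))
    return merged


coarse_sum = coarsen(volumes, 2, "sum")    # [300.0, 700.0]
coarse_mean = coarsen(volumes, 2, "mean")  # [150.0, 350.0]

# Summing preserves the total volume of the column; averaging does not.
assert sum(coarse_sum) == sum(volumes)
assert sum(coarse_mean) != sum(volumes)
```

Software that takes the cell_methods string at face value would pick the second behaviour for volcello if the attribute says "area: mean", which halves the total volume in this example.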

matthew-mizielinski commented 5 years ago

  4. Variable thkcello is vertically integrated (over levels) and therefore should include "olevel: sum"

Just to check: thkcello is a 3D variable, i.e. it has dimensions of longitude latitude olevel (Ofx) or longitude latitude olevel time (Omon, Odec), so would putting olevel: sum in the cell methods be confusing here?

martinjuckes commented 5 years ago

Hello @taylor13 , @adcroft,

  1. OK, I agree a correction is needed. However, volume: sum would not be correct as volume is not allowed in this context. It would have to be a combination of lev and area. The ordering here implies that the lev integral is done first, which is the more natural approach. So perhaps lev: area: sum (but see comments below).

  2. Why not?

3, 4. The name of the vertical dimension is lev, so the term would be lev: area: sum as above.

Some other variables in the Data Request use depth: sum (and there is one usage of depth: minimum inherited from CMIP5): after checking the convention again, I now think that this usage is invalid as it stands. There is a line in the text saying: "In the specification of this attribute, name can be a dimension of the variable, a scalar coordinate variable, a valid standard name, or the word 'area'." This appears clear as a self-standing sentence, but the simple interpretation of it (that any standard name can be used) is contradicted in a later subsection which limits it to latitude and longitude. The conformance document and the CF checker appear to follow the more open interpretation, allowing any standard name. Ideally, some of these terms should have an additional auxiliary coordinate specifying the range of the sum/mean/integral, but such a change is not feasible for CMIP6, and not really needed.

For the vast majority of variables there is no indication of the vertical processing given in the cell_methods string. This approach follows that used in CMIP5: horizontal and temporal cell methods are set, but vertical ones are generally not. Rather than make a change across all variables now, I would prefer to stay with the CMIP5 approach for now, and consider introducing more complete cell methods strings in CMIP7. This would imply using area: sum for volcello and leaving masscello and thkcello as they are.

taylor13 commented 5 years ago

Regarding thkcello, although it is a function of "lev", I think like @matthew-mizielinski that including a cell method for the vertical could be confusing. Can't we leave it out?

Note that when the same method applies to multiple dimensions, it must be independent of the ordering, as stated in the convention doc:

If a data value is representative of variation over a combination of axes, a single method should 
be prefixed by the names of all the dimensions involved (listed in any order, since in this case 
the order must be immaterial). Dimensions should be grouped in this way only if there is an 
essential difference from treating the dimensions individually. 
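
The grouping rule quoted above can be sketched in a few lines of Python. This is only an illustration: `parse_cell_methods` is a hypothetical helper, not part of the CF checker or any library, and it assumes the string contains no parenthesised comments such as "(interval: 1 hr)".

```python
def parse_cell_methods(text):
    """Split a cell_methods string into (set of names, method text) entries.

    Tokens ending in ':' are axis names; the words that follow, up to the
    next name, are the method and any qualifiers such as 'where sea'.
    Names grouped before one method are stored as a frozenset, because
    their order is immaterial according to the convention.
    """
    entries = []
    names, words = [], []
    for token in text.split():
        if token.endswith(":"):
            if words:  # a name after method words starts a new entry
                entries.append((frozenset(names), " ".join(words)))
                names, words = [], []
            names.append(token[:-1])
        else:
            words.append(token)
    if names:
        entries.append((frozenset(names), " ".join(words)))
    return entries


# Grouped axes compare equal regardless of the order in which they are listed.
assert parse_cell_methods("lev: area: sum") == parse_cell_methods("area: lev: sum")

# Separate entries stay separate and keep their qualifiers.
print(parse_cell_methods("area: sum where sea time: mean"))
```

Under this reading, "lev: area: sum" is a single entry covering both axes, whereas "lev: sum area: sum" would be two entries applied in sequence.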

Again, I find defining a cell_methods for volcello, which is an obvious extensive quantity, unhelpful. The CF default is already "sum", so technically there is no need for cell_methods (except for the time-dimension). I note also that the standard name for volcello is ocean_volume so it seems unnecessary to include "where_sea" in cell_methods. That qualifier doesn't appear in the Ofx table, but it does in other tables where volcello is requested as a variable.

It does seem odd to me also to include the volume associated with volcello, but I guess it isn't harmful or forbidden by the standard.

I'm not particularly concerned about any of this, so not changing masscello or thkcello would be o.k., I guess. If I were making the decision, I would simply omit the cell_method for area in volcello.

martinjuckes commented 5 years ago

Thanks for those comments Karl.

(1) Perhaps including lev: sum can be confusing, but it should be less so in the combination lev: area: sum. It is unambiguous from a technical point of view, because sum just means a sum or integral over the grid cells specified in the vertical coordinate lev and horizontal coordinates, not over the domain.

(2) I agree that the inclusion of cell_methods specifying that volume is a volume integral is a bit odd ... but, on the other hand, we have a previous decision to provide cell methods strings for all variables in order to avoid any requirement for software to make decisions about intensive vs. extensive variables. Adding cell_methods = "lev: area: sum" is a slightly clumsy way of flagging that this is an extensive variable, but it is the generic approach. Still, lev is omitted in the vast majority of variables, so I think it makes sense to be consistent and omit it here.

(3) Yes, cell_measures = "volume: volcello" is redundant at one level ... but if we want this statement included consistently in variables that have this cell volume, it belongs here. For software which does not know the obvious meaning of the standard name it may not be redundant.

If there are no strong objections, I will just correct the volcello cell_methods from area: mean to area: sum for now, and leave the others as they are.

martinjuckes commented 5 years ago

Consistent changes have also been made to masscello and wmo.

taylor13 commented 5 years ago

@martinjuckes We have another report from someone having trouble reading (with iris software) the volcello variable because it contains a reference to itself in cell_measures. Perhaps this is a good reason to go ahead and remove volume: volcello from cell_measures.

I would note that areacella and areacello do not self-reference in the cell_measures.
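
A self-reference of this kind is easy to detect before files are written. The following is a minimal sketch, assuming cell_measures follows the usual "measure: variable" pair layout; `self_referencing_measures` is a hypothetical helper and the attribute strings are illustrative, not copied from the tables.

```python
def self_referencing_measures(var_name, cell_measures):
    """Return the measures in `cell_measures` that point back at `var_name`.

    `cell_measures` is expected to look like "area: areacello volume: volcello",
    i.e. whitespace-separated "measure: variable" pairs.
    """
    tokens = cell_measures.split()
    pairs = zip(tokens[0::2], tokens[1::2])
    return [measure.rstrip(":") for measure, target in pairs if target == var_name]


# volcello pointing at itself is flagged.
print(self_referencing_measures("volcello", "area: areacello volume: volcello"))

# A variable that only references other variables is not flagged.
assert self_referencing_measures("thkcello", "area: areacello volume: volcello") == []
```

A check along these lines could be run over every variable definition to list the ones whose cell_measures would need to be trimmed.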