jkrasting closed this issue 2 years ago.
@wrongkindofdoctor, @aradhakrishnanGFDL, @Wen-hao-Dong --- any comments or concerns on this proposal?
@jkrasting The proposed `rename_coords` flag sounds like a reasonable solution for handling the ocean data at this time.
@jkrasting, can you provide an example, using the actual variable names, of a case where the coordinate renaming needs to be skipped, assuming it is based on this? For a POD developer, what is the takeaway or guidance on this feature, for reference?
Sure thing @aradhakrishnanGFDL.
Consider a simple POD that analyzes `thetao` and `vmo` in GFDL's native post-processed (pp) format. These variables are defined on two different grids: `thetao` is defined at the cell centers (`yh`, `xh`), while `vmo` is defined at the cell's northern face (`yq`, `xh`):
float thetao(time, z_l, yh, xh) ;
thetao:long_name = "Sea Water Potential Temperature" ;
thetao:units = "degC" ;
thetao:missing_value = 1.e+20f ;
thetao:_FillValue = 1.e+20f ;
thetao:cell_measures = "volume: volcello" ;
thetao:standard_name = "sea_water_potential_temperature" ;
thetao:cell_methods = "area:mean z_l:mean yh:mean xh:mean time: mean" ;
thetao:time_avg_info = "average_T1,average_T2,average_DT" ;
float vmo(time, z_l, yq, xh) ;
vmo:long_name = "Ocean Mass Y Transport" ;
vmo:units = "kg s-1" ;
vmo:missing_value = 1.e+20f ;
vmo:_FillValue = 1.e+20f ;
vmo:standard_name = "ocean_mass_y_transport" ;
vmo:cell_methods = "z_l:sum yq:point xh:sum time: mean" ;
vmo:time_avg_info = "average_T1,average_T2,average_DT" ;
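To make the grid mismatch concrete, here is a small plain-Python sketch (no MDTF or netCDF dependency) that records each variable's dimensions as listed in the header above and checks whether the two variables share their horizontal dimensions. The helper names `horizontal_dims` and `shares_grid`, and the assumption that the last two dimensions are always (Y, X), are illustrative only.

```python
# Dimensions copied from the ncdump header above: (time, vertical, Y, X).
VAR_DIMS = {
    "thetao": ("time", "z_l", "yh", "xh"),  # tracer: cell centers
    "vmo": ("time", "z_l", "yq", "xh"),     # transport: northern cell face
}


def horizontal_dims(dims):
    """Return the (Y, X) pair, assuming they are the last two dimensions."""
    return tuple(dims[-2:])


def shares_grid(var_a, var_b, var_dims=VAR_DIMS):
    """True if two variables sit on the same horizontal grid."""
    return horizontal_dims(var_dims[var_a]) == horizontal_dims(var_dims[var_b])


if __name__ == "__main__":
    print(horizontal_dims(VAR_DIMS["thetao"]))  # ('yh', 'xh')
    print(horizontal_dims(VAR_DIMS["vmo"]))     # ('yq', 'xh')
    print(shares_grid("thetao", "vmo"))         # False
```

Because the Y dimensions differ (`yh` vs. `yq`), any step that assumes one shared latitude coordinate for both variables is bound to conflate them.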
The MDTF preprocessor is not flexible enough to fully support multiple grids, so by default it tries to infer a single grid and assign that same grid to every variable. It also renames the coordinates in the process. Looking at `yh` and `yq`, both coordinates are classified as `latitude` by MDTF's inference rules:
double yh(yh) ;
yh:long_name = "h point nominal latitude" ;
yh:units = "degrees_north" ;
yh:axis = "Y" ;
double yq(yq) ;
yq:long_name = "q point nominal latitude" ;
yq:units = "degrees_north" ;
yq:axis = "Y" ;
If a POD requests both of these variables at the same time, the framework preprocesses `yh` and `yq` and renames both to `latitude`. This leads to `KeyError`s when `latitude` has already been defined by one of them (say, `yh`) and the framework then tries to overwrite it with `yq`. It can also cause one of the variables to be mistakenly interpreted on the wrong grid.
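The failure mode above can be reproduced in miniature with the sketch below. It is not MDTF's actual code: it assumes a simplified inference rule (any coordinate with `axis = "Y"` and `units = "degrees_north"` is latitude) and a hypothetical `infer_renames` helper that raises `KeyError` on a name clash, mirroring the error reported above.

```python
# Coordinate attributes copied from the ncdump header above.
COORD_ATTRS = {
    "yh": {"long_name": "h point nominal latitude", "units": "degrees_north", "axis": "Y"},
    "yq": {"long_name": "q point nominal latitude", "units": "degrees_north", "axis": "Y"},
}


def infer_standard_name(attrs):
    """Simplified (hypothetical) rule: a Y axis in degrees_north is latitude."""
    if attrs.get("axis") == "Y" and attrs.get("units") == "degrees_north":
        return "latitude"
    return None


def infer_renames(coord_attrs):
    """Map each native coordinate name to its inferred name, refusing duplicates."""
    renames = {}
    claimed = {}  # inferred name -> native name that claimed it first
    for native, attrs in coord_attrs.items():
        new_name = infer_standard_name(attrs)
        if new_name is None:
            continue
        if new_name in claimed:
            raise KeyError(
                f"{native!r} would be renamed to {new_name!r}, "
                f"which is already taken by {claimed[new_name]!r}"
            )
        claimed[new_name] = native
        renames[native] = new_name
    return renames


if __name__ == "__main__":
    print(infer_renames({"yh": COORD_ATTRS["yh"]}))  # {'yh': 'latitude'}
    try:
        infer_renames(COORD_ATTRS)  # yh claims "latitude", then yq collides
    except KeyError as err:
        print("collision:", err)
```

A single grid renames cleanly; as soon as both staggered Y coordinates are present, they compete for the same target name.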
A flag already exists to turn off strict enforcement of a single grid, which allows multiple coordinates to coexist, but there is no flag to turn off the coordinate renaming. This was probably intended but never fully implemented.
This change mainly affects how a data center defines its variable conventions (i.e., through the `fieldlist_*.jsonc` files). In GFDL's case, it is needed for our post-processed ocean data, since we do not include the two-dimensional coordinates (`geolat`/`geolon`) with each variable, to save space. Other CF-compliant data (e.g., most CMIP output) pose no issue, because the two-dimensional coordinates are repeated in every file and have unique names on the different grids (e.g., `geolat_v`, `geolon_v`).
There is no real change from the POD developer's perspective; this mainly affects how the framework interfaces with a source dataset such as GFDL's pp format.
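On the data-center side, the proposed `rename_coords` attribute with its default of `true` could be read roughly as in this sketch. It assumes each fieldlist entry has already been parsed into a Python dict (stripping `.jsonc` comments is not shown); the entry layout and the `should_rename` helper are hypothetical, and only the `rename_coords` key and its default come from this issue.

```python
import json

# Hypothetical fieldlist entries; real fieldlist_*.jsonc files may be laid out differently.
ENTRIES = json.loads("""
{
  "thetao": {"units": "degC", "rename_coords": false},
  "tos": {"units": "degC"}
}
""")


def should_rename(entry):
    """Rename coordinates unless the entry explicitly sets rename_coords to false."""
    return bool(entry.get("rename_coords", True))


for name, entry in ENTRIES.items():
    print(name, should_rename(entry))
# thetao False
# tos True
```

Falling back to `True` when the key is absent keeps existing fieldlists working unchanged; only entries that opt out need editing.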
Makes sense, John. No issues. One minor comment: since time, etc. are also coordinate variables, the framework needs to know precisely that this feature request pertains to grid coordinates.
Good point, @aradhakrishnanGFDL. I took this into account in 104ace7: the old renaming rules still always apply to any `diagnostic.VarlistTimeCoordinate` instance.
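A minimal sketch of how such a flag could gate the renaming step while time coordinates are still renamed unconditionally. The `Coord` class, its `is_time` field (standing in for a `diagnostic.VarlistTimeCoordinate` instance), the native name `Time`, and the `apply_renames` function are all hypothetical; only the `rename_coords` name comes from this issue.

```python
from dataclasses import dataclass


@dataclass
class Coord:
    """Hypothetical stand-in for a framework coordinate object."""
    native_name: str
    standard_name: str
    is_time: bool = False  # plays the role of a VarlistTimeCoordinate instance


def apply_renames(coords, rename_coords=True):
    """Return {native_name: output_name} for each coordinate.

    Time coordinates are always renamed; grid coordinates keep their
    native names when rename_coords is False.
    """
    result = {}
    for c in coords:
        if rename_coords or c.is_time:
            result[c.native_name] = c.standard_name
        else:
            result[c.native_name] = c.native_name
    return result


if __name__ == "__main__":
    coords = [
        Coord("Time", "time", is_time=True),
        Coord("yh", "latitude"),
        Coord("yq", "latitude"),
        Coord("xh", "longitude"),
    ]
    print(apply_renames(coords, rename_coords=False))
    # {'Time': 'time', 'yh': 'yh', 'yq': 'yq', 'xh': 'xh'}
```

With the default `rename_coords=True`, both `yh` and `yq` map to `latitude`, which is exactly the clash the flag is meant to avoid.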
Ocean model data are often analyzed on multiple grids simultaneously. In the case of MOM6/OM4, this includes tracer quantities at the cell centers and transport quantities at the cell edges and corners. Thus, the notion of a single latitude/longitude coordinate system is ill-posed (see Additional context below). A flag is needed to preprocess input variables while skipping the coordinate-renaming step.
The proposed solution is to add an attribute to the `data/fieldlist_*.jsonc` entries that turns off this feature by setting `"rename_coords": false`. The default behavior would still be to rename the coordinates (i.e., `"rename_coords": true`) if no value is specified.

Additional context
`xgcm`, a Python package for working with staggered-grid model output, could be considered for future versions of the preprocessor.