Open ambrad opened 1 year ago
See the bottom of: https://acme-climate.atlassian.net/wiki/spaces/NGDNA/pages/3290497244/SCREAMv1+Output+Netcdf+Cleanup for a version of the global attributes where I've written how I'd suggest changing them.
Also would be good to add these global attributes:
@jgfouca - this "help wanted" task has your name on it (now that I've mentioned you) ;-). You don't need to do it - it isn't urgent - but I think you're probably best equipped for it since it is at the intersection between CIME and SCREAM.
@czender wrote on slack: "FYI the last time SCREAM files I looked at had a global attribute Conventions = "None yet" . For EAMxx to leverage some ncclimo capabilities, that attribute value will need to contain a string of the form "CF-1.X", e.g., Conventions = "CF-1.8" . Once CF is listed as a convention, NCO will understand that it should look for and propagate coordinate bounds variables like time_bnds discussed above. It might also help with other software, not sure..."
Thus we should make sure the conventions flag we add in this issue is consistent with some of the work we need to do in other output git issues...
"EAMv4 History File" should be "EAMxx History File" "E3SM Atmosphere Model Version 4" should be something like "E3SM C++ Atmosphere Model"
The EAMxx output files already implement enough CF Conventions (units, calendar, standard_name, etc) to merit adding a Conventions = "CF-1.0" attribute. I do not think bounds were included in that version. Later, when EAMxx has as much CF metadata as EAM, you can bump the version to 1.7 or higher. I cannot think of any downside to havine CF-1.0 in Conventions right now, and it would help folks like @golaz who are trying EAM workflow infrastructure right now on (heavily massaged) EAMxx datasets.
Two more attributes to add:
Two more attributes to add:
- averaging type
- output frequency
Given that these are the same for all vars, do you want them added as a global attribute, or do you want to follow CF convention and add them to all vars?
Given that these are the same for all vars, do you want them added as a global attribute, or do you want to follow CF convention and add them to all vars?
Oh, I definitely meant they should be a global attribute rather than repeated.
Recommend checking with @czender to make sure we follow CF conventions.
CF does not have a commonly used convention, AFAIK, to indicate output frequency. The time coordinate contains this information implicitly. Any aptly named global attribute would be fine. Averaging types, on the other hand, are covered by the Cell Methods convention. Implement an attribute named cell_methods
with value time: mean
for time-averaged output, or time: point
for instantaneous output. CF prefers that the cell_methods
attribute be per-variable rather than global, because variables are often subsetted from their parent files and only the per-variable attributes will be carried with the variable. NB: EAM and ELM do not (yet) implement cell_methods
. The post-processing software (e.g., ncclimo
) currently does this for EAM and ELM.
I was mistaken: EAMv2 and ELMv2 both implement CF-compliant per-variable cell_methods
in the raw data.
I think CMIP has a frequency variable. We have them in EAM history files. Some examples: h0: :time_period_freq = "month_1" h1: :time_period_freq = "day_1" ; h2: :time_period_freq = "hour_6" ; h3: :time_period_freq = "hour_6" ; h4: :time_period_freq = "hour_3" ; h5: :time_period_freq = "day_1" ; h6: :time_period_freq = "month_1" ;
Averaging type is per-variable. So a "6 hourly" file can have some 6 hour-average fields and 6-hour instantaneous fields.
Averaging type is per-variable. So a "6 hourly" file can have some 6 hour-average fields and 6-hour instantaneous fields.
That is not be supported in EAMxx, and it was my understanding from slack convos that mixing INSTANT and AVERAGE fields in the same file was confusing and/or not desired anymore. Is that not the case?
Its fine to impose that files have one type but it should still be a variable attribute because variables can be pulled out on their own and may or may not take global attributes with them.
Its fine to impose that files have one type but it should still be a variable attribute because variables can be pulled out on their own and may or may not take global attributes with them.
Is "variable X was well documented but was carelessly moved to its own file and lost its global metadata" really a realistic use case that we should protect against? It seems like the person who extracted the variable had the responsibility to capture its metadata. I don't think this is a big deal either way, but I'd rather not make our files bloated if it's all the same.
Looking at some EAM files, I don't see instantaneous and average values mixed. But I do see some min and max values in with inst or average.
float TREFHT(time, ncol) ;
TREFHT:units = "K" ;
TREFHT:long_name = "Reference height temperature" ;
TREFHT:standard_name = "air_temperature" ;
TREFHT:cell_methods = "time: mean" ;
float TREFHTMN(time, ncol) ;
TREFHTMN:units = "K" ;
TREFHTMN:long_name = "Minimum reference height temperature over output period" ;
TREFHTMN:cell_methods = "time: minimum" ;
float TREFHTMX(time, ncol) ;
TREFHTMX:units = "K" ;
TREFHTMX:long_name = "Maximum reference height temperature over output period" ;
TREFHTMX:cell_methods = "time: maximum" ;
An instantaneous value has the attribute :cell_methods = "time: point".
Yeah, EAMxx changed that paradigm. It's now impossible to have an output file that has both min and max (because averaging method is part of the file name, among other reasons). In one sense this is a step backwards - it's nice to do whatever you want! - but it makes our code easier and since there's no limit on how many output streams one can generate, it's not such a big deal.
You used to be able to change the averaging type by specifying :A
, :I
, etc after a particular variable. So you could have one file with a variable of all 4 averaging types. No longer!
It makes sense to move to fewer variables per file since you're running at gigantic resolutions. But you should still put the averaging method in the variable attribute because it's a CF convention (not required I think but encouraged).
It makes sense to move to fewer variables per file since you're running at gigantic resolutions. But you should still put the averaging method in the variable attribute because it's a CF convention (not required I think but encouraged).
Yeah, we can definitely do that. It has zero implications on any other internal mechanism, so I can add it.
Ok, I also agree we should do this if it is the CF convention.
@bartgol , looking through old issues I came across this one. I think we have nearly all of these attributes already in place, see the checklist at the top. Do you think we could add the remaining ones in #2802 ?
For the group who've commented (@ambrad , @PeterCaldwell , @rljacob , @golaz , @czender , @bartgol ) are we happy with what we have? Are there any of the unchecked attributes that we should prioritize putting in?
Ultimately, can we close this issue?
@AaronDonahue I am for doing things separately.
I was looking at an EAMxx history file and noticed a number of the global attributes are potentially misleading. For example, IC and topo files were used in the simulation, but here they show as "NONE FOR NOW". The case and git_hash attributes would also be useful.
Aaron Edit: Adding a checklist for all of the attributes we want to add:
cell_methods
to each variable)