E3SM-Project / scream

Exascale global atmosphere model written in C++ as part of the E3SM project
https://e3sm-project.github.io/scream/
Other
70 stars 48 forks source link

NetCDF global attributes incomplete #2142

Open ambrad opened 1 year ago

ambrad commented 1 year ago

I was looking at an EAMxx history file and noticed a number of the global attributes are potentially misleading. For example, IC and topo files were used in the simulation, but here they show as "NONE FOR NOW". The case and git_hash attributes would also be useful.

// global attributes:
    :source = "E3SM Atmosphere Model Version 4" ;
    :case = "TEST 1" ;
    :title = "EAMv4 History File" ;
    :git_hash = "THE GIT LOG HASH" ;
    :host = "THE HOST" ;
    :Version = "1.0" ;
    :revision_Id = "None" ;
    :initial_file = "NONE FOR NOW" ;
    :topography_file = "NONE FOR NOW" ;
    :contact = "e3sm-data-support@llnl.gov" ;
    :institution_id = "E3SM-Project" ;
    :product = "model-output" ;
    :realm = "atmos" ;
    :Conventions = "None yet" ;

Aaron Edit: Adding a checklist for all of the attributes we want to add:

PeterCaldwell commented 1 year ago

See the bottom of: https://acme-climate.atlassian.net/wiki/spaces/NGDNA/pages/3290497244/SCREAMv1+Output+Netcdf+Cleanup for a version of the global attributes where I've written how I'd suggest changing them.

Also would be good to add these global attributes:

PeterCaldwell commented 1 year ago

@jgfouca - this "help wanted" task has your name on it (now that I've mentioned you) ;-). You don't need to do it - it isn't urgent - but I think you're probably best equipped for it since it is at the intersection between CIME and SCREAM.

PeterCaldwell commented 1 year ago

@czender wrote on slack: "FYI the last time SCREAM files I looked at had a global attribute Conventions = "None yet" . For EAMxx to leverage some ncclimo capabilities, that attribute value will need to contain a string of the form "CF-1.X", e.g., Conventions = "CF-1.8" . Once CF is listed as a convention, NCO will understand that it should look for and propagate coordinate bounds variables like time_bnds discussed above. It might also help with other software, not sure..."

Thus we should make sure the conventions flag we add in this issue is consistent with some of the work we need to do in other output git issues...

rljacob commented 1 year ago

"EAMv4 History File" should be "EAMxx History File" "E3SM Atmosphere Model Version 4" should be something like "E3SM C++ Atmosphere Model"

czender commented 1 year ago

The EAMxx output files already implement enough CF Conventions (units, calendar, standard_name, etc) to merit adding a Conventions = "CF-1.0" attribute. I do not think bounds were included in that version. Later, when EAMxx has as much CF metadata as EAM, you can bump the version to 1.7 or higher. I cannot think of any downside to havine CF-1.0 in Conventions right now, and it would help folks like @golaz who are trying EAM workflow infrastructure right now on (heavily massaged) EAMxx datasets.

PeterCaldwell commented 1 year ago

Two more attributes to add:

  1. averaging type
  2. output frequency
bartgol commented 1 year ago

Two more attributes to add:

  1. averaging type
  2. output frequency

Given that these are the same for all vars, do you want them added as a global attribute, or do you want to follow CF convention and add them to all vars?

PeterCaldwell commented 1 year ago

Given that these are the same for all vars, do you want them added as a global attribute, or do you want to follow CF convention and add them to all vars?

Oh, I definitely meant they should be a global attribute rather than repeated.

golaz commented 1 year ago

Recommend checking with @czender to make sure we follow CF conventions.

czender commented 1 year ago

CF does not have a commonly used convention, AFAIK, to indicate output frequency. The time coordinate contains this information implicitly. Any aptly named global attribute would be fine. Averaging types, on the other hand, are covered by the Cell Methods convention. Implement an attribute named cell_methods with value time: mean for time-averaged output, or time: point for instantaneous output. CF prefers that the cell_methods attribute be per-variable rather than global, because variables are often subsetted from their parent files and only the per-variable attributes will be carried with the variable. NB: EAM and ELM do not (yet) implement cell_methods. The post-processing software (e.g., ncclimo) currently does this for EAM and ELM.

czender commented 1 year ago

I was mistaken: EAMv2 and ELMv2 both implement CF-compliant per-variable cell_methods in the raw data.

rljacob commented 1 year ago

I think CMIP has a frequency variable. We have them in EAM history files. Some examples: h0: :time_period_freq = "month_1" h1: :time_period_freq = "day_1" ; h2: :time_period_freq = "hour_6" ; h3: :time_period_freq = "hour_6" ; h4: :time_period_freq = "hour_3" ; h5: :time_period_freq = "day_1" ; h6: :time_period_freq = "month_1" ;

rljacob commented 1 year ago

Averaging type is per-variable. So a "6 hourly" file can have some 6 hour-average fields and 6-hour instantaneous fields.

bartgol commented 1 year ago

Averaging type is per-variable. So a "6 hourly" file can have some 6 hour-average fields and 6-hour instantaneous fields.

That is not be supported in EAMxx, and it was my understanding from slack convos that mixing INSTANT and AVERAGE fields in the same file was confusing and/or not desired anymore. Is that not the case?

rljacob commented 1 year ago

Its fine to impose that files have one type but it should still be a variable attribute because variables can be pulled out on their own and may or may not take global attributes with them.

PeterCaldwell commented 1 year ago

Its fine to impose that files have one type but it should still be a variable attribute because variables can be pulled out on their own and may or may not take global attributes with them.

Is "variable X was well documented but was carelessly moved to its own file and lost its global metadata" really a realistic use case that we should protect against? It seems like the person who extracted the variable had the responsibility to capture its metadata. I don't think this is a big deal either way, but I'd rather not make our files bloated if it's all the same.

rljacob commented 1 year ago

Looking at some EAM files, I don't see instantaneous and average values mixed. But I do see some min and max values in with inst or average.

    float TREFHT(time, ncol) ;
        TREFHT:units = "K" ;
        TREFHT:long_name = "Reference height temperature" ;
        TREFHT:standard_name = "air_temperature" ;
        TREFHT:cell_methods = "time: mean" ;
    float TREFHTMN(time, ncol) ;
        TREFHTMN:units = "K" ;
        TREFHTMN:long_name = "Minimum reference height temperature over output period" ;
        TREFHTMN:cell_methods = "time: minimum" ;
    float TREFHTMX(time, ncol) ;
        TREFHTMX:units = "K" ;
        TREFHTMX:long_name = "Maximum reference height temperature over output period" ;
        TREFHTMX:cell_methods = "time: maximum" ;

An instantaneous value has the attribute :cell_methods = "time: point".

PeterCaldwell commented 1 year ago

Yeah, EAMxx changed that paradigm. It's now impossible to have an output file that has both min and max (because averaging method is part of the file name, among other reasons). In one sense this is a step backwards - it's nice to do whatever you want! - but it makes our code easier and since there's no limit on how many output streams one can generate, it's not such a big deal.

You used to be able to change the averaging type by specifying :A, :I, etc after a particular variable. So you could have one file with a variable of all 4 averaging types. No longer!

rljacob commented 1 year ago

It makes sense to move to fewer variables per file since you're running at gigantic resolutions. But you should still put the averaging method in the variable attribute because it's a CF convention (not required I think but encouraged).

bartgol commented 1 year ago

It makes sense to move to fewer variables per file since you're running at gigantic resolutions. But you should still put the averaging method in the variable attribute because it's a CF convention (not required I think but encouraged).

Yeah, we can definitely do that. It has zero implications on any other internal mechanism, so I can add it.

PeterCaldwell commented 1 year ago

Ok, I also agree we should do this if it is the CF convention.

AaronDonahue commented 2 months ago

@bartgol , looking through old issues I came across this one. I think we have nearly all of these attributes already in place, see the checklist at the top. Do you think we could add the remaining ones in #2802 ?

For the group who've commented (@ambrad , @PeterCaldwell , @rljacob , @golaz , @czender , @bartgol ) are we happy with what we have? Are there any of the unchecked attributes that we should prioritize putting in?

Ultimately, can we close this issue?

bartgol commented 2 months ago

@AaronDonahue I am for doing things separately.