ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Should ESMvalTool produce a "tracking_id"? #3038

Open glpotter opened 1 year ago

glpotter commented 1 year ago

According to the ESGF documentation (https://esgf.github.io/esg-search/Climate_Model_Metadata.html), "tracking_id" is required metadata. The cmorizers I have been working with don't produce it. Is this by choice, or is it an oversight?

glpotter commented 1 year ago

@valeriupredoi and @zklaus, I submitted this a couple of days ago.

zklaus commented 1 year ago

Thanks for this, @glpotter. Note that the document linked above speaks specifically about CMIP5 and also a possible future CMIP6, so we should check with ESGF on how the requirements have developed. I think part of that will be a project-specific definition. Are we mostly targeting obs4MIPs here?

valeriupredoi commented 1 year ago

gotcha @glpotter :+1: In light of what we talked about yesterday on the call, plus adding to what @zklaus mentions above, would you mind please adding all the other attributes you'd be interested to see, so I can run them by Matt Miezilinski first, please? :beer:

bouweandela commented 1 year ago

Is there an issuing authority for those tracking IDs? Or can we just make up our own, e.g. using uuid.uuid4()?

zklaus commented 1 year ago

This has somewhat changed over time. CORDEX, CMIP5, and CMIP6 use uuids that we can produce with the uuid module, but CMIP6 also uses them as part of a HDL that is registered at DKRZ. Matt Miezilinski will be able to tell us whether they can/should be produced by the data producers, i.e. us, or are assigned as part of the publication process on ESGF.
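As a minimal sketch of the "produce them with the uuid module" option: the snippet below generates a bare version 4 UUID and checks that a string is one. The function names `new_tracking_id` and `is_uuid4` are hypothetical and not part of ESMValTool. Whether a bare UUID is acceptable as a tracking_id is exactly the open question for ESGF.

```python
import uuid


def new_tracking_id() -> str:
    """Return a random (version 4) UUID in canonical string form."""
    return str(uuid.uuid4())


def is_uuid4(value: str) -> bool:
    """Check that value is a canonical, lowercase-insensitive version 4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        # Raised for strings that are not well-formed UUIDs.
        return False
    # Require version 4 and the canonical 8-4-4-4-12 representation.
    return parsed.version == 4 and str(parsed) == value.lower()


if __name__ == "__main__":
    tracking_id = new_tracking_id()
    print(tracking_id, is_uuid4(tracking_id))
```

No issuing authority is involved here: uniqueness relies only on the randomness of `uuid.uuid4()`.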

valeriupredoi commented 1 year ago

To keep things digestible for Matt, given how busy he is, I'd rather we summarized the tracking_id in one line, plus add the other questions for the other attrs that may be needed - that's why I'd appreciate the input from @glpotter here - if there were other things to discuss, pop them in here pls :beer:

glpotter commented 1 year ago

I’ll work on this tonight. Jerry

glpotter commented 1 year ago

@valeriupredoi, @zklaus, @bouweandela: Before NASA GSFC abandoned CMOR, we included only the mandatory metadata fields, and tracking_id was one of them. Since ESMValTool is preparing data to eventually be put into ESGF, only the mandatory fields need be present. This list is a little different from what ESGF wants. The mandatory list is at https://esgf.github.io/esg-search/Climate_Model_Metadata.html. At least the MERRA2 list is:

    :Contact = "http://gmao.gsfc.nasa.gov" ;
    :Institution = "NASA Global Modeling and Assimilation Office" ;
    :Title = "MERRA2 inst6_3d_ana_Np: 3d,6-Hourly,Instantaneous,Pressure-Level,Analysis,Analyzed Meteorological Fields Monthly Mean" ;
    :VersionID = "5.12.4" ;
    :comment = "\'Contains modified MERRA-2 data\'\n" ;
    :history = "Created on 2023-02-04 00:43:14" ;
    :host = "unknown host" ;
    :mip = "Amon" ;
    :modeling_realm = "reanaly" ;
    :project_id = "OBS6" ;
    :raw = "H" ;
    :reference = "doi:10.1175/jcli-d-16-0758.1" ;
    :source = "" ;
    :tier = "3" ;
    :title = "MERRA2 data reformatted for ESMValTool v2.8.0.dev45+g470d12940" ;
    :user = "potter" ;
    :version = "5.12.4" ;
    :Conventions = "CF-1.7" ;

For tracking_id, the persistent identifiers can be resolved at http://hdl.handle.net/

if that helps..

glpotter commented 1 year ago

I just heard from a researcher at NASA GISS who had to create new tracking IDs for CMIP6. He said:

When I re-processed the GISS model output, I had to create new tracking IDs for the updated files. To do this, I created unique uuid using uuidgen (https://www.uuidgenerator.net/). If you are a Python user, you can also do this via the "uuid" library (https://docs.python.org/3/library/uuid.html).

One note—our CMIP6 metadata has the prefix "hdl:21.14100/", so I had to ensure I included this before the uuid when adding the "tracking_id" using the ncatted attribute editor tool (https://linux.die.net/man/1/ncatted).
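The workflow described above can be sketched as follows, assuming the prefix quoted in the comment and a standard `ncatted -a name,variable,mode,type,value` call. The exact command the GISS researcher ran is not given, so `ncatted_args` and the file name `example.nc` are illustrative assumptions, not a record of what was done.

```python
import uuid

# Prefix quoted in the comment above for the GISS CMIP6 metadata.
CMIP6_HANDLE_PREFIX = "hdl:21.14100/"


def prefixed_tracking_id(prefix: str = CMIP6_HANDLE_PREFIX) -> str:
    """Return the prefix followed by a freshly generated version 4 UUID."""
    return f"{prefix}{uuid.uuid4()}"


def ncatted_args(tracking_id: str, path: str) -> list[str]:
    """Build an argument list for ncatted (assumed invocation, not executed here).

    In the -a specification, "global" targets a global attribute, "o" means
    overwrite, and "c" stores the value as a character string.
    """
    return ["ncatted", "-a", f"tracking_id,global,o,c,{tracking_id}", path]


if __name__ == "__main__":
    tid = prefixed_tracking_id()
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) only if NCO is installed and the file exists.
    print(" ".join(ncatted_args(tid, "example.nc")))
```

Adding the prefix this way only changes the string stored in the file; it does not register anything with a handle server.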

zklaus commented 1 year ago

That's good, but the point of the hdl (aka handle) system is that you can look the id up at https://hdl.handle.net/. For CMIP6 this was accomplished via registration at the handle server operated by DKRZ, but if we just add our own UUIDs with no handle server to back them up, we just generate lookup errors. Of course, the id would still be useful to identify files, but it would be even better if we could integrate them into the graph of PIDs. If they can't be looked up on handle, we shouldn't call them hdl.
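To make the distinction concrete, here is a small sketch that maps a tracking_id to the URL a handle lookup would use, and returns nothing for a bare UUID. The function name `handle_lookup_url` is hypothetical, and the sketch makes no network request, so it cannot tell whether a handle is actually registered.

```python
from typing import Optional

# Public handle resolver mentioned in the comment above.
RESOLVER = "https://hdl.handle.net/"


def handle_lookup_url(tracking_id: str) -> Optional[str]:
    """Return the resolver URL for an "hdl:<prefix>/<suffix>" id, else None."""
    scheme, sep, handle = tracking_id.partition(":")
    # Only ids that claim to be handles, with a prefix/suffix pair, get a URL.
    if sep and scheme == "hdl" and "/" in handle:
        return RESOLVER + handle
    return None


if __name__ == "__main__":
    for tid in (
        "hdl:21.14100/123e4567-e89b-42d3-a456-426614174000",  # made-up example id
        "123e4567-e89b-42d3-a456-426614174000",  # bare UUID, no handle claim
    ):
        print(tid, "->", handle_lookup_url(tid))
```

A URL coming back from this function only means the id is shaped like a handle. If nothing is registered behind it, the lookup fails, which is the case for leaving the "hdl:" prefix off self-generated ids.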