cchdo / hydro

The big ol CCHDO netCDF-CF project
https://hydro.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Extended Attributes at variable level: `processing_level`, `comment`, `creator_name`, `project`, `date_modified`, `date_metadata_modified` #40

Open DocOtak opened 3 weeks ago

DocOtak commented 3 weeks ago

netCDF allows for a lot more information than exists in exchange files. With the CCHDO documentation metadata extraction project under way, we will eventually need a place to put that metadata. For a while now, I have wanted to store the information contained in the "Bob Headers" in a more structured way. The following ACDD attributes, when pushed down from the global to the variable level, should enable the creation of "Bob Headers": `processing_level`, `comment`, `creator_name`, `project`, `date_modified`, `date_metadata_modified`. Further examination of each of these:

The only non-standard ACDD usages here are placing these attributes at the variable level rather than the global level, and the possible use of arrays of strings. We could define combining rules that put all this information into global attributes that fully conform to ACDD, but this would likely be one-way (update the globals from the variables, not the other way around). For example, the global date_modified would be set to the most recent date seen across all the variables that also have date_modified.
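A minimal sketch of what such a combining rule could look like for date_modified. Everything here is an assumption for illustration: attributes are modelled as plain dicts keyed by variable name (not the project's actual data model), the values are ISO 8601 strings, a variable may carry either one string or an array of strings, and `combine_date_modified` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def _parse(stamp):
    """Parse an ISO 8601 string; treat naive values as UTC (an assumption)."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def combine_date_modified(variable_attrs, attr_name="date_modified"):
    """Return the most recent timestamp found on any variable, or None."""
    stamps = []
    for attrs in variable_attrs.values():
        value = attrs.get(attr_name)
        if value is None:
            continue  # this variable does not carry the attribute
        # Accept a single string or an array of strings.
        stamps.extend([value] if isinstance(value, str) else value)
    if not stamps:
        return None
    return max(stamps, key=_parse)


# Hypothetical variable-level attributes.
variable_attrs = {
    "ctd_salinity": {"date_modified": "2021-03-04T00:00:00+00:00"},
    "oxygen": {
        "date_modified": [
            "2019-07-01T12:00:00+00:00",
            "2022-11-15T08:30:00+00:00",
        ]
    },
    "pressure": {},
}

global_attrs = {}
latest = combine_date_modified(variable_attrs)
if latest is not None:
    global_attrs["date_modified"] = latest
```

The rule only reads from the variables and writes to the globals, which matches the one-way direction described above.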

Things this might make possible:

DocOtak commented 3 weeks ago

Some resources for processing_level: https://wiki.pangaea.de/wiki/Processing_levels. Only the Argo data status and GCMD "Levels" are in NVS.

DocOtak commented 3 weeks ago

More resources: we should absolutely support ORCiDs in the creator URL. We could then populate or validate against the ORCiD record, using content negotiation to get a JSON-LD schema.org/Person record. See https://github.com/ORCID/ORCID-Source/blob/main/CONTENT_NEGOTIATION.md
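A rough sketch of the content negotiation step, using only the Python standard library. The `application/ld+json` Accept type is the one the linked ORCID document describes for JSON-LD; the function names and the iD shown (ORCID's public sample iD) are placeholders for illustration, not anything that exists in this project.

```python
import json
import urllib.request


def build_orcid_request(orcid_url):
    """Build a request asking orcid.org for a JSON-LD representation."""
    return urllib.request.Request(
        orcid_url,
        headers={"Accept": "application/ld+json"},
    )


def fetch_person(orcid_url, timeout=10):
    """Fetch and decode the record (performs a network call)."""
    request = build_orcid_request(orcid_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder iD; a real creator_url value would go here.
request = build_orcid_request("https://orcid.org/0000-0002-1825-0097")

# Network access is needed for the next step, so it is left commented out:
# person = fetch_person("https://orcid.org/0000-0002-1825-0097")
# print(person.get("@type"))
```

How the returned record would be compared against creator_name is left open here; that depends on which fields we decide to treat as authoritative.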

cberys commented 2 days ago

> We could define combining rules to put all this information in the global attributes that fully conform to ACDD, but this would likely be one way (update the globals from variables, not the other way around). For example: the global date_modified would be set to the most recent date seen from all the variables that also have date_modified.

Minor comment: date_metadata_modified might be modified later at the global level than at the variable level, i.e., that one might not be strictly one-way from variable to global.
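One possible way to handle this, sketched under assumptions: treat the existing global value as one more candidate, so the global timestamp never moves backwards when the variable-level values are older. Attributes are modelled as plain dicts with ISO 8601 date strings, and `refresh_global` is a hypothetical name used only for this example.

```python
from datetime import datetime


def refresh_global(global_attrs, variable_attrs,
                   attr_name="date_metadata_modified"):
    """Return a copy of the globals holding the newest timestamp seen."""
    candidates = [
        attrs[attr_name]
        for attrs in variable_attrs.values()
        if attr_name in attrs
    ]
    # Keep the existing global value in the running; it may be the newest.
    if attr_name in global_attrs:
        candidates.append(global_attrs[attr_name])

    latest = max(candidates, key=datetime.fromisoformat, default=None)

    updated = dict(global_attrs)
    if latest is not None:
        updated[attr_name] = latest
    return updated


# Hypothetical case: global metadata was edited after the variables were.
global_attrs = {"date_metadata_modified": "2023-06-01"}
variable_attrs = {
    "oxygen": {"date_metadata_modified": "2022-11-15"},
    "pressure": {},
}
result = refresh_global(global_attrs, variable_attrs)
```

With this rule the later global edit survives, while a newer variable-level edit would still be propagated upward.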

Savannah-Lewis commented 17 hours ago

Processing level: I love having this, since this is a question I’ve had to field several times already. Assuming our submission doesn’t change, would we automatically assume everything is preliminary unless otherwise stated? Is there any reason not to implement the PANGAEA schema (or another existing set of processing levels, such as EOSDIS) rather than retaining our own?

Creator_url: I would definitely be in favor of adding this attribute to store ORCiDs. Anecdotally, I noticed that BCO-DMO requires data submitters to log in via ORCiD.

Project: This seems really important, but it is also already a cluster in GO-SHIP. For example, I discussed with Alison today our definition of noble gases and how that changes based on what the PI measures: sometimes only He and Tr, other times more, including Ar, Ne, etc. It would be a good idea to get started on piecing these things apart now, and I think it will be helpful in a multitude of ways, not just for our metadata model!

date_modified and date_metadata_modified: These seem super useful.

I think adding the ‘lineage statement’ might be helpful as well as a way to store any DOIs that are submitted with the dataset.