Open sashakames opened 4 years ago
@sashakames I think the way that @doutriaux1 did this in the past was to compare the hash of the table file used with the version they identified they used. If the hash wasn't identical, then report a "user error"
@sashakames @durack1 is right. In the tables repo I stored the md5 of the tables see: https://github.com/PCMDI/cmip5-cmor-tables/blob/master/Lib/gen_table_md5s.py
While we're on this subject, for the CMIP6_CVs we embed some version information in the files, so e.g.:
"version_metadata":{
"CV_collection_modified":"Wed Jan 15 16:44:36 2020 -0800",
"CV_collection_version":"6.2.45.7",
"author":"Paul J. Durack <durack1@llnl.gov>",
"institution_id":"PCMDI",
"mip_era_CV_modified":"Thu Aug 25 17:21:00 2016 -0700",
"mip_era_CV_note":"Fix #36 - Add CV name to json structure",
"previous_commit":"f07f21c4a1fa5b87a39d1fa51106fd7fad1583b8",
"specs_doc":"v6.2.7 (10th September 2018; https://goo.gl/v1drZl)"
https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/mip_era.json#L9-L17
I note this is a little fiddly, as you can't embed the commit hash in the file that corresponds with that particular version as the change to the hash means you're always one step behind yourself, but this could be implemented external to the table files just like @doutriaux1 did for cmip5.
Reasonable solution and PrePARE should do so too, and we check that CMOR still will enforce this as it must have been disabled, unless NUIST created their own hashes to circumvent the check.
@sashakames @doutriaux1 it would seem the CMOR already provides this info as netcdf file metadata, see:
:table_info = "Creation Date:(09 May 2019) MD5:fa1fbd5904d4b890fb319f821b9ab99a"
We'd just need to validate this is an "official" hash/date pair
yep that's what I implemented for CMIP5. I think there's even tools for it (attributes name may have changed in CMOR3), but essentially go fetch md5s file from github directly and validate it matches.
I double checked and the hash is there in the file that should be rejected!
Sorry for all the edits. Clearly there must be a bug somewhere that the check isn't enforced even though the hash gets written out.
no we never checked the hash! Sadly. I added the tag in CMOR2 with the hope that people would use to check. I created the tools in the cmip5-cmor-tables repo to check if the hash matches. But nobody ever cared to use this in production...
@sashakames just circling around on some of these issues, there has been conversations about creating conda packages for the cmip6-cmor-tables
, if we did that, it would facilitate the automated testing far more easily - it would also allow us to embed md5 hashes in the package tar ball
@durack1 That sounds promising, so the embedded tables in the conda package would be much more difficult to tamper with. You just need to engineer the conda build so its not simple enough to download from source and install yourself.
@matthew-mizielinski thinking that this issue should be moved across to the https://github.com/PCMDI/mip-cmor-tables/ repo?
A very good idea for CMOR4 planning, along with "shipping" CMOR with the mip-cmor-tables - somewhat of a dupe for #676
Example file: CMIP6/PMIP/NUIST/NESM3/lig127k/r1i1p1f1/Ofx/areacelli/gn/v20190924/areacelli_Ofx_NESM3_lig127k_r1i1p1f1_gn.nc CMOR should not produce this because it doesn't match the tables.
The file will pass PrePARE if I add the same made-up variable with the "sea" realm.
Possibilities: (1) CMOR/PrePARE only uses source repos with "tag" checkouts without modified files (2) gets Table data from a server.