PCMDI / cmor

Climate Model Output Rewriter
BSD 3-Clause "New" or "Revised" License
51 stars 32 forks source link

Resilliency measures to prevent table hacks #575

Open sashakames opened 4 years ago

sashakames commented 4 years ago

Example file: CMIP6/PMIP/NUIST/NESM3/lig127k/r1i1p1f1/Ofx/areacelli/gn/v20190924/areacelli_Ofx_NESM3_lig127k_r1i1p1f1_gn.nc CMOR should not produce this because it doesn't match the tables.

The file will pass PrePARE if I add the same made-up variable with the "sea" realm.

Possibilities: (1) CMOR/PrePARE only uses source repos with "tag" checkouts without modified files (2) gets Table data from a server.

durack1 commented 4 years ago

@sashakames I think the way that @doutriaux1 did this in the past was to compare the hash of the table file used with the version they identified they used. If the hash wasn't identical, then report a "user error"

doutriaux1 commented 4 years ago

@sashakames @durack1 is right. In the tables repo I stored the md5 of the tables see: https://github.com/PCMDI/cmip5-cmor-tables/blob/master/Lib/gen_table_md5s.py

doutriaux1 commented 4 years ago

and https://github.com/PCMDI/cmip5-cmor-tables/blob/master/Tables/md5s

durack1 commented 4 years ago

While we're on this subject, for the CMIP6_CVs we embed some version information in the files, so e.g.:

    "version_metadata":{
        "CV_collection_modified":"Wed Jan 15 16:44:36 2020 -0800",
        "CV_collection_version":"6.2.45.7",
        "author":"Paul J. Durack <durack1@llnl.gov>",
        "institution_id":"PCMDI",
        "mip_era_CV_modified":"Thu Aug 25 17:21:00 2016 -0700",
        "mip_era_CV_note":"Fix #36 - Add CV name to json structure",
        "previous_commit":"f07f21c4a1fa5b87a39d1fa51106fd7fad1583b8",
        "specs_doc":"v6.2.7 (10th September 2018; https://goo.gl/v1drZl)"

https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/mip_era.json#L9-L17

I note this is a little fiddly, as you can't embed the commit hash in the file that corresponds with that particular version as the change to the hash means you're always one step behind yourself, but this could be implemented external to the table files just like @doutriaux1 did for cmip5.

sashakames commented 4 years ago

Reasonable solution and PrePARE should do so too, and we check that CMOR still will enforce this as it must have been disabled, unless NUIST created their own hashes to circumvent the check.

durack1 commented 4 years ago

@sashakames @doutriaux1 it would seem the CMOR already provides this info as netcdf file metadata, see: :table_info = "Creation Date:(09 May 2019) MD5:fa1fbd5904d4b890fb319f821b9ab99a" We'd just need to validate this is an "official" hash/date pair

doutriaux1 commented 4 years ago

yep that's what I implemented for CMIP5. I think there's even tools for it (attributes name may have changed in CMOR3), but essentially go fetch md5s file from github directly and validate it matches.

sashakames commented 4 years ago

I double checked and the hash is there in the file that should be rejected!

sashakames commented 4 years ago

Sorry for all the edits. Clearly there must be a bug somewhere that the check isn't enforced even though the hash gets written out.

doutriaux1 commented 4 years ago

no we never checked the hash! Sadly. I added the tag in CMOR2 with the hope that people would use to check. I created the tools in the cmip5-cmor-tables repo to check if the hash matches. But nobody ever cared to use this in production...

durack1 commented 2 years ago

@sashakames just circling around on some of these issues, there has been conversations about creating conda packages for the cmip6-cmor-tables, if we did that, it would facilitate the automated testing far more easily - it would also allow us to embed md5 hashes in the package tar ball

sashakames commented 2 years ago

@durack1 That sounds promising, so the embedded tables in the conda package would be much more difficult to tamper with. You just need to engineer the conda build so its not simple enough to download from source and install yourself.

durack1 commented 1 year ago

@matthew-mizielinski thinking that this issue should be moved across to the https://github.com/PCMDI/mip-cmor-tables/ repo?

durack1 commented 6 months ago

A very good idea for CMOR4 planning, along with "shipping" CMOR with the mip-cmor-tables - somewhat of a dupe for #676