It seems a little fishy to me that the ferc1 archive took 2 hours, but the ferc2 archive only took 3 minutes, given that their archives should end up being about the same size, and almost all of the ferc2 files got updated.
Still working my way through the archives; I'll take a look.
Everything has been inspected and published.
Would it be easy to automate checking for the kind of failed upload that CEMS experienced this time around? Like checking that all the files in the datapackage are actually in the draft deposition and have the same checksums?
The datapackage and checksums are produced at the end from the files uploaded, so I'm not exactly sure what you're proposing? We already check file size against the last upload. This seems to be some kind of problem with the way that 502 errors are getting retried.
I was imagining that we could calculate the file size and/or checksums locally, and compare to the file sizes and/or checksums that are reported on Zenodo, and if they don't match, raise an error.
Are you saying that the filesizes & checksums that end up in the datapackage.json are being populated based on the information on Zenodo, rather than the local files?
Ah yes, that would be a pretty straightforward validation! I'll write up an issue.
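For reference, here's a minimal sketch of what that validation could look like, assuming the draft deposition's files endpoint behaves like the Zenodo legacy deposition API (one entry per file with `filename`, `filesize`, and an MD5 `checksum`). The function names and the flat local directory layout are illustrative, not the archiver's actual code:

```python
import hashlib
from pathlib import Path

import requests


def local_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 of a local file in chunks to avoid loading it all into memory."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def validate_draft_against_local(deposition_id: int, local_dir: Path, token: str) -> list[str]:
    """Compare local file sizes/checksums against what Zenodo reports for a draft deposition.

    Returns human-readable error strings; an empty list means everything matched.
    """
    resp = requests.get(
        f"https://zenodo.org/api/deposit/depositions/{deposition_id}/files",
        params={"access_token": token},
        timeout=30,
    )
    resp.raise_for_status()
    # Index the remote file metadata by filename for easy lookup.
    remote = {f["filename"]: f for f in resp.json()}

    errors = []
    for path in sorted(p for p in local_dir.iterdir() if p.is_file()):
        if path.name not in remote:
            errors.append(f"{path.name}: missing from draft deposition")
        elif path.stat().st_size != remote[path.name]["filesize"]:
            errors.append(f"{path.name}: size mismatch")
        elif local_md5(path) != remote[path.name]["checksum"]:
            errors.append(f"{path.name}: checksum mismatch")
    return errors
```

If any errors come back, the archiver could raise before building datapackage.json, which would have flagged the kind of partial upload CEMS hit this time.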
Summary of results:
See the job run logs and results here. Second run of CEMS and NREL ATB data here.
Review and publish archives
For each of the following archives, find the run status in the GitHub archiver run. If validation tests pass, manually review the archive and publish. If no changes are detected, delete the draft. If changes are detected, manually review the archive following the guidelines in step 3 of README.md, then publish the new version. Then check the box here to confirm publication status, adding a note on the status (e.g., "v1 published", "no changes detected, draft deleted").
Validation failures
For each run that failed because of validation test failures (seen in the GHA logs), add it to the task list. Download the run summary JSON by going into the "Upload run summaries" tab of the GHA run for each dataset, and follow the link. Investigate the validation failure (a rough sketch for scanning the summary follows these steps).
If the validation failure is deemed ok after manual review (e.g., Q2 of 2024 data doubles the size of a file that only had Q1 data previously, but the new data looks as expected), go ahead and approve the archive and leave a note explaining your decision in the task list.
If the validation failure is blocking (e.g., file format incorrect, whole dataset changes size by 200%), make an issue to resolve it.
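As an aside, a small helper like the one below can speed up scanning a downloaded run summary for the failing tests. The `validation_tests`, `name`, `success`, and `notes` keys are assumptions about the summary layout made for illustration, so adjust them to match the actual JSON:

```python
import json
from pathlib import Path


def print_failed_tests(summary_path: Path) -> None:
    """Print any validation tests in a downloaded run summary that did not pass.

    NOTE: the key names used here ("validation_tests", "name", "success", "notes")
    are assumptions about the summary layout, not a documented schema.
    """
    summary = json.loads(summary_path.read_text())
    for test in summary.get("validation_tests", []):
        if not test.get("success", True):
            print(f"FAILED: {test.get('name')}: {test.get('notes')}")


# Hypothetical filename -- use whatever the "Upload run summaries" artifact contains.
print_failed_tests(Path("eia860_run_summary.json"))
```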
Other failures
For each run that failed because of another reason (e.g., underlying data changes, code failures), create an issue describing the failure and take necessary steps to resolve it.