zschira closed this issue 1 year ago.
> OK, so my understanding of this is that when we run `pudl_archiver eia860`:
>
> If so, then what we basically need to do is:
>
> (… the `archive_dataset` logic into a place that we can access programmatically?)
>
> @zschira - is my understanding of the archiver flow correct? And also - does the action plan sound reasonable here?
> we create a new major version in the Zenodo concept
The terminology here is a little confusing (and we should probably provide better docs), but the concept DOI will always point to the latest version of a dataset, while a deposition refers to a single version.
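To make the concept/deposition distinction concrete, here is a minimal sketch of the "create a new major version" step against Zenodo's public REST API (endpoint paths as documented at developers.zenodo.org). The deposition ID and token are placeholders, and this is an illustration of the flow being discussed, not the archiver's actual code:

```python
# Sketch only: a concept DOI tracks the latest version, while each
# deposition is one concrete version. Creating a new version means
# POSTing the 'newversion' action on the latest published deposition.
import json
import urllib.request

API = "https://zenodo.org/api/deposit/depositions"


def action_url(dep_id: int, action: str) -> str:
    """Build the URL for a deposition action, e.g. 'newversion' or 'publish'."""
    return f"{API}/{dep_id}/actions/{action}"


def create_new_version(dep_id: int, token: str) -> str:
    """Create a draft new version of a published deposition.

    Zenodo responds with the deposition record; the draft for the new
    version is linked under links['latest_draft'].
    """
    req = urllib.request.Request(
        f"{action_url(dep_id, 'newversion')}?access_token={token}", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["links"]["latest_draft"]
```

The draft returned here is the "new version" that the file comparison then operates on.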
> the new version already has everything from the old version, so we look at the files in the old version and compare with our freshly downloaded set:
>
> - anything deleted? delete it
> - anything added? add it
> - anything changed via checksum? update it
>
> then, if nothing changed, we abandon the update (do we need to discard the draft somehow?) - otherwise, we tell Zenodo to actually publish the new version
This is correct. I think ideally we would discard the draft; however, I've found the Zenodo API to act unexpectedly when trying to do that, so instead we just reuse the draft during the next run.
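The delete/add/update comparison described in the steps above can be sketched as a checksum diff between the old version's files and the freshly downloaded set. The function names and the shape of the inputs here are assumptions for illustration, not the archiver's real interfaces (Zenodo reports MD5 checksums for deposition files, which is what this compares against):

```python
# Hypothetical sketch of the compare step: decide which files to delete,
# add, or update in the draft new version, given the old version's
# filename -> MD5 map and a directory of freshly downloaded files.
import hashlib
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file, matching Zenodo's reported checksums."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def plan_changes(old_files: dict[str, str], new_dir: Path) -> dict[str, list[str]]:
    """Compare the prior version's files against a freshly downloaded set."""
    new_files = {p.name: checksum(p) for p in new_dir.iterdir() if p.is_file()}
    return {
        # in the old version but no longer downloaded -> delete from draft
        "delete": sorted(set(old_files) - set(new_files)),
        # newly downloaded, not in the old version -> add to draft
        "add": sorted(set(new_files) - set(old_files)),
        # present in both but checksum differs -> update in draft
        "update": sorted(
            name
            for name in set(old_files) & set(new_files)
            if old_files[name] != new_files[name]
        ),
    }
```

If all three lists come back empty, nothing changed and the draft can be abandoned (or, per the note above, left in place and reused on the next run); otherwise the draft gets published.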
@zschira commented on Tue Sep 13 2022
Once the archiver and scraper repos have been combined, and we have high-level scripts for managing the process, it should be very easy to create GitHub Actions for automating the archiving process. New data is released at different frequencies for the different data sources incorporated in PUDL, so we can create multiple actions that run on schedules reflecting those release cadences.
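As a hedged sketch of what one such scheduled action could look like: the workflow file name, cron schedule, install step, and `ZENODO_TOKEN` secret name below are all assumptions for illustration, not an existing workflow in the repo.

```yaml
# .github/workflows/archive-eia860.yml (hypothetical)
name: archive-eia860
on:
  schedule:
    - cron: "0 6 1 * *"  # monthly; pick a cadence matching the dataset's releases
  workflow_dispatch: {}  # also allow manual runs
jobs:
  archive:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install .
      - run: pudl_archiver eia860
        env:
          ZENODO_TOKEN: ${{ secrets.ZENODO_TOKEN }}
```

One workflow per dataset (or one workflow with a matrix over datasets) keeps each source on its own schedule.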
@zaneselvans commented on Tue Sep 13 2022
I am so excited for this to finally happen!