catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Create a GitHub Action to update and sync the datastore caches #1895

Closed: bendnorman closed this issue 1 year ago

bendnorman commented 1 year ago

Create a GitHub Action that runs the datastore script to update the GCS cache whenever new DOIs have been added to the code. The action should also sync the internal-zenodo-cache and zenodo-cache buckets.
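The two jobs the action would perform can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's implementation. It assumes the DOIs in the code can be read as a dataset-to-DOI mapping and that the set of DOIs already in the GCS cache is known. It also assumes the sync runs one way, from internal-zenodo-cache into zenodo-cache, using `gsutil -m rsync -r`. The helper names `dois_to_cache` and `sync_command` are hypothetical.

```python
from collections.abc import Mapping, Set


def dois_to_cache(code_dois: Mapping[str, str], cached_dois: Set[str]) -> dict[str, str]:
    """Return the datasets whose DOI in the code is not yet in the GCS cache.

    Hypothetical helper: the real DOI list lives in
    src/pudl/workspace/datastore.py; here it is passed in as a plain mapping.
    """
    return {dataset: doi for dataset, doi in code_dois.items() if doi not in cached_dois}


def sync_command(src_bucket: str, dst_bucket: str) -> list[str]:
    """Build a gsutil command that copies new/changed objects from src to dst.

    Assumption: the sync direction is internal-zenodo-cache -> zenodo-cache.
    No -d flag is passed, so objects only present in dst are left alone.
    """
    return ["gsutil", "-m", "rsync", "-r", f"gs://{src_bucket}", f"gs://{dst_bucket}"]


if __name__ == "__main__":
    # Placeholder DOIs for illustration only; they are not real PUDL archives.
    code_dois = {"dataset_a": "10.5281/zenodo.0000001", "dataset_b": "10.5281/zenodo.0000002"}
    cached = {"10.5281/zenodo.0000001"}
    missing = dois_to_cache(code_dois, cached)
    if missing:
        print("Datasets needing a cache update:", sorted(missing))
    print(" ".join(sync_command("internal-zenodo-cache", "zenodo-cache")))
```

Keeping the DOI comparison separate from the sync command means the action could skip the expensive datastore run entirely when nothing in the code has changed, while still keeping the two buckets in step.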

@zaneselvans should this action check out dev or main to pick up the updated DOIs in the datastore script?

https://github.com/catalyst-cooperative/pudl/blob/d294cb52b62b6adf18384b2bbb1be15c84667e6f/src/pudl/workspace/datastore.py#L140-L159

I think dev, so that nightly builds won't have to populate the GCS cache.

TODOs

zaneselvans commented 1 year ago

I think it has to check out dev; otherwise it will never update anything the nightly builds need, since they run on dev. Right?

bendnorman commented 1 year ago

Groovy, just wanted to double-check.

zaneselvans commented 1 year ago

It's a whole separate versioning regime, really. At some point I think we should have 2 actions running nightly:

Then we'll have:

bendnorman commented 1 year ago

Love this idea. Once we get more familiar with Dagster Cloud, we could also take advantage of sensors to run both of these workflows.