catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars, 106 forks

Have nightly builds output cache keys when successful #2461

Open zaneselvans opened 1 year ago

zaneselvans commented 1 year ago

We have several repositories that download the PUDL DB from the nightly build outputs for use in CI. Currently they have no way of knowing whether they actually need to download the DB. Pulling from the AWS Open Data buckets makes the download free, but it would be faster to skip it entirely when the copy we already have is still current (especially as the DB grows).

If the nightly builds generated a couple of caching keys alongside the DB this would be easy to do. Maybe just a couple of text files or a yaml file containing:

This could be done by adding some shell commands to our gcp_pudl_etl.sh script that's run in the Docker container for the nightly builds (and maybe also the local_pudl_etl.sh script for testing / development).
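To make the idea concrete, here is a minimal sketch of what generating and checking such a key could look like. It is written in Python rather than shell so it is easy to test, and it rests on assumptions that are not part of this issue: that a SHA-256 checksum of the DB file is a suitable cache key, and that the key lives in a small text file next to the DB. The function names `write_cache_key` and `needs_download` are hypothetical.

```python
import hashlib
from pathlib import Path


def write_cache_key(db_path: Path, key_path: Path) -> str:
    """Write the SHA-256 hex digest of the file at db_path to key_path.

    Assumption: a content checksum is an acceptable cache key. The nightly
    build could equally use a commit hash or a build timestamp instead.
    """
    digest = hashlib.sha256()
    with db_path.open("rb") as f:
        # Read in 1 MiB chunks so a large DB is never loaded into memory at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    key = digest.hexdigest()
    key_path.write_text(key + "\n")
    return key


def needs_download(remote_key: str, local_key_path: Path) -> bool:
    """Return True unless a locally stored key matches the remote key."""
    if not local_key_path.exists():
        # Nothing cached yet, so the DB has to be fetched.
        return True
    return local_key_path.read_text().strip() != remote_key.strip()
```

A downstream CI job would then fetch only the small key file first, call something like `needs_download` against its cached key, and download the full DB only when that returns `True`.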

Then we would need to modify the caching step in the tox-pytest workflows in the repositories that download the nightly PUDL DB outputs to use these caching keys to determine whether a new DB should be downloaded. Right now these repos include at least:

ggurjar333 commented 8 months ago

Can I TAKE this?