deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License

Update offshore wind data weekly #354

Closed: bendnorman closed this issue 2 months ago

bendnorman commented 3 months ago

We want the Synapse offshore wind data to be updated weekly in BigQuery (BQ). There are a few ways we could do this. For all options, we'll need to archive the data in GCS so we can fall back on older archives if a change in the data breaks the ETL.
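The fallback idea boils down to a newest-first search over archived versions. Here's a minimal sketch, assuming archive versions are ISO date strings (e.g. `2023-05-01`) and that some check exists to tell whether a version still works with the ETL. `select_usable_version` and `validate` are hypothetical names, not part of the DGM codebase.

```python
from typing import Callable, Iterable, Optional


def select_usable_version(
    versions: Iterable[str],
    validate: Callable[[str], bool],
) -> Optional[str]:
    """Return the newest archive version that passes `validate`, or None.

    Assumes versions are ISO dates (YYYY-MM-DD), so sorting the strings
    also sorts them chronologically.
    """
    for version in sorted(versions, reverse=True):
        if validate(version):
            return version
    return None


if __name__ == "__main__":
    archived = ["2023-05-01", "2023-05-15", "2023-05-08"]
    # Pretend the newest archive contains a change that breaks the ETL.
    broken = {"2023-05-15"}
    print(select_usable_version(archived, lambda v: v not in broken))  # 2023-05-08
```

In practice `validate` could be "the ETL and its tests pass against this archive", which is exactly what CI checks when a version bump is opened as a PR.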

First, we'll need to write a new dgm archiver that pulls the offshore wind table data using pyairtable. This archiver will save the data to gs://dgm-archives each week. (10 hrs)
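A rough sketch of what the weekly archiver could look like. It uses pyairtable's `Api(...).table(base_id, table_name).all()` and the google-cloud-storage client (`Client().bucket(...).blob(...).upload_from_string(...)`). The environment variable names and the `synapse_offshore_wind/<YYYY-MM-DD>/records.json` object layout are assumptions for illustration, not the actual DGM archive format.

```python
import datetime as dt
import json
import os
from typing import Any


def archive_blob_name(dataset: str, run_date: dt.date, filename: str) -> str:
    """Build a date-versioned object path, e.g. 'synapse_offshore_wind/2023-05-01/records.json'."""
    return f"{dataset}/{run_date.isoformat()}/{filename}"


def serialize_records(records: list[dict[str, Any]]) -> str:
    """Serialize Airtable records with sorted keys so identical data yields identical files."""
    return json.dumps(records, sort_keys=True, indent=2)


def archive_offshore_wind(run_date: dt.date) -> str:
    """Pull the offshore wind table and upload it to gs://dgm-archives; return the object name."""
    # Imported lazily so the helpers above work without these dependencies installed.
    from google.cloud import storage
    from pyairtable import Api

    # Hypothetical environment variable names for the token and table location.
    api = Api(os.environ["AIRTABLE_API_KEY"])
    table = api.table(os.environ["OFFSHORE_WIND_BASE_ID"], os.environ["OFFSHORE_WIND_TABLE"])
    records = table.all()  # list of {"id", "createdTime", "fields"} dicts

    name = archive_blob_name("synapse_offshore_wind", run_date, "records.json")
    bucket = storage.Client().bucket("dgm-archives")
    bucket.blob(name).upload_from_string(
        serialize_records(records), content_type="application/json"
    )
    return name
```

Keying each archive by run date means older weeks stay in the bucket untouched, which is what makes falling back to a previous archive possible.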

Option 1: Manual ETL updates

We pull the latest archive version number, add it to the ETL code, and open a PR. If CI passes, we merge it and the changes propagate to dev. This is the simplest option, but it involves little automation and we'll probably forget to update the data now and then! I could see us doing this monthly or quarterly. (Probably 30 min per update)
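Even the manual option could use a small helper to look up the latest version instead of browsing the bucket. A minimal sketch, assuming a hypothetical `<dataset>/<YYYY-MM-DD>/<file>` object layout (the issue doesn't specify how archives are versioned). Listing uses the google-cloud-storage `Client.list_blobs` method.

```python
import re
from typing import Iterable, Optional

# Hypothetical object layout: "<dataset>/<YYYY-MM-DD>/<file>"
_VERSION_RE = re.compile(r"^(?P<dataset>[^/]+)/(?P<version>\d{4}-\d{2}-\d{2})/")


def latest_version(blob_names: Iterable[str], dataset: str) -> Optional[str]:
    """Return the newest date-stamped version of `dataset`, or None if there is none."""
    versions = {
        m["version"]
        for name in blob_names
        if (m := _VERSION_RE.match(name)) and m["dataset"] == dataset
    }
    return max(versions, default=None)


def list_archive_names(bucket: str = "dgm-archives", prefix: str = "") -> list[str]:
    """List object names in the archive bucket (imports google-cloud-storage lazily)."""
    from google.cloud import storage

    return [blob.name for blob in storage.Client().list_blobs(bucket, prefix=prefix)]
```

Usage would be something like `latest_version(list_archive_names(prefix="synapse_offshore_wind/"), "synapse_offshore_wind")`, with the result pasted into the ETL code before opening the PR.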

Option 2: GitHub Action

Create a GitHub Action that runs weekly (after the archive), pulls the latest version of each dataset we want to update automatically, creates a configuration file with these versions, and runs the ETL using that config file. Should we have this run on dev and main? (15 hrs, 0ish per update)
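The config-file step could be a small script that the workflow runs on an `on: schedule` cron trigger set for after the weekly archive. A minimal sketch, assuming a JSON config file and hypothetical names (`PINNED_VERSIONS`, `build_version_config`, `write_config`, `load_versions`). The version strings below are placeholders, not real archive versions.

```python
import json
from pathlib import Path
from typing import Mapping

# Hypothetical versions pinned in the ETL code for datasets that are not auto-updated.
PINNED_VERSIONS = {"synapse_offshore_wind": "2023-05-01", "other_dataset": "2023-01-15"}


def build_version_config(latest: Mapping[str, str], auto_update: set[str]) -> dict[str, str]:
    """Use the latest archive for auto-updated datasets and the pinned version for the rest."""
    config = dict(PINNED_VERSIONS)
    for dataset in auto_update:
        if dataset in latest:
            config[dataset] = latest[dataset]
    return config


def write_config(config: Mapping[str, str], path: Path) -> None:
    """Write the config the weekly ETL run will read."""
    path.write_text(json.dumps(dict(config), indent=2, sort_keys=True))


def load_versions(path: Path) -> dict[str, str]:
    """Read versions in the ETL, falling back to pinned versions if no config file exists."""
    if not path.exists():
        return dict(PINNED_VERSIONS)
    return {**PINNED_VERSIONS, **json.loads(path.read_text())}
```

Keeping pinned defaults in the code means a local or CI run without the generated file still behaves like Option 1, and only the datasets listed in `auto_update` float to the newest archive.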

Questions

Do we want the data to be updated in dev and prod weekly?

TrentonBush commented 2 months ago

Re-scoped this issue to cover only a weekly manual data update. The question of automated updates was moved to #359. This reduced-scope issue was closed by #355.