catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Create a sensor that checks foreign keys when a `pudl.etl` job succeeds #2468

Open bendnorman opened 1 year ago

bendnorman commented 1 year ago

Foreign keys are checked in CI and the nightly builds, but not automatically when developing locally; you must remember to run the `pudl_check_fks` CLI command. This isn't ideal: if you forget to run `pudl_check_fks` after making code changes that break foreign key constraints, you won't find out until CI fails, which currently takes about 100 minutes.

A potential solution is to set up a sensor that kicks off a foreign key check job whenever a `pudl.etl` job succeeds. See Run status sensors.

Note: The FKs will be checked automatically in a separate job, but I don't know if the UI will make it clear to users whether that job passed or failed. I'm worried the FK check will be kicked off silently in the background and users will have to check the Runs tab to see the result.

In Scope

- [ ] Create a job that runs `SQLiteIOManager.check_foreign_keys()`
- [ ] Set up a sensor that runs the foreign key job upon a successful `pudl.etl` job.
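For reference, here is a rough sketch of what a foreign key check amounts to at the SQLite level, using the built-in `PRAGMA foreign_key_check`. This is not the implementation of `SQLiteIOManager.check_foreign_keys()`; the helper `find_fk_violations` and the `plants` / `generators` tables are made up for illustration:

```python
import sqlite3


def find_fk_violations(conn):
    """Return one row per foreign key violation reported by SQLite."""
    # Each row is (table, rowid, referenced_table, fk_index).
    return conn.execute("PRAGMA foreign_key_check").fetchall()


conn = sqlite3.connect(":memory:")
# Leave enforcement off so the bad row can be loaded, as in a bulk load.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE plants (plant_id INTEGER PRIMARY KEY);
    CREATE TABLE generators (
        generator_id INTEGER PRIMARY KEY,
        plant_id INTEGER REFERENCES plants (plant_id)
    );
    INSERT INTO plants VALUES (1);
    INSERT INTO generators VALUES (10, 1);
    INSERT INTO generators VALUES (11, 2);  -- plant 2 does not exist
    """
)

violations = find_fk_violations(conn)
for table, rowid, parent, fk_index in violations:
    print(f"{table} row {rowid} references a missing row in {parent}")
```

A job that wraps a check like this could raise an exception when `violations` is non-empty, so the failure shows up as a failed run.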

jdangerx commented 1 year ago

I wonder if we can also use the graph-nesting stuff. That might help with the "invisible run" problem, though I'm not sure about mixing graphs and assets...