MattTriano closed this issue 1 year ago.
About a month ago, I implemented a cleanup DAG that can be run manually to drop all data_raw.temp_* tables, but I left this issue open because I wasn't sure whether it would be better to integrate this drop into the update_socrata_table task_group (or into an update_xyz_table task_group if/when I develop connectors to other data sources).
But now I'm pretty confident that I want to leave the cleanup decision to the user, as it's been useful to have a clean pull of the data to check (as opposed to the other version of the table in data_raw, which contains all distinct versions of retrieved records). So I'm going to close this issue out.
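For anyone wanting to replicate the manual cleanup step, here is a minimal sketch of the core logic, assuming a PostgreSQL backend and that the candidate table names have already been fetched (e.g. from information_schema.tables filtered on table_schema = 'data_raw'). The helper names `quote_ident` and `build_drop_statements` and the example table names are hypothetical, not taken from the project's actual cleanup DAG.

```python
def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_drop_statements(table_names, schema="data_raw", prefix="temp_"):
    """Return one DROP TABLE statement per table whose name starts with `prefix`.

    Hypothetical helper: tables without the prefix (e.g. the persistent
    table holding all distinct record versions) are skipped.
    """
    return [
        f"DROP TABLE IF EXISTS {quote_ident(schema)}.{quote_ident(name)};"
        for name in sorted(table_names)
        if name.startswith(prefix)
    ]


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    tables = ["temp_crimes", "crimes", "temp_permits"]
    for stmt in build_drop_statements(tables):
        print(stmt)
```

Filtering on the prefix in Python (rather than with a SQL LIKE pattern) sidesteps the fact that `_` is a wildcard in LIKE, so `temp_%` would also match names like `tempx...` unless escaped.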
The "temp_" tables are useful when developing expectations, but outside of that, they will just be recreated every time fresh data is ingested.
I see two possible courses of action and I'm not sure which I prefer yet.
The former is the ideal long-run solution (i.e., after expectations for the raw data ingestion are developed and mature), but in the short term it would complicate the process of editing a new suite of expectations.