Our historical values asset completed its automated refresh on Sept 1, 2023, but for some reason only had ~9 million rather than ~ 40 million rows. We were not notified of a failure by email. This needs investigating.
UPDATE 10/5/2023 (from Dan)
We're hearing from some folks that the Socrata agent may be failing silently in some cases, leading to the Open Data portal being incomplete or not up-to-date. Since the Open Data portal is a critical asset, we should build checks to guard against such failures.
I propose that we create a short Python script that pulls the row count by year from each Open Data asset and compares it to the count of the source Athena view. Row counts can be gathered directly from the Socrata SoQL API, e.g.:
We should run the script on a schedule using GitHub Actions, and use the same notification setup used in test-dbt-models.
Since current year row counts may not match the Open Data Portal exactly, we should use a "fuzzy" match for the most recent 2 years (+/- 2%) and an exact match for all years prior to that.
Our historical values asset completed its automated refresh on Sept 1, 2023, but for some reason only had ~9 million rather than ~ 40 million rows. We were not notified of a failure by email. This needs investigating.
UPDATE 10/5/2023 (from Dan)
We're hearing from some folks that the Socrata agent may be failing silently in some cases, leading to the Open Data portal being incomplete or not up-to-date. Since the Open Data portal is a critical asset, we should build checks to guard against such failures.
I propose that we create a short Python script that pulls the row count by year from each Open Data asset and compares it to the count of the source Athena view. Row counts can be gathered directly from the Socrata SoQL API, e.g.:
We should run the script on a schedule using GitHub Actions, and use the same notification setup used in
test-dbt-models
.Since current year row counts may not match the Open Data Portal exactly, we should use a "fuzzy" match for the most recent 2 years (+/- 2%) and an exact match for all years prior to that.