MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estate prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0

Socrata metadata check fails on data sets that were published and haven't been updated (yet) #78

Closed: MattTriano closed this issue 1 year ago

MattTriano commented 1 year ago

This issue emerges when a SocrataTableMetadata instance calls its .check_warehouse_data_freshness(engine) method. The failure happens when check_warehouse_data_freshness receives an input value that doesn't match the expected datetime format ("%Y-%m-%dT%H:%M:%S %z"). That function is called with the instance's .latest_data_update_datetime and .latest_metadata_update_datetime attributes, and, judging by the metadata for the example data set below, if a data set's data has never been updated, the data-update field is None in the metadata response, so there is no datetime string to parse.
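A minimal sketch of the failure mode, assuming the timestamps are parsed with datetime.strptime() using the format string quoted above. Passing None to strptime() raises a TypeError rather than a format ValueError. The helper parse_socrata_datetime() is hypothetical (not the project's code); it simply passes a missing timestamp through as None instead of parsing it.

```python
from datetime import datetime
from typing import Optional

# Format string quoted in the issue for the metadata timestamps.
SOCRATA_DT_FORMAT = "%Y-%m-%dT%H:%M:%S %z"


def parse_socrata_datetime(value: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: parse a metadata timestamp, or return None if absent.

    A data set that was published but never updated has no data-update
    timestamp, so None is returned as-is instead of being passed to strptime().
    """
    if value is None:
        return None
    return datetime.strptime(value, SOCRATA_DT_FORMAT)


# Reproducing the failure: strptime() rejects None with a TypeError.
try:
    datetime.strptime(None, SOCRATA_DT_FORMAT)
except TypeError as err:
    print(f"strptime(None) fails: {err}")

print(parse_socrata_datetime(None))
print(parse_socrata_datetime("2023-01-10T12:00:00 +0000"))
```

Strings that are present but malformed still raise ValueError here, which keeps genuinely bad metadata visible instead of silently swallowing it.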

Updating the logic in check_warehouse_data_freshness() to account for this case (a None data-update timestamp) should resolve the failure. I should also probably open an issue to refactor the test setup so pytest runs in an Airflow container rather than in a separate (rather heavyweight) Python container.
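One way the freshness logic could tolerate a missing data-update timestamp is sketched below. This is a sketch under stated assumptions, not the project's actual check_warehouse_data_freshness(): the function names, the last_ingested argument, and the rule "fresh if the warehouse was loaded on or after the most recent known source-side change" are all hypothetical. Missing timestamps are skipped rather than parsed, and when no usable timestamp exists the check conservatively reports the data as stale.

```python
from datetime import datetime, timezone
from typing import Optional

SOCRATA_DT_FORMAT = "%Y-%m-%dT%H:%M:%S %z"


def latest_source_update(
    data_update: Optional[str], metadata_update: Optional[str]
) -> Optional[datetime]:
    """Return the most recent of the available source-side timestamps.

    A data set that was published but never updated has data_update=None,
    so None values are skipped instead of being passed to strptime().
    """
    parsed = [
        datetime.strptime(value, SOCRATA_DT_FORMAT)
        for value in (data_update, metadata_update)
        if value is not None
    ]
    return max(parsed) if parsed else None


def warehouse_data_is_fresh(
    data_update: Optional[str],
    metadata_update: Optional[str],
    last_ingested: Optional[datetime],
) -> bool:
    """Hypothetical rule: fresh if ingested on or after the latest source change."""
    if last_ingested is None:
        # Nothing in the warehouse yet, so the data needs to be loaded.
        return False
    latest = latest_source_update(data_update, metadata_update)
    if latest is None:
        # No usable timestamps at all: conservatively report stale.
        return False
    return last_ingested >= latest


# Example: a never-updated data set (data_update=None) ingested after publication.
ingested = datetime(2023, 2, 1, tzinfo=timezone.utc)
print(warehouse_data_is_fresh(None, "2023-01-10T12:00:00 +0000", ingested))
```

Both sides of the comparison are timezone-aware (strptime() with %z yields an aware datetime), which avoids the TypeError Python raises when comparing naive and aware datetimes.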

Example data set: Chicago city boundary