MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estate prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0

Create a database table to track table metadata #2

Closed MattTriano closed 1 year ago

MattTriano commented 1 year ago

As mentioned in issue #1, Socrata has a metadata API that can be used to check when a table has been updated, and it also returns a lot of information about the table, including the description of the data set, names/data types/descriptions for all columns, links to the hosting domain, and more.

I want the ELT DAGs that extract from Socrata sources to check the metadata API, extract the dataUpdatedAt timestamp, and compare it against prior pulls. A DAG should pull data only if dataUpdatedAt is later than the dataUpdatedAt value recorded for the most recent pull in which data was successfully pulled and new/updated records were successfully ingested.
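
A minimal sketch of that freshness check, not the repo's actual implementation. The function names (`metadata_url`, `parse_updated_at`, `should_pull`), the URL pattern, and the timestamp layout are all assumptions for illustration. The metadata JSON is taken as an already-fetched dict so the comparison logic can be exercised without a network call.

```python
from datetime import datetime
from typing import Optional

# Assumed timestamp layout, e.g. "2023-01-05T12:34:56+0000".
# Adjust if the metadata API returns a different format.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def metadata_url(domain: str, table_id: str) -> str:
    """Build the (assumed) Socrata metadata API URL for one table."""
    return f"https://{domain}/api/views/metadata/v1/{table_id}"


def parse_updated_at(metadata: dict) -> datetime:
    """Extract dataUpdatedAt from a metadata response as an aware datetime."""
    return datetime.strptime(metadata["dataUpdatedAt"], TS_FORMAT)


def should_pull(metadata: dict, last_successful: Optional[datetime]) -> bool:
    """Return True if the source has newer data than the last successful pull.

    last_successful is the dataUpdatedAt value recorded for the most recent
    pull that both fetched data and ingested new/updated records, or None
    if no such pull has happened yet.
    """
    if last_successful is None:
        return True  # nothing ingested yet, so always pull
    return parse_updated_at(metadata) > last_successful
```

In a DAG, a check like `should_pull` could gate the download task: fetch the metadata JSON with any HTTP client, look up the last recorded dataUpdatedAt, and skip the pull when the function returns False.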

I'm not sure if I want to put all metadata into one table or create several metadata tables, but in any case, this will need at least columns for

MattTriano commented 1 year ago

Implemented.