As mentioned in issue #1, Socrata has a metadata API that can be used to check when a table was last updated. It also returns a lot of information about the table, including the description of the data set, the names, data types, and descriptions of all columns, links to the hosting domain, and more.
I want the ELT DAGs that extract from Socrata sources to check the metadata API, extract the dataUpdatedAt timestamp, and compare it against prior pulls. A DAG should pull data only if dataUpdatedAt is greater than the dataUpdatedAt value recorded for the most recent pull of that table in which data was successfully pulled and new/updated records were successfully ingested.
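As a rough sketch of that comparison (not a settled design), the check could look something like the following. The function names `parse_ts` and `should_pull` are hypothetical, and the timestamp format is an assumption about how the metadata API serializes dataUpdatedAt; it should be confirmed against a real response before relying on it.

```python
from datetime import datetime
from typing import Optional

# Assumption: dataUpdatedAt arrives as an ISO-8601-style string with a UTC
# offset, e.g. "2022-10-20T10:17:24+0000". Adjust if the API differs.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_ts(value: str) -> datetime:
    """Parse a timestamp string into a timezone-aware datetime."""
    return datetime.strptime(value, TS_FORMAT)


def should_pull(metadata: dict, last_ingested_updated_at: Optional[datetime]) -> bool:
    """Return True if the source data is newer than the last successful ingest.

    `metadata` is the decoded JSON from the metadata API (hypothetical shape:
    a dict with a "dataUpdatedAt" key). `last_ingested_updated_at` is the
    dataUpdatedAt value stored for the most recent successful pull, or None
    if the table has never been ingested.
    """
    source_updated_at = parse_ts(metadata["dataUpdatedAt"])
    if last_ingested_updated_at is None:
        return True  # never pulled before, so pull now
    return source_updated_at > last_ingested_updated_at
```

Comparing against the stored dataUpdatedAt of the last successful ingest, rather than the wall-clock time of the last check, means a failed pull is retried on the next run.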
I'm not sure if I want to put all metadata into one table or create several metadata tables, but in any case, this will need at least columns for
an identifier for the data table (i.e., its Socrata table_id),
the name of that table in the local database,
the last time the data was updated, and
the last time data was successfully pulled and new/updated records were successfully ingested.
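To make those four columns concrete, here is a minimal single-table sketch. It uses an in-memory SQLite database purely as a stand-in for whatever database the project uses, and the table name, column names, and the `record_successful_ingest` helper are all hypothetical. It also assumes one row per Socrata table; a history of every check would need a different key.

```python
import sqlite3

# Hypothetical schema: one row per Socrata table, keyed by its table_id.
DDL = """
CREATE TABLE IF NOT EXISTS table_metadata (
    table_id                  TEXT PRIMARY KEY,  -- Socrata table identifier
    local_table_name          TEXT NOT NULL,     -- name of the table in the local database
    source_data_updated_at    TEXT,              -- dataUpdatedAt reported by the metadata API
    last_successful_ingest_at TEXT               -- when data was last pulled and ingested
)
"""


def record_successful_ingest(conn, table_id, local_table_name,
                             source_data_updated_at, ingested_at):
    """Insert or overwrite the metadata row after a successful ingest."""
    conn.execute(
        "INSERT OR REPLACE INTO table_metadata VALUES (?, ?, ?, ?)",
        (table_id, local_table_name, source_data_updated_at, ingested_at),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# "abcd-1234" and "example_table" are placeholder values for illustration.
record_successful_ingest(
    conn, "abcd-1234", "example_table",
    "2022-10-20T10:17:24+0000", "2022-10-20T12:00:00+0000",
)
```

Splitting this into several metadata tables later (for example, one for column descriptions) would not change these four columns, only where they live.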