JustFixNYC / nycdb-k8s-loader

Loading and updating of NYC-DB data via containerized batch processing.

replicate dbhash url tracker for dataset update date tracker #159

Closed austensen closed 2 months ago

austensen commented 2 months ago

We already have a setup for creating a table that tracks metadata from the URLs of the datasets we're downloading, so that we don't update when nothing has changed at the source. It uses the dbhash class to create the table (in a nycdb_k8s_loader schema) and manage changes to it, and the table follows a simple structure with key and value string columns. A second class then does the updating of those records: for URLs it's lastmod, and for this new one it's dataset_tracker.

For the key we're using the dataset name, so for nycdb that's a single key for all the tables a dataset covers (e.g. acris for all the acris tables). The value is a timestamp in the local NYC timezone, stored as a string in standard ISO format. Because the setup was already built around two text columns, I've kept that for now: we can easily cast the text to a date when querying it, which seemed simpler than reworking things to use a date column.
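To make the key/value idea concrete, here's a rough sketch of the pattern, not the actual loader code. It uses an in-memory sqlite3 database as a stand-in for our Postgres db, and the function names (create_tracker_table, record_update, last_updated_date) are made up for illustration; the real table is created and managed through the dbhash class.

```python
import sqlite3
from datetime import date, datetime
from typing import Optional
from zoneinfo import ZoneInfo

# Assumption: "America/New_York" is the zone meant by "local NYC timezone".
NYC_TZ_NAME = "America/New_York"


def create_tracker_table(conn: sqlite3.Connection) -> None:
    """Create the two-text-column key/value table (hypothetical helper)."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS dataset_tracker '
        '("key" TEXT PRIMARY KEY, "value" TEXT NOT NULL)'
    )


def record_update(
    conn: sqlite3.Connection, dataset: str, when: Optional[datetime] = None
) -> str:
    """Store the update time for a dataset as an ISO-format string."""
    if when is None:
        when = datetime.now(ZoneInfo(NYC_TZ_NAME))
    value = when.isoformat()
    # One row per dataset name: a later update replaces the earlier one.
    conn.execute(
        'INSERT OR REPLACE INTO dataset_tracker ("key", "value") VALUES (?, ?)',
        (dataset, value),
    )
    return value


def last_updated_date(conn: sqlite3.Connection, dataset: str) -> Optional[date]:
    """Read the stored text back and convert it to a date."""
    row = conn.execute(
        'SELECT "value" FROM dataset_tracker WHERE "key" = ?', (dataset,)
    ).fetchone()
    if row is None:
        return None
    return datetime.fromisoformat(row[0]).date()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tracker_table(conn)
    record_update(conn, "acris")
    print(last_updated_date(conn, "acris"))
```

Here the text-to-date conversion happens in Python when reading the row; against the real db the same thing would be a cast on the value column in the query itself.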

Once we have this new dataset_tracker table added to the db, we can add an API endpoint to wow for checking the date each dataset was last updated in our db.

[sc-14791]