datamade / django-councilmatic

:heartpulse: Django app providing core functions for *.councilmatic.org
http://councilmatic.org
MIT License

Add a mechanism to account for "old" downloads #172

Closed reginafcompton closed 1 year ago

reginafcompton commented 6 years ago

Currently, django-councilmatic preserves all downloads unless (1) the bill has changed, (2) you run import with the --delete option, or (3) we deploy the app and in turn delete the downloads folder.

This causes some problems. For example, the import command inserts all downloads into the raw tables, regardless of their updated_at timestamp. Thus, with 36000 downloaded bills, of which only (say) 900 require updates, the import does a lot of unnecessary work.

I suggest that we check the updated_at field in each downloaded JSON file before adding it to a raw table, something like: if updated_at (from the JSON) >= update_since (our boundary for knowing what's new), then add this data to the raw table.
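
A rough sketch of that check, to make the idea concrete. This is not the existing import code: the function names (fresh_downloads, parse_timestamp), the flat directory of *.json files, and the assumption that updated_at is a top-level ISO 8601 string are all hypothetical. update_since is assumed to be a timezone-aware datetime.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def parse_timestamp(value):
    """Parse an ISO 8601 string; naive values are assumed to be UTC."""
    # Older Python versions do not accept a trailing "Z", so normalize it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def fresh_downloads(download_dir, update_since):
    """Yield (path, data) for each download with updated_at >= update_since.

    Hypothetical helper: assumes one JSON object per file, with a
    top-level "updated_at" key.
    """
    for path in sorted(Path(download_dir).glob("*.json")):
        with open(path) as f:
            data = json.load(f)

        updated_at = data.get("updated_at")
        if updated_at is None:
            # No timestamp to compare: import the file rather than skip it.
            yield path, data
        elif parse_timestamp(updated_at) >= update_since:
            yield path, data
```

The import would then insert only what fresh_downloads yields into the raw tables, so the 35000 or so unchanged bills are read but never inserted. Treating a missing updated_at as "new" is a conservative choice; skipping those files instead would be faster but could drop data.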

@evz - additional thoughts on this?