datamade / django-councilmatic

:heartpulse: Django app providing core functions for *.councilmatic.org
http://councilmatic.org
MIT License

Data integrity #179

reginafcompton closed this 6 years ago

reginafcompton commented 6 years ago

This PR adds a management command which verifies that Solr indexed all Councilmatic bills (no more, no less).

We will use it with Metro and can implement it elsewhere, if desired. Relates to this Metro issue: https://github.com/datamade/la-metro-councilmatic/issues/256
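
For illustration only, here is a minimal sketch of the kind of check described above. It is not the code in this PR: the function names `compare_index` and `check_integrity` are hypothetical, and the sketch assumes the bill identifiers have already been fetched from the database and from Solr as plain lists. A simpler variant could compare only the two counts.

```python
def compare_index(db_ids, solr_ids):
    """Return (missing, extra) as sorted lists.

    missing: ids present in the database but absent from the search index.
    extra:   ids present in the search index but absent from the database.
    """
    db_ids, solr_ids = set(db_ids), set(solr_ids)
    missing = sorted(db_ids - solr_ids)
    extra = sorted(solr_ids - db_ids)
    return missing, extra


def check_integrity(db_ids, solr_ids):
    """Raise ValueError unless the index holds exactly the database's bills."""
    missing, extra = compare_index(db_ids, solr_ids)
    if missing or extra:
        raise ValueError(
            "Index out of sync: {} bill(s) not indexed, {} stale index entries".format(
                len(missing), len(extra)
            )
        )
    return True


if __name__ == "__main__":
    # Hypothetical identifiers, standing in for real query results.
    database_bills = ["bill-1", "bill-2", "bill-3"]
    indexed_bills = ["bill-2", "bill-3", "bill-4"]
    print(compare_index(database_bills, indexed_bills))
```

Comparing identifiers rather than bare counts catches the case where the totals happen to match while the two sets differ, which is one way to read "no more, no less".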

reginafcompton commented 6 years ago

@evz - I think that's an important question. The answer, in part: human error. Some of the non-indexed Metro bills were created in mid-January (the 13th, 14th, and 15th), around the time that we had the "no space left" errors on the Councilmatic server (e.g., https://sentry.io/datamade/lametro-councilmatic/issues/434399533/). For Metro, import_data failed, and I recall having to run import_data manually to get things up to speed. This was also around the time that I isolated the various commands into distinct cron tasks. I suspect that I forgot to run update_index, and those bills simply never got indexed. (I am not certain that this explanation accounts for all of the missing bills, however.)

The good news about this script: it can help us pinpoint when the number of bills in the database and the number in the Solr index fall out of sync.