edgi-govdata-archiving / web-monitoring-db

An HTTP API for tracking and annotating changes to a set of web pages.
https://api.monitoring.envirodatagov.org/
GNU General Public License v3.0

Disable auto-analysis of new versions #1090

Closed: Mr0grog closed this issue 1 year ago

Mr0grog commented 1 year ago

A big goal from long ago was to provide useful automated analyses of changes to help analysts and others prioritize what to look at. We originally did that with AnalyzeChangeJob, which is scheduled automatically after every import and creates annotation records for every consecutive pair of versions: https://github.com/edgi-govdata-archiving/web-monitoring-db/blob/f01ad684c5c4df2775d1ca12e3a37dd76e400794/app/jobs/import_versions_job.rb#L26-L36

However, the analysis provided by that job always fell short (for a variety of reasons), and the problems it was meant to solve were ultimately better handled by web-monitoring-task-sheets. AnalyzeChangeJob is still running after every import, though! It’s doing a lot of work and generating a lot of data nobody ever uses, so it’s probably time to disable it.

(NB: in an ideal world, we’d have better integrated the task sheets stuff so it ran automatically and posted results as annotations available through the API here, but that never happened for a whole other set of reasons. At this point, we are putting the project to rest, and it no longer makes sense to try and better integrate these tools.)

Mr0grog commented 1 year ago

This should actually have been an ops ticket (I thought about it, and the best change was just to remove the AUTO_ANNOTATION_USER config variable rather than change code here). 🤷

At any rate, this is done in commit 6e61be326d91ff08edd22d8bf4cfc36a6844a5cd of ops-internal (not stored on GitHub, since it lists secrets). I’m also going to add a note in the source code for AnalyzeChangeJob.
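
For readers unfamiliar with the setup, here is a rough sketch of why removing a config variable is enough to switch the analysis off. It is not the actual code from import_versions_job.rb. It assumes the import step only schedules analysis when AUTO_ANNOTATION_USER is set, and the method names (`consecutive_pairs`, `schedule_analysis`) and the hash shape standing in for a queued job are hypothetical.

```ruby
# Hypothetical sketch, NOT the real web-monitoring-db implementation.
# Assumption: analysis is only scheduled when AUTO_ANNOTATION_USER is set.

# Pair each version with the one that follows it:
# [v1, v2, v3] becomes [[v1, v2], [v2, v3]].
def consecutive_pairs(versions)
  versions.each_cons(2).to_a
end

# Return a description of the analysis jobs that would be queued.
# `env` defaults to the process environment, but any Hash-like object
# works, which keeps the sketch easy to exercise without real config.
def schedule_analysis(versions, env: ENV)
  user = env['AUTO_ANNOTATION_USER']
  # With no annotation user configured, nothing is scheduled at all.
  return [] if user.nil? || user.strip.empty?

  consecutive_pairs(versions).map do |from, to|
    { from: from, to: to, annotator: user }
  end
end
```

Under these assumptions, calling `schedule_analysis(%w[v1 v2 v3], env: {})` queues nothing, while passing an `env` with AUTO_ANNOTATION_USER set yields one entry per consecutive pair. That is the behavior the ops change relies on: no code change, just an absent variable.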