edgi-govdata-archiving / web-monitoring-task-sheets

Experimental new tool for generating weekly analyst task sheets for web monitoring
GNU General Public License v3.0

Deprioritize error pages that haven’t changed status code #4

Closed Mr0grog closed 4 years ago

Mr0grog commented 4 years ago

If the latest version and the version prior to the analysis timeframe were both errors (i.e. status code >= 400), we should probably deprioritize the change a little bit.

If both were also the exact same status code, we should deprioritize the change a lot. This matters because some 404 pages give you search results (as a helpful convenience), which can change over time, but which aren’t really meaningful page content we should be paying attention to. For example: https://monitoring.envirodatagov.org/page/737ec114-93e9-4399-8833-36cf32b1b056/c086767f-7b93-4fed-8fbc-a9240d1195dd..86d63312-059f-4606-9cfb-a1342e4e3ce1
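
A rough sketch of the rule described above, assuming priority is adjusted by a multiplier. The function names and the `0.75` / `0.25` factors are hypothetical placeholders for illustration, not values taken from this repo:

```python
def is_error(status):
    """Treat any known status code >= 400 as an error."""
    return status is not None and status >= 400


def error_priority_factor(status_before, status_latest,
                          both_errors_factor=0.75, same_code_factor=0.25):
    """Return a multiplier to apply to a page's change priority.

    status_before: status code of the version prior to the analysis timeframe.
    status_latest: status code of the latest version.
    The default factors are made-up values; tune them as needed.
    """
    if not (is_error(status_before) and is_error(status_latest)):
        # At least one version was not an error: leave priority alone.
        return 1.0
    if status_before == status_latest:
        # Same error code on both ends (e.g. 404 -> 404): deprioritize a lot.
        return same_code_factor
    # Both errors, but different codes (e.g. 404 -> 500): deprioritize a little.
    return both_errors_factor
```

With these placeholder factors, a page that was a 404 before the timeframe and is still a 404 would have its priority multiplied by `0.25`, while a 200 that became a 404 would be left untouched.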