edgi-govdata-archiving / web-monitoring-db

An HTTP API for tracking and annotating changes to a set of web pages.
https://api.monitoring.envirodatagov.org/
GNU General Public License v3.0

Use effective_status for Page#status #1103

Closed by Mr0grog 1 year ago

Mr0grog commented 1 year ago

Pages have a status attribute that is meant to be a convenient way to find a page's current status while skipping over short-lived, spurious errors from a single snapshot. A while back, we added "effective" status codes to versions, which attempt to detect when a version with a 200 status code should actually have been an error (surprisingly common!). We should have updated Page's status calculation to use the effective status at that point, since the page's status is really about its effective current state, but we missed it at the time.
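To illustrate the idea, here is a minimal, self-contained sketch in plain Ruby. It is not the project's actual Page#status code: the Version struct, the resolved_status and page_status helpers, and the min_error_run threshold are all hypothetical names made up for this example, and "ignore an error until it repeats" is a simplified stand-in for whatever smoothing the real calculation does. The only point it makes is that preferring effective_status over the raw status changes the result for "soft" errors served with a 200.

```ruby
# Hypothetical stand-in for a version record; not the project's model.
Version = Struct.new(:capture_time, :status, :effective_status, keyword_init: true)

# Prefer the effective status when one was computed, else the raw status.
def resolved_status(version)
  version.effective_status || version.status
end

# Assumed smoothing rule for this sketch: an error only becomes the page's
# status once it has appeared in `min_error_run` consecutive versions.
def page_status(versions, min_error_run: 2)
  current = nil
  error_run = 0

  versions.sort_by(&:capture_time).each do |version|
    status = resolved_status(version)
    next if status.nil?

    if status >= 400
      error_run += 1
      # Accept the error if it has persisted, or if nothing is known yet.
      current = status if error_run >= min_error_run || current.nil?
    else
      error_run = 0
      current = status
    end
  end

  current
end

# A page that started returning a "soft 404" (raw 200, effective 404).
soft_404_versions = [
  Version.new(capture_time: Time.utc(2023, 1, 1), status: 200, effective_status: nil),
  Version.new(capture_time: Time.utc(2023, 1, 2), status: 200, effective_status: 404),
  Version.new(capture_time: Time.utc(2023, 1, 3), status: 200, effective_status: 404)
]

# A page with a single short-lived server error between good snapshots.
blip_versions = [
  Version.new(capture_time: Time.utc(2023, 1, 1), status: 200, effective_status: nil),
  Version.new(capture_time: Time.utc(2023, 1, 2), status: 500, effective_status: nil),
  Version.new(capture_time: Time.utc(2023, 1, 3), status: 200, effective_status: nil)
]

puts page_status(soft_404_versions)
puts page_status(blip_versions)
```

In this sketch the soft-404 page resolves to an error status only because the effective status is consulted; a calculation based on the raw status alone would keep reporting 200. The single 500 in the second list is skipped as a short-lived error.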

Even though this project is effectively dormant and being shut down, I’m adding this in to help make sure we get more useful archival data out as part of edgi-govdata-archiving/web-monitoring#170.