edgi-govdata-archiving / web-monitoring-db

An HTTP API for tracking and annotating changes to a set of web pages.
https://api.monitoring.envirodatagov.org/
GNU General Public License v3.0

Erroneously over-prioritized Cleannet page #545

Mr0grog closed 1 year ago

Mr0grog commented 5 years ago

This page got a relatively high priority (0.388), but clearly should not have: https://monitoring.envirodatagov.org/page/acbec676-e2e8-480a-a80a-79f3fc6f146f/4d2d7b8e-8770-4a17-a54b-b7399ed932a0..84729f0b-8627-4a8e-b23c-f09c81f064b2

I think what happened is that a lot of links changed: the link text includes a count of matches in the link destination, and those counts changed when new documents were published. Some ideas for being more nuanced here:

@danielballan any thoughts on these?
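
To make the failure mode above concrete, here is a minimal sketch of how count-only changes in link text can inflate the number of "changed" links, and how normalizing the text first might reduce that. This is not how web-monitoring computes priority. The `(12)`-style count suffix, the sample links, and the helper names `normalize_link_text` and `changed_links` are all assumptions made up for illustration.

```python
import re

# Assumption: counts appear as a trailing parenthesized number, e.g. "Air quality (12)".
# The real page may format its counts differently.
COUNT_SUFFIX = re.compile(r"\s*\(\d[\d,]*\)\s*$")


def normalize_link_text(text):
    """Hypothetical helper: drop a trailing "(N)" count from link text."""
    return COUNT_SUFFIX.sub("", text).strip()


def changed_links(old_links, new_links, normalize=lambda text: text):
    """Return the hrefs whose link text differs between two versions.

    Links that were added or removed also count as changed.
    Each input is a list of (href, text) pairs.
    """
    old = {href: normalize(text) for href, text in old_links}
    new = {href: normalize(text) for href, text in new_links}
    return {href for href in old.keys() | new.keys() if old.get(href) != new.get(href)}


# Made-up sample data: two links differ only in their counts, one differs in wording.
old_links = [
    ("/search?topic=air", "Air quality (12)"),
    ("/search?topic=water", "Water (7)"),
    ("/about", "About"),
]
new_links = [
    ("/search?topic=air", "Air quality (15)"),
    ("/search?topic=water", "Water (9)"),
    ("/about", "About us"),
]

raw = changed_links(old_links, new_links)
normalized = changed_links(old_links, new_links, normalize_link_text)

print(len(raw), "links changed before normalizing")
print(len(normalized), "links changed after normalizing")
```

With the sample data, every link counts as changed when the raw text is compared, while only the `/about` link remains once the counts are stripped. Whether stripping counts is the right trade-off is a separate question, since a count change can itself be a meaningful signal on some pages.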

Mr0grog commented 4 years ago

Need to check this against the new analysis in https://github.com/edgi-govdata-archiving/web-monitoring-task-sheets/