edgi-govdata-archiving / web-monitoring-task-sheets

Experimental new tool for generating weekly analyst task sheets for web monitoring
GNU General Public License v3.0

Consider changes to redirects #5

Open Mr0grog opened 3 years ago

Mr0grog commented 3 years ago

We should probably consider changes to redirects in the priority calculation, or at least call them out in one of the columns of the sheet.

For example, a couple of weeks ago, https://www.epa.gov/aboutepa/about-national-health-and-environmental-effects-research-laboratory-nheerl went from being a 404 page to suddenly having content and a 200 status code again. What really happened was that the URL stopped returning a 404 and started redirecting to https://archive.epa.gov/epa/aboutepa/about-national-health-and-environmental-effects-research-laboratory-nheerl.html, an archived copy of the page.
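A rough sketch of how a redirect change like this one could be detected when comparing two versions of a page. Everything here is an assumption for illustration: the version dicts, their `url`, `status`, and `redirects` keys, and the helper names are hypothetical, not the task-sheets tool's actual data model.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to the parts compared here: host, path, and query.

    The scheme is deliberately ignored so an http -> https upgrade
    does not count as a redirect change (a design choice, not a rule).
    """
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/") or "/", parts.query)


def final_url(version):
    """Return the URL that actually served the content for a version.

    `version` is a hypothetical dict with a `url` key and an optional
    `redirects` list holding the chain of URLs followed, in order.
    """
    chain = version.get("redirects") or []
    return chain[-1] if chain else version["url"]


def redirect_changed(before, after):
    """True when two versions of a page resolve to different final URLs."""
    return normalize(final_url(before)) != normalize(final_url(after))


PAGE = (
    "https://www.epa.gov/aboutepa/"
    "about-national-health-and-environmental-effects-research-laboratory-nheerl"
)
ARCHIVED = (
    "https://archive.epa.gov/epa/aboutepa/"
    "about-national-health-and-environmental-effects-research-laboratory-nheerl.html"
)

# Hypothetical records for the two captures described above.
before = {"url": PAGE, "status": 404, "redirects": []}
after = {"url": PAGE, "status": 200, "redirects": [PAGE, ARCHIVED]}

if redirect_changed(before, after):
    print(f"Redirect changed: {final_url(before)} -> {final_url(after)}")
```

A boolean like this could either feed into the priority score or just populate a column; which one makes more sense is the open question in this issue.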

See the above example in the API:

One reason this is important is that it can be hard to tell that pages at archive.epa.gov are in an archive. It looks obvious when you’re there in a browser with modern CSS:

[Screenshot, 2020-09-21: the archived page in a browser, showing the large archive banner]

But that big banner is a background image on a pseudo-element, and there’s no representative text about the archive anywhere in the page. (Honestly, this seems designed to be inaccessible, but I don’t want to speculate too much about what led to this very problematic technical setup.) In any case, without much better rendering support (for part of the solution to that, see https://github.com/edgi-govdata-archiving/web-monitoring-ui/issues/600), there’s just nothing on the page to indicate that it’s an archive. Besides, there are probably a number of other situations where a change in redirects is especially relevant and worth calling out.
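Since the page text itself gives no hint, one option is to call out the redirect target directly. A minimal sketch, assuming a hand-maintained set of archive hosts and a hypothetical "redirect" column in the sheet; `ARCHIVE_HOSTS`, `is_archive_url`, and `redirect_note` are made-up names for illustration, not existing code.

```python
from urllib.parse import urlsplit

# Assumption: a hand-maintained set of hosts known to serve archived copies.
ARCHIVE_HOSTS = {"archive.epa.gov"}


def is_archive_url(url):
    """True when the URL's host is in the known archive list."""
    host = urlsplit(url).hostname or ""
    return host in ARCHIVE_HOSTS


def redirect_note(original_url, final_url):
    """Build the text for a hypothetical 'redirect' column.

    Returns an empty string when the page did not redirect anywhere.
    """
    if final_url == original_url:
        return ""
    if is_archive_url(final_url) and not is_archive_url(original_url):
        return f"Redirects to archive: {final_url}"
    return f"Redirects to: {final_url}"


print(redirect_note(
    "https://www.epa.gov/aboutepa/about-national-health-and-environmental-effects-research-laboratory-nheerl",
    "https://archive.epa.gov/epa/aboutepa/about-national-health-and-environmental-effects-research-laboratory-nheerl.html",
))
```

This only covers the archive case from the example; other redirect situations worth flagging would need their own checks.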

See also: https://github.com/edgi-govdata-archiving/web-monitoring-ui/issues/509