danielballan opened this issue 6 years ago
This should just be an extension of edgi-govdata-archiving/web-monitoring-processing#172. What’s the status with that, @weatherpattern? Are you planning to come back to it, or does it need someone else to take it over?
Unless I’m misunderstanding what you’re getting at here, @danielballan.
I was too vague here; let me try again.
Short of building a full scraper (or in addition to one), we might run a service that regularly polls the response codes for a set of important Pages. It wouldn't do anything with the content. The goal is to get better time resolution on the frequency and duration of outages than we can get from Versionista, IA, or likely any service that pulls down and stores (or compares) content.
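A minimal sketch of what I have in mind (the URL list, interval, and log format are all placeholders; nothing here is decided):

```python
# Minimal status-code poller: hit each Page on a fixed interval and record
# only the status code and a timestamp, never the content.
import time

import requests

# Placeholder list; in practice this would be the set of important Pages.
PAGES = [
    "https://example.gov/some-important-page",
]

def check(url):
    try:
        # HEAD keeps the request light, since we only want the status code.
        response = requests.head(url, timeout=30, allow_redirects=True)
        return response.status_code
    except requests.RequestException:
        return None  # no response at all

while True:
    for url in PAGES:
        print(f"{time.time()}\t{url}\t{check(url)}")
    time.sleep(5 * 60)  # e.g. every five minutes, for finer time resolution
```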
Aaaahhhhhhhhhhhhhhh, makes sense.
Maybe treat it kinda like we do the IA healthcheck: run it with cron, have it pick up a manifest of URLs to check from disk, and notify Sentry if any return 400+ status codes (or don't respond at all).
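Roughly like this, assuming a plain-text manifest with one URL per line and the Sentry DSN in an environment variable (those details are made up, just to make the shape concrete):

```python
# One-shot check meant to be run from cron, e.g.:
#     */15 * * * * python check_pages.py
import os

import requests
import sentry_sdk

sentry_sdk.init(dsn=os.environ["SENTRY_DSN"])

# Assumed manifest format: a plain-text file with one URL per line.
with open("url_manifest.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        response = requests.head(url, timeout=30, allow_redirects=True)
        if response.status_code >= 400:
            sentry_sdk.capture_message(
                f"Healthcheck: {url} returned {response.status_code}")
    except requests.RequestException as error:
        # No response at all counts as an outage, too.
        sentry_sdk.capture_message(f"Healthcheck: {url} unreachable ({error})")
```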
On the analyst call, it was noted that one page was dropping offline intermittently, noticeably more often than we usually see.
It might be useful to designate certain Pages for more regular spot checks (e.g. hourly). Maybe this is something to fit into the incipient work on a custom scraper.
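If we did that, each manifest entry could carry its own check interval, something like this sketch (the field names and state-keeping are invented for illustration):

```python
# Hypothetical per-Page check intervals: each manifest entry has its own
# interval, and a frequently-run job only checks the entries that are due.
import time

MANIFEST = [
    {"url": "https://example.gov/stable-page", "interval": 24 * 60 * 60},
    {"url": "https://example.gov/flaky-page", "interval": 60 * 60},  # hourly
]

last_checked = {}  # url -> timestamp of the most recent check

def run_due_checks(check):
    for entry in MANIFEST:
        last = last_checked.get(entry["url"], 0)
        if time.time() - last >= entry["interval"]:
            check(entry["url"])
            last_checked[entry["url"]] = time.time()
```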