edgi-govdata-archiving / web-monitoring

Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner")
Creative Commons Attribution Share Alike 4.0 International

Designating Pages for more regular spot checks #114

Open danielballan opened 6 years ago

danielballan commented 6 years ago

On the analyst call, it was noted that one page was dropping offline intermittently, more often than we usually see.

It might be useful to designate certain Pages for more regular spot checks (e.g. hourly). Maybe this is something to fit into the incipient work on a custom scraper.

Mr0grog commented 6 years ago

This should just be an extension of edgi-govdata-archiving/web-monitoring-processing#172. What’s the status with that, @weatherpattern? Are you planning to come back to it, or does it need someone else to take it over?

Mr0grog commented 6 years ago

Unless I’m misunderstanding what you’re getting at here, @danielballan.

danielballan commented 6 years ago

I was too vague here; let me try again.

Short of building a full scraper (or in addition to one), we might run a service that regularly polls the response code for a set of important Pages. It wouldn't do anything with the content. The goal is to get better time resolution on the frequency and duration of outages than we can get from Versionista or IA, or likely from any service that is pulling down and storing (or comparing) content.
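
A minimal sketch of what a single status poll could look like, assuming Python and only the standard library. The name `poll_status` and the 10-second timeout are illustrative and not part of any existing web-monitoring code. It sends a `HEAD` request, since only the status is of interest, and records the time of the check so that outage duration can be estimated from successive polls. Some servers answer `HEAD` differently from `GET`, and `urllib` follows redirects, so both behaviors would need checking per site.

```python
import http.client
import urllib.error
import urllib.request
from datetime import datetime, timezone


def poll_status(url, timeout=10):
    """Return (UTC time of the check, status code or None) for one URL.

    None means no HTTP response was received at all (DNS failure,
    refused connection, timeout, and so on).
    """
    checked_at = datetime.now(timezone.utc)
    # HEAD asks the server for headers only; the content is not needed here.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the status code is still available.
        status = error.code
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        status = None
    return checked_at, status
```

Calling this on a schedule (e.g. hourly) and appending each result to a log would give the kind of outage time series described above.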

Mr0grog commented 6 years ago

Aaaahhhhhhhhhhhhhhh, makes sense.

Mr0grog commented 6 years ago

Maybe treat it kinda like we do the IA healthcheck: run it with cron, have it pick up a manifest of URLs to check from disk, then notify Sentry if any return 400+ status codes (or don’t respond at all).
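
A rough sketch of that flow, assuming Python. The manifest format (one URL per line, `#` for comments) and the names `load_manifest`, `is_failure`, and `run_checks` are assumptions made for illustration; the thread does not specify any of them. The status lookup and the notifier are passed in as plain callables, so the same loop could be wired to `sentry_sdk.capture_message` or to anything else.

```python
from pathlib import Path


def load_manifest(path):
    """Read one URL per line, skipping blank lines and '#' comments.

    This manifest format is an assumption, not an existing convention.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def is_failure(status):
    """True for 400+ status codes, or for None (no response at all)."""
    return status is None or status >= 400


def run_checks(manifest_path, get_status, notify):
    """Check every URL in the manifest and report each failure.

    get_status(url) returns an int status code, or None if there was no
    response; notify(message) delivers the alert.
    """
    failures = []
    for url in load_manifest(manifest_path):
        status = get_status(url)
        if is_failure(status):
            failures.append((url, status))
            notify(f"Spot check failed for {url}: status={status}")
    return failures
```

Run from cron, an hourly schedule would be an entry along the lines of `0 * * * * python spot_check.py urls.txt`, where both file names are hypothetical.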

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.