TechAndCheck / tech-and-check-alerts

Daily tip sheet for fact checkers
MIT License

Retry broken scrape URLs #337

Closed: reefdog closed this issue 4 years ago

reefdog commented 4 years ago

Currently, we bail out of a scrape if we've ever seen the URL before, regardless of whether that earlier scrape succeeded. That's because we're using the permissive getMostRecentScrapeTime() (which doesn't care whether the scrape succeeded) rather than the restrictive getMostRecentSuccessfulScrapeTime().
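A rough sketch of the difference between the two checks. The functions here are in-memory stand-ins that borrow the names from this issue; the real ones query the project's database, and their signatures may differ. The ScrapeRecord shape and the shouldSkip* helpers are hypothetical.

```typescript
// Hypothetical record shape; the real project persists scrape results elsewhere.
interface ScrapeRecord {
  url: string;
  scrapedAt: Date;
  succeeded: boolean;
}

// Returns the latest scrapedAt among the given records, or null if there are none.
function latest(records: ScrapeRecord[]): Date | null {
  let result: Date | null = null;
  for (const r of records) {
    if (result === null || r.scrapedAt.getTime() > result.getTime()) {
      result = r.scrapedAt;
    }
  }
  return result;
}

// Permissive stand-in: most recent attempt for the URL, successful or not.
function getMostRecentScrapeTime(log: ScrapeRecord[], url: string): Date | null {
  return latest(log.filter((r) => r.url === url));
}

// Restrictive stand-in: most recent *successful* attempt for the URL.
function getMostRecentSuccessfulScrapeTime(log: ScrapeRecord[], url: string): Date | null {
  return latest(log.filter((r) => r.url === url && r.succeeded));
}

// Current behavior: skip any URL we have ever attempted.
function shouldSkipCurrent(log: ScrapeRecord[], url: string): boolean {
  return getMostRecentScrapeTime(log, url) !== null;
}

// Proposed behavior: skip only URLs we have scraped successfully.
function shouldSkipProposed(log: ScrapeRecord[], url: string): boolean {
  return getMostRecentSuccessfulScrapeTime(log, url) !== null;
}

const scrapeLog: ScrapeRecord[] = [
  { url: "https://example.com/ok", scrapedAt: new Date("2020-02-26T12:00:00Z"), succeeded: true },
  { url: "https://example.com/flaky", scrapedAt: new Date("2020-02-26T12:00:00Z"), succeeded: false },
];

console.log(shouldSkipCurrent(scrapeLog, "https://example.com/flaky")); // true: never retried
console.log(shouldSkipProposed(scrapeLog, "https://example.com/flaky")); // false: eligible for retry
```

With the permissive check, a URL whose only attempt failed is skipped forever; with the restrictive check it stays eligible until it succeeds.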

There's some argument for the current behavior: without a configured safety guard like SCRAPE_DAY_HORIZON, we would keep re-checking broken links in perpetuity. But we do have that guard, and it would be good to re-check broken links in case they were only temporarily broken.
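A minimal sketch of how the horizon bounds the retries, assuming SCRAPE_DAY_HORIZON means "only consider URLs dated within the last N days" (the exact semantics in the project may differ). The shouldScrape and isWithinHorizon helpers are hypothetical, and the horizon is passed in as a number rather than read from the environment.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed semantics: a URL is in scope while its date is at most horizonDays before now.
function isWithinHorizon(urlDate: Date, now: Date, horizonDays: number): boolean {
  return now.getTime() - urlDate.getTime() <= horizonDays * MS_PER_DAY;
}

// Scrape (or retry) a URL only if it has never succeeded and is still inside the horizon.
function shouldScrape(
  lastSuccess: Date | null,
  urlDate: Date,
  now: Date,
  horizonDays: number,
): boolean {
  if (lastSuccess !== null) {
    return false; // already scraped successfully; nothing to retry
  }
  return isWithinHorizon(urlDate, now, horizonDays);
}

const now = new Date("2020-03-01T00:00:00Z");
console.log(shouldScrape(null, new Date("2020-02-27T00:00:00Z"), now, 7)); // true: broken, 3 days old
console.log(shouldScrape(null, new Date("2020-02-01T00:00:00Z"), now, 7)); // false: outside horizon
```

Under this assumption, a broken link is retried on each run until it either succeeds or ages past the horizon, so retries stop on their own rather than continuing in perpetuity.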

This idea was surfaced in https://github.com/TechAndCheck/tech-and-check-alerts/issues/277#issuecomment-591689187 and discussed and approved in Slack.