NewsDiffs / newsdiffs

Automatic scraper that tracks changes in news articles over time.

Too many attempts to download http://www.bbc.co.uk/news/stories?print=true #2

Open carlgieringer opened 6 years ago

carlgieringer commented 6 years ago

The scraper errors while trying to download http://www.bbc.co.uk/news/stories?print=true. This doesn't seem to prevent the rest of the scraping from completing.

$ less /tmp/newsdiffs_logging_errs
2017-11-01 06:48:12.591:ERROR:Unknown exception when updating http://www.bbc.co.uk/news/stories
2017-11-01 06:48:12.592:ERROR:Traceback (most recent call last):
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 414, in update_versions
    update_article(article)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 321, in update_article
    parsed_article = load_article(article.url)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/scraper.py", line 306, in load_article
    parsed_article = parser(url)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 117, in __init__
    self.html = grab_url(self._printableurl())
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 45, in grab_url
    return grab_url(url, max_depth-1, opener)
  File "/Users/tech/code/newsdiffs/website/frontend/management/commands/parsers/baseparser.py", line 43, in grab_url
    raise Exception('Too many attempts to download %s' % url)
Exception: Too many attempts to download http://www.bbc.co.uk/news/stories?print=true
carlgieringer commented 6 years ago

One approach would be to increase the download timeout from 5 to 10 seconds.
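From the traceback, `grab_url` retries by calling itself with `max_depth - 1` and raises once the retries are exhausted. A minimal sketch of that retry pattern with the longer timeout might look like the following; the function body, the `timeout` parameter, and the default retry count are assumptions for illustration (the real `grab_url` also takes an `opener` argument, omitted here):

```python
import time
import urllib.request


def grab_url(url, max_depth=5, timeout=10):
    """Fetch a URL, retrying on transient errors.

    Hypothetical sketch of the retry logic suggested by the traceback,
    with the timeout raised from 5 to 10 seconds as proposed above.
    """
    if max_depth == 0:
        # Mirrors the exception seen in the log once retries run out.
        raise Exception('Too many attempts to download %s' % url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        time.sleep(1)  # brief pause before retrying
        return grab_url(url, max_depth - 1, timeout)
```

A bigger timeout only helps if the server is slow rather than refusing the URL outright; if `?print=true` pages are simply gone, no retry count will succeed and the parser may need a different printable-URL strategy.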