alan-turing-institute / misinformation-crawler

Web crawler to collect snapshots of articles to web archive
MIT License

Button pressing broken on time.com #319

Closed jemrobinson closed 5 years ago

jemrobinson commented 5 years ago

Button pressing is not working on time.com. The spider clicks the GDPR form button repeatedly, and the download then fails with a StaleElementReferenceException:

2019-07-15 16:15:10 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-07-15 16:15:10 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-07-15 16:15:12 [IndexPageSpider] INFO: Identified a javascript load button on https://time.com/5476903/donald-trump-shutdown-border-wall/.
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:12 [IndexPageSpider] INFO: Clicked a form button (//form[@class="gdpr-form"]/input[@class="btn"]).
2019-07-15 16:15:17 [IndexPageSpider] INFO: Identified a javascript load button on https://time.com/5419926/president-midterm-campaign/.
2019-07-15 16:15:19 [IndexPageSpider] INFO: Identified a javascript load button on https://time.com/section/politics.
2019-07-15 16:15:23 [IndexPageSpider] INFO: Identified a javascript load button on https://time.com/5422270/kanye-west-trump-speech/.
2019-07-15 16:15:25 [IndexPageSpider] INFO: Identified a javascript load button on https://time.com/5384770/george-w-bush-john-mccain-funeral/.
2019-07-15 16:15:27 [IndexPageSpider] INFO: Identified a javascript load button on https://time.com/5408805/christine-blasey-ford-testimony-cspan-caller/.
2019-07-15 16:15:28 [scrapy.core.scraper] ERROR: Error downloading <GET https://time.com/5476903/donald-trump-shutdown-border-wall/>
Traceback (most recent call last):
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/twisted/internet/defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 https://time.com/5476903/donald-trump-shutdown-border-wall/>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/misinformation/middlewares/buttonpressmiddleware.py", line 201, in process_response
    self.press_form_buttons(spider)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/misinformation/middlewares/buttonpressmiddleware.py", line 172, in press_form_buttons
    button.press_button(self.driver)
  File "/Users/jrobinson/Projects/misinformation/misinformation-crawler/misinformation/middlewares/buttonpressmiddleware.py", line 33, in press_button
    self.element.send_keys(Keys.RETURN)
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 479, in send_keys
    'value': keys_to_typing(value)})
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/jrobinson/.pyenv/versions/misinformation/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: headless chrome=75.0.3770.100)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Mac OS X 10.13.6 x86_64)
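
The traceback shows the exception being raised from press_button when send_keys is called on a stored element, after the same GDPR button has already been clicked several times. One possible workaround, sketched below, is to look the element up again before every press and to treat a stale reference as a cue to retry. This is a minimal sketch and not the project's actual fix: the name press_with_relocation and its parameters are hypothetical, and the fallback exception class exists only so the snippet runs without Selenium installed.

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in so the sketch can run without Selenium installed (assumption).
    class StaleElementReferenceException(Exception):
        pass


def press_with_relocation(find_element, press, max_attempts=3):
    """Re-locate the element before each press attempt (hypothetical helper).

    find_element: zero-argument callable returning a fresh element,
                  or None if the element is no longer on the page.
    press:        callable that takes the element and presses it.
    Returns True if a press succeeded, False otherwise.
    """
    for _ in range(max_attempts):
        element = find_element()  # fresh lookup, never a cached reference
        if element is None:
            return False  # the button has gone, nothing left to press
        try:
            press(element)
            return True
        except StaleElementReferenceException:
            # The page changed between lookup and press; look it up again.
            continue
    return False
```

With a real driver this might be called along the lines of `press_with_relocation(lambda: next(iter(driver.find_elements("xpath", xpath)), None), lambda el: el.send_keys(Keys.RETURN))`, where `driver`, `xpath` and `Keys` come from the existing middleware. Separately, the session info above reports chrome 75.0.3770.100 with chromedriver 74.0.3729.6, so it may be worth ruling out the version mismatch as a contributing factor.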