disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

selenium crash on M2 #115

Open · andreawwenyi opened this issue 4 years ago

andreawwenyi commented 4 years ago

Reading the job log, it looks like the Selenium Chrome instance running on M2 crashes with the following exception. We need to look into this further for a more stable crawl. However, since the sites that rely on Selenium are all being discovered and updated decently, I don't think this is urgent.

2020-05-02 04:44:25 scrapy.core.scraper ERROR: Error downloading <GET https://kknews.cc/food/jav4enl.html>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.7/dist-packages/scrapy/core/downloader/middleware.py", line 38, in process_request
    response = yield method(request=request, spider=spider)
  File "/srv/web/newsSpiders/middlewares.py", line 85, in process_request
    self.driver.get(request.url)
  File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=79.0.3945.130)
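
The "session deleted because of page crash ... from tab crashed" chain is typically how headless Chrome reports a tab running out of memory, often because /dev/shm is too small inside a container. Below is a minimal sketch of one possible mitigation: a Scrapy downloader middleware that recreates the driver and retries once when the session dies. The class name `SeleniumRetryMiddleware`, the Chrome flags, and the single-retry logic are illustrative assumptions, not the actual middleware in newsSpiders/middlewares.py.

```python
# Hypothetical sketch; the project's real middleware in newsSpiders/middlewares.py may differ.
from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.common.exceptions import WebDriverException


class SeleniumRetryMiddleware:
    def __init__(self):
        self.driver = self._new_driver()

    @staticmethod
    def _new_driver():
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        # /dev/shm is small in many containers; "tab crashed" errors often trace back to it.
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--no-sandbox")
        return webdriver.Chrome(options=options)

    def process_request(self, request, spider):
        try:
            self.driver.get(request.url)
        except WebDriverException:
            # "session deleted because of page crash" usually leaves the driver
            # unusable, so quit it, start a fresh one, and retry the URL once.
            spider.logger.warning("Chrome session crashed, restarting driver")
            try:
                self.driver.quit()
            except WebDriverException:
                pass
            self.driver = self._new_driver()
            self.driver.get(request.url)
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )

    def spider_closed(self, spider):
        self.driver.quit()
```

Passing --disable-dev-shm-usage (or giving the container a larger --shm-size) tends to make these page crashes less frequent; the retry only covers crashes that still slip through.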