clemfromspace / scrapy-selenium

Scrapy middleware to handle JavaScript pages using Selenium
Do What The F*ck You Want To Public License

Error while using geckodriver on ubuntu #70

Closed · maciejbak85 closed this issue 4 years ago

maciejbak85 commented 4 years ago
2020-07-24 09:12:40 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/core/downloader/__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/utils/misc.py", line 150, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy_selenium/middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy_selenium/middlewares.py", line 43, in __init__
    for argument in driver_arguments:
builtins.TypeError: 'NoneType' object is not iterable
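
The loop that blows up is in scrapy_selenium/middlewares.py: from_crawler reads each setting with crawler.settings.get(), which returns None for any key that is absent from settings.py, and __init__ then iterates over driver_arguments unconditionally. Paraphrased (a sketch of the relevant logic, not a verbatim copy of the middleware):

class SeleniumMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        # settings.get() returns None for keys missing from settings.py
        return cls(
            driver_name=crawler.settings.get('SELENIUM_DRIVER_NAME'),
            driver_executable_path=crawler.settings.get('SELENIUM_DRIVER_EXECUTABLE_PATH'),
            browser_executable_path=crawler.settings.get('SELENIUM_BROWSER_EXECUTABLE_PATH'),
            driver_arguments=crawler.settings.get('SELENIUM_DRIVER_ARGUMENTS'),
        )

    def __init__(self, driver_name, driver_executable_path,
                 browser_executable_path, driver_arguments):
        ...
        for argument in driver_arguments:  # TypeError when the setting is missing
            ...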


My settings.py:

from shutil import which

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_BROWSER_EXECUTABLE_PATH = which('firefox')
...
'scrapy_selenium.SeleniumMiddleware': 800,
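
So the traceback points at a missing SELENIUM_DRIVER_ARGUMENTS setting rather than at geckodriver itself. Declaring it, even as an empty list, should avoid the crash; a minimal sketch, assuming headless Firefox is wanted:

SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # use [] to run with a visible browser

# shutil.which() returns None when the binary is not on PATH, so it is
# also worth checking the resolved path before starting the crawl:
assert SELENIUM_DRIVER_EXECUTABLE_PATH is not None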

The permissions on the driver look fine:

:/usr/local/bin$ ll | grep gecko
-rwxrwxrwx  1 baku baku 7008696 lip 24 09:09 geckodriver*

The part of the crawler that fails:

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class LinkedInProfileSeleniumSpider(scrapy.Spider):
    name = 'lips'
    allowed_domains = ['www.linkedin.com']

    def start_requests(self):
        yield SeleniumRequest(
            url="https://www.linkedin.com/login/",
            callback=self.proceed_login,
            wait_until=(
                    EC.presence_of_element_located(
                        (By.CSS_SELECTOR, "#username")
                    )
            ),
            script='window.scrollTo(0, document.body.scrollHeight);',
            wait_time=30
        )

    def proceed_login(self, response):
        # AFTER LOGIN
        driver = response.request.meta['driver']
        ...
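
For context, the driver stored in response.request.meta['driver'] is an ordinary Selenium WebDriver, so the login callback can drive the form directly. A rough sketch; the credentials are placeholders, and the #password and submit-button selectors are assumptions about LinkedIn's login page:

    def proceed_login(self, response):
        driver = response.request.meta['driver']
        driver.find_element(By.CSS_SELECTOR, '#username').send_keys('me@example.com')
        driver.find_element(By.CSS_SELECTOR, '#password').send_keys('not-my-real-password')
        driver.find_element(By.CSS_SELECTOR, 'button[type=submit]').click()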

Can you please help me understand why it's failing? Thanks!

maciejbak85 commented 4 years ago

Not sure what happened, but after a shutdown and a fresh run the next day it started to work ;)