clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium
Do What The F*ck You Want To Public License
923 stars, 354 forks

KeyError: 'driver' or 'screenshot' #74

Open afperezp opened 4 years ago

afperezp commented 4 years ago

Hey, I just started scraping with scrapy-selenium, but I keep getting the same error: [Screenshot 2020-09-14 at 11.11.24]. My mentor suggested adding the WebDriver to the PATH, but that didn't fix the problem. Any suggestions?
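Adding the driver to the PATH only helps if the directory containing the executable is on the PATH of the process that actually runs Scrapy (same shell, same virtualenv). A quick stdlib-only check; `find_drivers` is a hypothetical helper, not part of scrapy-selenium:

```python
import shutil
from typing import Dict, Iterable, Optional


def find_drivers(
    names: Iterable[str] = ("chromedriver", "geckodriver"),
) -> Dict[str, Optional[str]]:
    """Map each driver name to its full path, or None if it is not on PATH.

    Hypothetical helper: it uses the same shutil.which lookup that a
    settings.py line such as which('chromedriver') relies on.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If it prints NOT FOUND for the driver you use, `which('chromedriver')` in settings.py will evaluate to `None` as well.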

tristanlatr commented 4 years ago

What's your code?

uselessvevo commented 3 years ago

Same thing here. I can't access response.meta['screenshot'] or response.meta['driver'] in my middleware.
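Based on my reading of `SeleniumMiddleware.process_request`, `driver` is added to `request.meta` only when the middleware actually handles a `SeleniumRequest`, and `screenshot` is added only if the request was created with `screenshot=True`. The following is a simplified, stdlib-only model of that logic (not the library's code; `simulate_selenium_meta` is a hypothetical name) showing when each key exists:

```python
from typing import Any, Dict


def simulate_selenium_meta(
    middleware_enabled: bool,
    screenshot_requested: bool,
    driver: Any = "<driver>",
    png: bytes = b"\x89PNG\r\n",
) -> Dict[str, Any]:
    """Simplified model of which keys end up in request.meta.

    Hypothetical sketch: the real middleware stores the WebDriver instance
    and the PNG bytes returned by driver.get_screenshot_as_png().
    """
    meta: Dict[str, Any] = {}
    if not middleware_enabled:
        # The request falls through to Scrapy's normal downloader,
        # so neither key is set and the callback hits a KeyError.
        return meta
    if screenshot_requested:
        # Only set when SeleniumRequest(..., screenshot=True) was used.
        meta["screenshot"] = png
    # Set for every request the middleware handles.
    meta["driver"] = driver
    return meta
```

So a KeyError on `'screenshot'` alone usually means `screenshot=True` was not passed to `SeleniumRequest`, while a KeyError on `'driver'` points to the middleware not handling the request at all (disabled, or the request was a plain `scrapy.Request`).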

raghavsehgal1 commented 3 years ago

I'm facing the same issue. As requested by @tristanlatr above, here's my code:

```python
import scrapy
from scrapy_selenium import SeleniumRequest


class LinkedinCrawlerSpider(scrapy.Spider):
    name = 'linkedin_crawler'
    allowed_domains = ['www.linkedin.com']

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.linkedin.com/sales/login',
            wait_time=5,  # only takes effect together with wait_until
            callback=self.login,
        )

    def login(self, response):
        # 'driver' is added to the request meta by SeleniumMiddleware;
        # a KeyError here means the middleware did not handle the request.
        print(response.request.meta['driver'].title)
```

Screenshot of the error:

[Screenshot 2020-12-22 at 1.12.27 PM]

tristanlatr commented 3 years ago

@raghavsehgal1 Did you activate the downloader middleware in settings.py?

raghavsehgal1 commented 3 years ago

@tristanlatr Yes, I did. Here is the scrapy-selenium-related snippet from settings.py:

```python
from shutil import which

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']  # '--headless' if using chrome instead of firefox
```
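One detail worth checking in this snippet: `which('chromedriver')` returns `None` when chromedriver is not on the PATH. As far as I can tell from the source, the middleware's `from_crawler` then raises `NotConfigured`, so Scrapy disables it (with a log line) and `SeleniumRequest`s are downloaded like ordinary requests. Below is a sketch that mirrors those checks so they can be run up front; `selenium_config_problems` is a hypothetical helper, and the conditions are my reading of the library, not its documented API:

```python
from shutil import which
from typing import Any, Dict, List


def selenium_config_problems(settings: Dict[str, Any]) -> List[str]:
    """Return reasons the scrapy-selenium middleware would disable itself.

    Hypothetical helper mirroring (as I understand them) the NotConfigured
    checks in SeleniumMiddleware.from_crawler.
    """
    problems: List[str] = []
    if settings.get("SELENIUM_DRIVER_NAME") is None:
        problems.append("SELENIUM_DRIVER_NAME must be set")
    if (
        settings.get("SELENIUM_DRIVER_EXECUTABLE_PATH") is None
        and settings.get("SELENIUM_COMMAND_EXECUTOR") is None
    ):
        problems.append(
            "Either SELENIUM_DRIVER_EXECUTABLE_PATH or "
            "SELENIUM_COMMAND_EXECUTOR must be set"
        )
    return problems


if __name__ == "__main__":
    # Same values as the settings.py snippet above; which() may return None.
    example = {
        "SELENIUM_DRIVER_NAME": "chrome",
        "SELENIUM_DRIVER_EXECUTABLE_PATH": which("chromedriver"),
        "SELENIUM_DRIVER_ARGUMENTS": ["--headless"],
    }
    print(selenium_config_problems(example) or "settings look complete")
```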

I have tried multiple approaches but still can't figure out why the issue persists. I printed the whole meta dict (without indexing any key), and this is what was printed:

{'download_timeout': 180.0, 'download_slot': 'golden.com', 'download_latency': 0.34168577194213867}
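That dict contains only keys Scrapy's downloader sets on ordinary requests (`download_timeout`, `download_slot`, `download_latency`) and no `driver`, which suggests the request never went through `SeleniumMiddleware`. A defensive accessor like the one below (hypothetical helper `get_selenium_driver`, not part of scrapy-selenium) turns the bare KeyError into a message that names the likely causes:

```python
from types import SimpleNamespace
from typing import Any


class SeleniumNotActiveError(RuntimeError):
    """Raised when a response was not produced by SeleniumMiddleware."""


def get_selenium_driver(response: Any) -> Any:
    """Return response.meta['driver'] or raise a descriptive error.

    Hypothetical helper: `response` is anything with a `.meta` mapping,
    such as a Scrapy Response.
    """
    driver = response.meta.get("driver")
    if driver is None:
        keys = sorted(response.meta)
        raise SeleniumNotActiveError(
            f"No 'driver' in response.meta (keys: {keys}). Check that the "
            "request is a SeleniumRequest, that SeleniumMiddleware is in "
            "DOWNLOADER_MIDDLEWARES, and that the log does not say it was "
            "disabled (e.g. chromedriver not found)."
        )
    return driver


if __name__ == "__main__":
    # Stand-in response carrying the same meta keys as the dict printed above.
    plain = SimpleNamespace(meta={
        "download_timeout": 180.0,
        "download_slot": "golden.com",
        "download_latency": 0.34168577194213867,
    })
    try:
        get_selenium_driver(plain)
    except SeleniumNotActiveError as exc:
        print(exc)
```

In the spider's callback this would replace `response.request.meta['driver']` with `get_selenium_driver(response)`.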

blackwhiteman commented 3 years ago

I had a similar issue, and it was resolved by making sure SELENIUM_DRIVER_ARGUMENTS is set to ['--headless'], i.e. two dashes for Chrome.
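The scrapy-selenium README's example setting carries the comment `'--headless' if using chrome instead of firefox`, i.e. Firefox is shown with a single dash (`-headless`) and Chrome with two. A small, hypothetical helper (`headless_arguments`, not part of the library) that derives the flag from `SELENIUM_DRIVER_NAME`, so the two settings cannot drift apart:

```python
from typing import List

# Flags as shown in the scrapy-selenium README example:
# '-headless' for Firefox, '--headless' for Chrome.
_HEADLESS_FLAGS = {"firefox": "-headless", "chrome": "--headless"}


def headless_arguments(driver_name: str) -> List[str]:
    """Return SELENIUM_DRIVER_ARGUMENTS for headless mode (hypothetical helper)."""
    try:
        return [_HEADLESS_FLAGS[driver_name.lower()]]
    except KeyError:
        raise ValueError(f"no headless flag known for driver {driver_name!r}") from None


# Example use in settings.py:
SELENIUM_DRIVER_NAME = "chrome"
SELENIUM_DRIVER_ARGUMENTS = headless_arguments(SELENIUM_DRIVER_NAME)
```

Only the two browsers the README mentions are mapped; anything else raises instead of silently passing a wrong flag.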

Lpaydat commented 3 years ago

I had the same issue; it turned out that I didn't have the driver installed. The log said that the driver name and driver path were not set, so the Selenium middleware was disabled.

The issue was solved after installing chromedriver (or geckodriver).

Hope this helps.

tomato-ga commented 2 years ago

I had the same issue. I get this error when debugging, but the code seems to be working fine.

mahmoudodoo commented 2 years ago

Hi @blackwhiteman, regarding ['--headless']: it's still the same issue. I changed my settings to the following:

```python
SELENIUM_DRIVER_NAME = "chrome"
SELENIUM_DRIVER_EXECUTABLE_PATH = '/drivers/chromedriver'
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
```
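With a hard-coded path such as `'/drivers/chromedriver'`, it is worth confirming the file really exists there and is executable. The leading slash makes the path absolute (relative to the filesystem root, not the project folder), while a relative path would resolve against the directory `scrapy crawl` is run from. A stdlib-only check; `check_driver_path` is a hypothetical helper that only inspects the filesystem and does not start the driver:

```python
import os
from pathlib import Path
from typing import List


def check_driver_path(path: str) -> List[str]:
    """Return problems with a driver executable path (empty list if it looks usable).

    Hypothetical helper; resolve() shows where a relative path actually points.
    """
    problems: List[str] = []
    p = Path(path)
    if not p.is_file():
        problems.append(f"{p.resolve()} does not exist or is not a file")
    elif not os.access(p, os.X_OK):
        problems.append(f"{p.resolve()} is not executable")
    return problems


if __name__ == "__main__":
    print(check_driver_path("/drivers/chromedriver") or "path looks usable")
```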