clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium
Do What The F*ck You Want To Public License

Remote webdriver not working #94

Open nareto opened 3 years ago

nareto commented 3 years ago

I have a docker container running selenium-chrome (the "standalone-chrome" official container) and I'm trying to get scrapy-selenium to work with it. I have this in settings.py:

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_COMMAND_EXECUTOR = 'http://localhost:4444/wd/hub'
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
```

and I tried different combinations, but I keep getting

[scrapy.middleware] WARNING: Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set

I don't think SeleniumRequest is working either: further down I get a KeyError for 'screenshot' on response.meta, even though I set screenshot=True in the SeleniumRequest.

The Selenium container itself is definitely working: I can connect to it directly with webdriver.Remote. Any suggestions?
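For reference, the kind of direct sanity check nareto describes might look like this. This is a hypothetical sketch, not code from the thread: it assumes Selenium 4's `webdriver.Remote(command_executor=..., options=...)` signature and a standalone-chrome container already listening on localhost:4444, and the helper name is made up for illustration.

```python
def make_remote_driver(executor_url="http://localhost:4444/wd/hub"):
    """Build a Remote Chrome driver against a standalone-chrome container.

    Hypothetical helper, assuming Selenium 4. Imports are done lazily so
    that merely defining this function does not require selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument("--headless")
    # Connects to the remote Selenium server rather than a local chromedriver.
    return webdriver.Remote(command_executor=executor_url, options=opts)


if __name__ == "__main__":
    # Only meaningful with a live container on localhost:4444.
    driver = make_remote_driver()
    driver.get("https://example.com")
    print(driver.title)
    driver.quit()
```

If this works while the middleware still logs the "must be set" warning, the problem is in how scrapy-selenium reads its settings, not in the container.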

WaterKnight1998 commented 3 years ago

@nareto did you solve it?

nareto commented 3 years ago

No. I hadn't noticed that the project is no longer maintained. I am now trying scrapy-splash, but I'm running into other problems with it. It's a pity, because in my limited experience Selenium works better than Splash.

Maybe I'll come back to this and skip the remote webdriver, but that makes it harder to dockerize the scraper.

rohitsathish commented 3 years ago

Having the same issue.

shijialee commented 3 years ago

The remote Selenium driver doesn't work in version 0.7. As a workaround, replace middleware.py with the one from the latest commit. Also, use SELENIUM_DRIVER_ARGUMENTS=['-headless'] for the remote Chrome driver.
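Putting shijialee's workaround together with nareto's original configuration, the settings would look roughly like this. This is a sketch under the thread's assumptions: that the replacement middleware.py actually reads SELENIUM_COMMAND_EXECUTOR (the 0.7 release evidently checks only SELENIUM_DRIVER_EXECUTABLE_PATH, hence the "must be set" warning), and that the single-dash '-headless' flag is what the remote Chrome driver expects.

```python
# settings.py sketch. Assumes middleware.py has been replaced with the
# latest-commit version, which (per this thread) supports a remote driver
# via SELENIUM_COMMAND_EXECUTOR instead of SELENIUM_DRIVER_EXECUTABLE_PATH.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_COMMAND_EXECUTOR = 'http://localhost:4444/wd/hub'
# Note the single dash, as suggested above for the remote Chrome driver.
SELENIUM_DRIVER_ARGUMENTS = ['-headless']
```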

hyobbb commented 3 years ago

I have the same issue, so I am using Selenium directly and there's no problem at all. That is, it doesn't make sense to use this package.