NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0

'chromedriver' executable needs to be in PATH (Windows 10) #234

Open JamesTheHacker opened 5 years ago

JamesTheHacker commented 5 years ago

I'm attempting to run GoogleScraper with the following command:

GoogleScraper -m selenium --keyword-file trap.txt --num-workers 5 --search-engines "bing,yahoo" --output-filename threaded-results.json -v debug

But it's returning the following error:

[MainThread] - 2019-04-07 10:49:19,846 - GoogleScraper.core - INFO - Going to scrape 12 keywords with 1 proxies by using 2 threads.
[MainThread] - 2019-04-07 10:49:19,848 - GoogleScraper.scraping - DEBUG - Sleeping ranges: [2, 2, 1, 2, 4, 17, 2, 2, 2, 1, 2,
1, 4, 1, 1, 5, 2, 1, 23, 20, 1, 2, 1, 1, 16, 13, 2, 1, 4, 1, 2, 1, 1, 4, 4, 2, 1, 16, 1, 1, 1, 1, 1, 4, 2, 1, 1, 2, 3, 1, 2, 28, 3, 5, 2, 4, 1, 1, 1, 22, 3, 4, 2, 1, 2, 2, 3, 25, 1, 1, 2, 1, 4, 2, 3, 2, 5, 1, 1, 5, 5, 1, 19, 2, 1, 2, 4, 2, 2, 1, 1, 1,
2, 1, 2, 2, 2, 1, 2, 2]
[MainThread] - 2019-04-07 10:49:19,849 - GoogleScraper.scraping - INFO - [+] SelScrape[localhost][search-type:normal][https://de.search.yahoo.com/search?] using search engine "yahoo". Num keywords=6, num pages for keyword=[1]
[MainThread] - 2019-04-07 10:49:19,849 - GoogleScraper.scraping - DEBUG - Sleeping ranges: [2, 3, 1, 2, 2, 1, 2, 5, 23, 18, 1, 1, 5, 1, 2, 3, 3, 1, 13, 2, 14, 1, 1, 1, 1, 1, 4, 2, 2, 2, 1, 5, 1, 4, 2, 25, 20, 1, 24, 4, 1, 1, 1, 2, 2, 3, 2, 5, 5, 2, 1,
2, 2, 1, 2, 1, 1, 1, 1, 2, 3, 2, 2, 1, 1, 3, 10, 1, 1, 1, 1, 2, 1, 2, 4, 2, 5, 5, 3, 1, 29, 2, 2, 2, 1, 2, 5, 1, 2, 2, 18, 2,
4, 2, 1, 1, 2, 2, 2, 1]
[MainThread] - 2019-04-07 10:49:19,851 - GoogleScraper.scraping - INFO - [+] SelScrape[localhost][search-type:normal][http://www.bing.com/search?] using search engine "bing". Num keywords=6, num pages for keyword=[1]
Exception in thread Thread-2:
Traceback (most recent call last):
  File "c:\python37\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
    stdin=PIPE)
  File "c:\python37\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "c:\python37\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "c:\python37\lib\site-packages\GoogleScraper\selenium_mode.py", line 767, in run
    if not self._get_webdriver():
  File "c:\python37\lib\site-packages\GoogleScraper\selenium_mode.py", line 314, in _get_webdriver
    return self._get_Chrome()
  File "c:\python37\lib\site-packages\GoogleScraper\selenium_mode.py", line 353, in _get_Chrome
    chrome_options=chrome_options)
  File "c:\python37\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "c:\python37\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Exception in thread Thread-3:
Traceback (most recent call last):
  File "c:\python37\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
    stdin=PIPE)
  File "c:\python37\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "c:\python37\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "c:\python37\lib\site-packages\GoogleScraper\selenium_mode.py", line 767, in run
    if not self._get_webdriver():
  File "c:\python37\lib\site-packages\GoogleScraper\selenium_mode.py", line 314, in _get_webdriver
    return self._get_Chrome()
  File "c:\python37\lib\site-packages\GoogleScraper\selenium_mode.py", line 353, in _get_Chrome
    chrome_options=chrome_options)
  File "c:\python37\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "c:\python37\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

I have chromedriver added to my path. If I run where chromedriver it returns:

C:\chromedriver\chromedriver.exe

It definitely exists in my PATH.
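A quick way to cross-check this from Python itself is sketched below. It only uses the standard library; the helper names `find_on_path` and `path_entries` are made up for illustration and are not part of GoogleScraper or selenium. It reports what the Python process resolves for `chromedriver`, which can differ from what `where` shows if the interpreter was started from a shell with an older environment.

```python
import os
import shutil


def find_on_path(name="chromedriver"):
    """Return the full path Python resolves for `name`, or None if the PATH lookup fails."""
    return shutil.which(name)


def path_entries():
    """List the non-empty PATH directories visible to this Python process."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


if __name__ == "__main__":
    print("chromedriver resolves to:", find_on_path())
    for entry in path_entries():
        print("  PATH entry:", entry)
```

If this prints a valid path and the error still appears, the PATH is probably not what selenium is consulting.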

I've also created a scrape_config.py file in the directory I'm running GoogleScraper from, containing the following:

 chromedriver_path = 'C:\ChromeDriver\chromedriver.exe'

I'm not quite sure what's going on.

JamesTheHacker commented 5 years ago

I stuck a print in service.py to see where it was trying to load from, and it seems to be loading from the developer's path?!

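try:
    cmd = [self.path]
    print(cmd)  # debugging: show the driver path selenium is about to launch
    exit()      # debugging: stop before the driver process is started
    cmd.extend(self.command_line_args())
    self.process = subprocess.Popen(cmd, env=self.env,
                                    close_fds=platform.system() != 'Windows',
                                    stdout=self.log_file,
                                    stderr=self.log_file,
                                    stdin=PIPE)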

This prints:

['/home/nikolai/projects/private/Drivers/chromedriver']

JamesTheHacker commented 5 years ago

I literally swapped out cmd for cmd = ['C:/ChromeDriver/chromedriver.exe'] and it works. But I don't like doing this. Is there any other way?
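One possible alternative to editing selenium's service.py is to resolve the driver location up front and hand it to the driver constructor. The sketch below is only the resolution step; `resolve_chromedriver` is a hypothetical helper, not something GoogleScraper provides, and the fallback order (configured file first, then PATH) is an assumption.

```python
import os
import shutil


def resolve_chromedriver(configured_path=None, name="chromedriver"):
    """Pick a usable driver path.

    Prefer the configured file if it actually exists on disk; otherwise
    fall back to a PATH lookup. Returns None when nothing is found.
    """
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    return shutil.which(name)


if __name__ == "__main__":
    # The Linux path printed above does not exist on a Windows machine,
    # so this falls through to the PATH lookup.
    print(resolve_chromedriver("/home/nikolai/projects/private/Drivers/chromedriver"))
```

In Selenium 3, `webdriver.Chrome` accepts an `executable_path` argument, so the resolved value could be passed there. Whether GoogleScraper offers a supported way to do that without patching `selenium_mode.py` is not established in this thread.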

crimvirt commented 3 years ago

I'm having exactly the same problem. I tried moving chromedriver.exe to the GoogleScraper folder and changing chromedriver_path in scrape_config.py. I even tried a raw string by putting an 'r' prefix before the path, and still nothing changes. Is there any other way to fix this issue?
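On the raw-string point, a small standard-library sketch of when the `r` prefix matters for Windows paths. The path `C:\tools\new_driver.exe` is invented for the example. In the path reported earlier, `\C` and `\c` are not recognized escape sequences, so Python keeps the backslashes either way, which may be why adding `r` made no difference there.

```python
from pathlib import PureWindowsPath

# Without the r prefix, \t and \n are interpreted as tab and newline.
plain = 'C:\tools\new_driver.exe'

# A raw string keeps every backslash literally.
raw = r'C:\tools\new_driver.exe'

# Forward slashes avoid the escaping question and are accepted on Windows.
forward = 'C:/tools/new_driver.exe'

print(repr(plain))
print(repr(raw))
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```

The raw and forward-slash spellings compare equal as Windows paths, while the plain literal is a different string containing control characters.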