NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0
2.6k stars 734 forks

Testing error on chrome browser type. I use Anaconda for python environment. #232

Closed: MindaugasVaitkus2 closed this issue 5 years ago

MindaugasVaitkus2 commented 5 years ago

The functional tests fail when the Chrome browser type is selected. I use Anaconda for my Python environment. See the command-line output below.

(base) C:\Users\Vartotojas\Documents\GitHub\GoogleScraper>python -m pytest Tests/functional_tests.py::GoogleScraperMinimalFunctionalTestCase
============================= test session starts =============================
platform win32 -- Python 3.7.0, pytest-3.8.2, py-1.7.0, pluggy-0.7.1
rootdir: C:\Users\Vartotojas\Documents\GitHub\GoogleScraper, inifile:
collected 2 items

Tests\functional_tests.py FF [100%]

================================== FAILURES ===================================
GoogleScraperMinimalFunctionalTestCase.test_bing_with_chrome_and_json_output

self =

def test_bing_with_chrome_and_json_output(self):
            """
                Very common use case:

                Ensures that we can scrape three continuous sites with Bing using
                chrome in headless mode and save the results to a JSON file.
                """
            results_file = os.path.join(tempfile.gettempdir(), 'results-chrome.json')
            if os.path.exists(results_file):
                os.remove(results_file)

            query = 'Startup San Francisco'

            config = {
                'keyword': query,
                'search_engines': ['Bing'],
                'num_results_per_page': 20, # this is ignored by bing, 10 results per page
                'num_pages_for_keyword': 3,
                'scrape_method': 'selenium',
                'sel_browser': 'chrome',
                'do_sleep': False,
                'browser_mode': 'normal',
                'chromedriver_path': 'C://Chromedriver//chromedriver.exe',
                'output_filename': results_file,
                'do_caching': False,
            }

            search = scrape_with_config(config)

            self.assertLess(search.started_searching, search.stopped_searching)
            self.assertEqual(search.number_proxies_used, 1)
            self.assertEqual(search.number_search_engines_used, 1)
            self.assertEqual(search.number_search_queries, 1)
>           self.assertEqual(len(search.serps), 3)

E           AssertionError: 0 != 3

Tests\functional_tests.py:171: AssertionError
---------------------------- Captured stderr call -----------------------------
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\Vartotojas\Anaconda3\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
    stdin=PIPE)
  File "C:\Users\Vartotojas\Anaconda3\lib\subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Vartotojas\Anaconda3\lib\subprocess.py", line 1155, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Vartotojas\Anaconda3\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py", line 769, in run
    if not self._get_webdriver():
  File "C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py", line 316, in _get_webdriver
    return self._get_Chrome()
  File "C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py", line 355, in _get_Chrome
    chrome_options=chrome_options)
  File "C:\Users\Vartotojas\Anaconda3\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\Vartotojas\Anaconda3\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: '' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

------------------------------ Captured log call ------------------------------
core.py 396 INFO Going to scrape 3 keywords with 1 proxies by using 1 threads.
scraping.py 330 INFO [+] SelScrape[localhost][search-type:normal][http://www.bing.com/search?] using search engine "bing". Num keywords=1, num pages for keyword=[1]

GoogleScraperMinimalFunctionalTestCase.test_google_with_chrome_and_json_output

self =

def test_google_with_chrome_and_json_output(self):
        """
            Very common use case:

            Ensures that we can scrape three continuous sites with Google using
            chrome in normal mode and save the results to a JSON file.
            """
        results_file = os.path.join(tempfile.gettempdir(), 'results-chrome.json')
        if os.path.exists(results_file):
            os.remove(results_file)

        query = 'Food New York'

        config = {
            'keyword': query,
            'search_engines': ['Google'],
            'num_results_per_page': 100,
            'num_pages_for_keyword': 3,
            'scrape_method': 'selenium',
            'sel_browser': 'chrome',
            'do_sleep': False,
            'browser_mode': 'normal',
            'chromedriver_path': 'C://Chromedriver//chromedriver.exe',
            'output_filename': results_file,
            'do_caching': False,
        }

        search = scrape_with_config(config)

        self.assertLess(search.started_searching, search.stopped_searching)
        self.assertEqual(search.number_proxies_used, 1)
        self.assertEqual(search.number_search_engines_used, 1)
        self.assertEqual(search.number_search_queries, 1)
>       self.assertEqual(len(search.serps), 3)

E       AssertionError: 0 != 3

Tests\functional_tests.py:85: AssertionError
---------------------------- Captured stderr call -----------------------------
Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Users\Vartotojas\Anaconda3\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
    stdin=PIPE)
  File "C:\Users\Vartotojas\Anaconda3\lib\subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Vartotojas\Anaconda3\lib\subprocess.py", line 1155, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Vartotojas\Anaconda3\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py", line 769, in run
    if not self._get_webdriver():
  File "C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py", line 316, in _get_webdriver
    return self._get_Chrome()
  File "C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py", line 355, in _get_Chrome
    chrome_options=chrome_options)
  File "C:\Users\Vartotojas\Anaconda3\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\Vartotojas\Anaconda3\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: '' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

------------------------------ Captured log call ------------------------------
core.py 396 INFO Going to scrape 3 keywords with 1 proxies by using 1 threads.
scraping.py 330 INFO [+] GoogleSelScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google". Num keywords=1, num pages for keyword=[1]
============================== warnings summary ===============================
C:\Users\Vartotojas\Anaconda3\lib\site-packages\sqlalchemy\sql\base.py:49: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class _DialectArgView(collections.MutableMapping):

C:\Users\Vartotojas\Anaconda3\lib\site-packages\sqlalchemy\engine\result.py:182: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sequence

C:\Users\Vartotojas\Anaconda3\lib\site-packages\lxml\html\_setmixin.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableSet

C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\socks.py:61: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Callable

C:\Users\Vartotojas\Anaconda3\lib\site-packages\aiohttp\multipart.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, Sequence, deque

C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py:355: DeprecationWarning: use options instead of chrome_options
  chrome_options=chrome_options)

C:\Users\Vartotojas\Documents\GitHub\GoogleScraper\GoogleScraper\selenium_mode.py:355: DeprecationWarning: use options instead of chrome_options
  chrome_options=chrome_options)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
==================== 2 failed, 7 warnings in 6.65 seconds =====================
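
Both failures end in the same WebDriverException, and the message prints an empty name (`''`). According to the traceback, that name comes from `os.path.basename(self.path)`, so the path selenium received was probably empty or ended in a separator, which may mean the `chromedriver_path` from the test config was not the value actually passed to the driver. The following is a minimal diagnostic sketch using only the standard library. The helper name `diagnose_driver_path` is hypothetical and not part of GoogleScraper or selenium; it only reports how a given path string would resolve on the local machine.

```python
import os
import shutil


def diagnose_driver_path(driver_path):
    """Describe how a driver path string would resolve on this machine.

    Hypothetical helper for illustration; not part of GoogleScraper or selenium.
    """
    if not driver_path:
        return "empty path: nothing to launch"
    name = os.path.basename(driver_path)
    if not name:
        # Matches the '' shown in the WebDriverException message.
        return "path ends with a separator, so the executable name is empty"
    if os.path.isfile(driver_path):
        return "file exists: " + os.path.abspath(driver_path)
    # Fall back to a PATH lookup using only the executable name.
    found = shutil.which(name)
    if found:
        return "not at the given path, but found on PATH: " + found
    return "not found at the given path or on PATH: " + name


# The path below is the one from the test config in this report.
print(diagnose_driver_path('C://Chromedriver//chromedriver.exe'))
```

If this reports that the file exists while the tests still fail with `''`, the value configured in the test is likely being dropped or overridden somewhere before `_get_Chrome` builds the driver, which would be the next place to look.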