NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0
2.63k stars, 735 forks

Multiple Page Selenium Issue #103

Open nicholassewitz opened 9 years ago

nicholassewitz commented 9 years ago

Hi, your scraper is amazing, but I am running into a problem. When I run this command:

GoogleScraper -m selenium --keyword-file keywords.txt -v2 --num-pages-for-keyword 30

I get the following error when scraping many pages. If the page count is under 20 or so, it works fine. Can you help?

Exception in thread [google]SelScrape:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.4/site-packages/GoogleScraper/selenium_mode.py", line 589, in run
    self.search()
  File "/usr/local/lib/python3.4/site-packages/GoogleScraper/selenium_mode.py", line 545, in search
    next_url = self._goto_next_page()
  File "/usr/local/lib/python3.4/site-packages/GoogleScraper/selenium_mode.py", line 398, in _goto_next_page
    next_url = element.get_attribute('href')
  File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webelement.py", line 97, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webelement.py", line 402, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 175, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.4/site-packages/selenium/webdriver/remote/errorhandler.py", line 166, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=42.0.2311.135)
  (Driver info: chromedriver=2.15.322455 (ae8db840dac8d0c453355d3d922c91adfb61df8f),platform=Mac OS X 10.10.3 x86_64)

Best, Nicholas
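
For readers hitting the same traceback: a StaleElementReferenceException usually means the page re-rendered between locating the next-page link and reading its href. A common workaround is to locate the element again and retry. The following is only a minimal sketch of that idea, not GoogleScraper's code or the maintainer's fix. The helper name get_attribute_with_retry, its parameters, and the locate callable are all hypothetical, and the fallback exception class exists only so the sketch runs without Selenium installed.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in (assumption) so the sketch can run without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass


def get_attribute_with_retry(locate, name, attempts=3, delay=0.5):
    """Read attribute `name` from a freshly located element, retrying on staleness.

    `locate` is a hypothetical zero-argument callable that returns a new
    element reference each time it is called.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            # Re-locate on every attempt so a stale reference is never reused.
            element = locate()
            return element.get_attribute(name)
        except StaleElementReferenceException as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # give the page a moment to settle
    raise last_error
```

With Selenium, locate could be a lambda wrapping whichever find_element call finds the next-page link, so that each retry queries the live DOM instead of holding on to an old reference.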

jamesspittal commented 9 years ago

I've noticed similar problems when using --num-pages-for-keyword myself.

NikolaiT commented 9 years ago

Going to fix this soon.

kunli-cs commented 8 years ago

Has this problem been solved?