MatthewChatham / glassdoor-review-scraper

Scrape reviews from Glassdoor
BSD 2-Clause "Simplified" License
179 stars, 252 forks

--start_from_url NoSuchElementException #40

Open XiaoxuanMa opened 4 years ago

XiaoxuanMa commented 4 years ago

Hi, I got the following error when trying to use the --start_from_url option. I need help with this, thanks.

python main.py --headless --start_from_url --limit 999 --url "https://www.glassdoor.com/Reviews/Amazon-Reviews-E6036_P100.htm" -f Amazon_2008.csv

2020-04-28 22:03:27,057 INFO 367 :main.py(10156) - Configuring browser

DevTools listening on ws://127.0.0.1:51346/devtools/browser/bd3f16cf-aff6-41e9-b7f4-a234f825187e
2020-04-28 22:03:30,202 INFO 409 :main.py(10156) - Scraping up to 999 reviews.
2020-04-28 22:03:30,213 INFO 348 :main.py(10156) - Signing in to maxiaoxuan2018@gmail.com
2020-04-28 22:03:45,688 INFO 377 :main.py(10156) - Getting current page number
Traceback (most recent call last):
  File "main.py", line 451, in <module>
    main()
  File "main.py", line 427, in main
    page[0] = get_current_page()
  File "main.py", line 383, in get_current_page
    normalize-space(@class),\' \'),\' disabled \')]')
  File "C:\Users\MXX\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 351, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\MXX\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 659, in find_element
    {"using": by, "value": value})['value']
  File "C:\Users\MXX\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\MXX\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\MXX\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//ul//li[contains (concat(' ',normalize-space(@class),' '),' current ')] //span[contains(concat(' ', normalize-space(@class),' '),' disabled ')]"}
  (Session info: headless chrome=81.0.4044.122)

hannez11 commented 4 years ago

def get_current_page():
    logger.info('Getting current page number')
    current = int(browser.find_element_by_class_name('pagination__PaginationStyle__current').text)
    return current

Shivanandrai commented 4 years ago

def get_current_page():
    logger.info('Getting current page number')
    current = int(browser.find_element_by_class_name('pagination__PaginationStyle__current').text)
    return current

This works perfectly, thank you so much for this!
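
For anyone hitting this again after another Glassdoor markup change: the URL in the original command ends in `_P100.htm`, which suggests the page number is also encoded in the URL itself. Below is a minimal sketch of a fallback that reads the page number from the URL rather than from the pagination element. The helper name `page_from_url` is hypothetical (it is not part of the scraper), and the `_P<n>.htm` pattern is an assumption based only on the URL shown in this issue; a URL without that suffix is assumed to be page 1.

```python
import re

# Assumption: review URLs end in "_P<n>.htm" for page n, as in the URL from
# this issue (".../Amazon-Reviews-E6036_P100.htm").
_PAGE_SUFFIX = re.compile(r'_P(\d+)\.htm$')


def page_from_url(url, default=1):
    """Hypothetical helper: return the page number encoded in a review URL."""
    # Drop any query string or fragment before matching the suffix.
    path = url.split('?', 1)[0].split('#', 1)[0]
    match = _PAGE_SUFFIX.search(path)
    # Fall back to `default` when the URL carries no page suffix.
    return int(match.group(1)) if match else default
```

In the scraper this could be called with the `--url` value (or the browser's current URL) inside an `except NoSuchElementException` branch of `get_current_page()`, so the class-name lookup stays the primary path and the URL is only a fallback.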