MatthewChatham / glassdoor-review-scraper

Scrape reviews from Glassdoor
BSD 2-Clause "Simplified" License
172 stars, 251 forks

No Such Element Exception #39

Open yizhu-millie opened 4 years ago

yizhu-millie commented 4 years ago

Hi, I got the following error with the latest code and am wondering how to fix it.

2020-04-09 23:37:11,677 INFO 416 :main.py(27691) - Configuring browser
2020-04-09 23:37:18,237 INFO 458 :main.py(27691) - Scraping up to 25 reviews.
2020-04-09 23:37:18,269 INFO 395 :main.py(27691) - Signing in to #########.com
2020-04-09 23:37:34,673 INFO 375 :main.py(27691) - Navigating to company reviews
2020-04-09 23:37:44,163 INFO 322 :main.py(27691) - Extracting reviews from page 1
2020-04-09 23:37:44,822 INFO 327 :main.py(27691) - Found 10 reviews on page 1
Traceback (most recent call last):
  File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 500, in <module>
    main()
  File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 480, in main
    reviews_df = extract_from_page()
  File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 331, in extract_from_page
    data = extract_review(review)
  File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 317, in extract_review
    res[field] = scrape(field, review, author)
  File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 300, in scrape
    return fdictfield
  File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 155, in scrape_years
    res = review.find_element_by_class_name('commonEiReviewTextStylesallowLineBreaks').find_element_by_xpath('preceding-sibling::p').text
  File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webelement.py", line 398, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webelement.py", line 659, in find_element
    {"using": by, "value": value})['value']
  File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".commonEiReviewTextStylesallowLineBreaks"}
  (Session info: chrome=81.0.4044.92)

I've seen others post similar issues in the past. Were those caused by updates to Glassdoor? Any help would be much appreciated. Thanks.
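The traceback shows one missing class name aborting the whole run. A minimal sketch of one way to tolerate that, assuming each field scraper is a plain function of the review element: wrap it so a failed lookup yields NaN instead of raising. The names `tolerant`, `LookupFailed` and `FakeReview` are hypothetical stand-ins and not part of the scraper; with Selenium you would presumably pass `selenium.common.exceptions.NoSuchElementException` as the exception type.

```python
import functools
import math


class LookupFailed(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def tolerant(exc_type, default=math.nan):
    """Decorator: return `default` when the wrapped scraper raises `exc_type`."""
    def decorate(scraper):
        @functools.wraps(scraper)
        def wrapper(review):
            try:
                return scraper(review)
            except exc_type:
                return default
        return wrapper
    return decorate


class FakeReview:
    """Hypothetical review element backed by a dict of class name -> text."""
    def __init__(self, fields):
        self.fields = fields

    def find_text(self, class_name):
        if class_name not in self.fields:
            raise LookupFailed(class_name)
        return self.fields[class_name]


@tolerant(LookupFailed)
def scrape_years(review):
    # Assumption: the real function would use Selenium lookups here.
    return review.find_text("years")


present = scrape_years(FakeReview({"years": "more than 3 years"}))
missing = scrape_years(FakeReview({}))
print(present, missing)
```

This only keeps the run alive; a field that always comes back as NaN still means the selector needs updating to match the current page.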

sachinchaturvedi93 commented 4 years ago

Try this https://github.com/sachinchaturvedi93/glassdoor-review-scraper

yizhu-millie commented 4 years ago


Thank you very much, the code is working.

One more question: when I scraped a company with around 2000 reviews, I set the limit to 3000 because I wanted to include all of them. The code went through more than 10,000 pages without stopping, even though there were no reviews on those pages.

If I did not set a limit, the code stopped almost immediately and left me with only a dozen or so reviews.

What is the most efficient way to scrape all of a company's reviews without going through those empty pages? Many thanks.
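One possible approach, sketched under assumptions rather than taken from either repository: keep paging until a page comes back with no reviews, and treat the limit only as an upper bound. Here `fetch_page` is a hypothetical callable that returns the list of reviews on a given page, and the fake data exists only to make the sketch runnable.

```python
def scrape_all(fetch_page, limit=None, max_empty_pages=1):
    """Collect reviews page by page until the limit is hit or pages come back empty.

    fetch_page(n) is a hypothetical callable returning the reviews on page n.
    """
    reviews = []
    empty_streak = 0
    page = 1
    while limit is None or len(reviews) < limit:
        batch = fetch_page(page)
        if batch:
            empty_streak = 0
            reviews.extend(batch)
        else:
            empty_streak += 1
            if empty_streak >= max_empty_pages:
                break  # ran past the last page that has reviews
        page += 1
    return reviews if limit is None else reviews[:limit]


# Fake site with 23 reviews, 10 per page (hypothetical data for illustration).
ALL_REVIEWS = [f"review {i}" for i in range(23)]
pages_requested = []


def fake_fetch(page, per_page=10):
    pages_requested.append(page)
    start = (page - 1) * per_page
    return ALL_REVIEWS[start:start + per_page]


collected = scrape_all(fake_fetch, limit=3000)
print(len(collected), pages_requested)
```

With a limit above the real review count, the loop stops one request after the last non-empty page instead of walking thousands of empty ones. Raising `max_empty_pages` would tolerate an occasional page that fails to load.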

csguzmanf commented 4 years ago

Hello, I received the same error, and when I cloned the second repository it still gave me:

Franciscos-MacBook-Air:glassdoor-review-scraper kiko$ python3 main.py --headless --url "https://www.glassdoor.com/Overview/Working-at-Wells-Fargo-EI_IE8876.11,22.htm" --limit 1000 -f wells_fargo_reviews.csv
2020-04-25 18:40:18,833 INFO 366 :main.py(61155) - Configuring browser
2020-04-25 18:40:21,905 INFO 408 :main.py(61155) - Scraping up to 1000 reviews.
2020-04-25 18:40:21,979 INFO 347 :main.py(61155) - Signing in to f.guzmanii3469@student.leedsbeckett.ac.uk
Traceback (most recent call last):
  File "main.py", line 450, in <module>
    main()
  File "main.py", line 412, in main
    sign_in()
  File "main.py", line 360, in sign_in
    submit_btn.click()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element is not clickable at point (400, 488). Other element would receive the click:

...

(Session info: headless chrome=81.0.4044.122)

Is there anything specific I should be looking for to try to fix the error?

Thanks for any help you can give!
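An ElementClickInterceptedException generally means another element (an overlay or banner, for example) is sitting on top of the button. A hedged sketch of one common workaround: try the normal click, and if it is intercepted, scroll the element into view and click it through JavaScript. The helper `click_with_fallback` and the fake classes are hypothetical; with Selenium you would pass the real driver, the `submit_btn` element, and `selenium.common.exceptions.ElementClickInterceptedException`. Whether this gets past the Glassdoor sign-in page is not confirmed anywhere in this thread.

```python
def click_with_fallback(driver, element, intercepted_exc):
    """Try a normal click; if intercepted, scroll to the element and click via JavaScript."""
    try:
        element.click()
        return "native"
    except intercepted_exc:
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
        driver.execute_script("arguments[0].click();", element)
        return "javascript"


# Hypothetical stand-ins so the sketch runs without a browser.
class Intercepted(Exception):
    pass


class FakeButton:
    def __init__(self, covered):
        self.covered = covered

    def click(self):
        if self.covered:
            raise Intercepted("other element would receive the click")


class FakeDriver:
    def __init__(self):
        self.scripts = []

    def execute_script(self, script, *args):
        self.scripts.append(script)


driver = FakeDriver()
covered_result = click_with_fallback(driver, FakeButton(covered=True), Intercepted)
clear_result = click_with_fallback(FakeDriver(), FakeButton(covered=False), Intercepted)
print(covered_result, clear_result)
```

A JavaScript click bypasses the covering element rather than dismissing it, so it is worth checking what the elided "Other element" in the message actually is before relying on this.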

jenil4all commented 4 years ago

With https://github.com/sachinchaturvedi93/glassdoor-review-scraper, the advice_to_mgmt and rating_overall fields are still not fetching data.

Shivanandrai commented 3 years ago

Hi, regarding rating_overall not fetching data: this is the code that works.

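import numpy as np

def scrape_overall_rating(review):
    # Return the overall star rating for one review element, or NaN if it cannot be found.
    try:
        ratings = review.find_element_by_class_name('gdStars')
        ratings = ratings.find_element_by_class_name('commonStarStylesgdStars')
        overall = ratings.find_element_by_class_name('rating')
        # The rating value is read from the element's title attribute.
        res = overall.get_attribute('title')
    except Exception:
        res = np.nan
    return res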

spsingh559 commented 3 years ago

I am also facing an issue with advice_to_mgmt. Any help would be appreciated.