Open yizhu-millie opened 4 years ago
Try this https://github.com/sachinchaturvedi93/glassdoor-review-scraper
Thank you very much, the code is working.
One more question: when I scraped a company with around 2,000 reviews, I set the limit to 3000 so that all the reviews would be included. The code then went through more than 10,000 pages without stopping, even though there were no reviews on those pages.
If I did not set a limit, the code stopped very soon and left me with only about a dozen reviews.
I was wondering what is the most efficient way of scraping all the reviews of a company without going through those empty pages? Many thanks.
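One possible approach to the empty-page problem, sketched here rather than taken from the scraper itself: stop paging as soon as a page yields no reviews, regardless of how high the limit is. The names `scrape_until_empty`, `fetch_page`, and `max_empty_pages` are hypothetical; `fetch_page` stands in for whatever function returns the list of reviews found on a given page.

```python
from typing import Callable, List


def scrape_until_empty(
    fetch_page: Callable[[int], List[dict]],
    limit: int,
    max_empty_pages: int = 1,
) -> List[dict]:
    """Collect reviews page by page, stopping at the limit or at empty pages.

    fetch_page is a hypothetical callable that returns the reviews found on
    the given 1-based page number (an empty list when the page has none).
    """
    reviews: List[dict] = []
    empty_streak = 0
    page = 1
    while len(reviews) < limit:
        batch = fetch_page(page)
        if batch:
            empty_streak = 0
            reviews.extend(batch)
        else:
            # Count consecutive empty pages and give up once the streak
            # reaches the threshold, instead of paging until the limit.
            empty_streak += 1
            if empty_streak >= max_empty_pages:
                break
        page += 1
    # The last page may have pushed the total past the limit.
    return reviews[:limit]
```

With this shape, a limit of 3000 on a company with about 2,000 reviews would end one page after the last real review instead of continuing through thousands of empty pages. Raising `max_empty_pages` to 2 or 3 may help if a page occasionally fails to load.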
Hello, I received the same error, and when I cloned the second repository it still gave me:
Franciscos-MacBook-Air:glassdoor-review-scraper kiko$ python3 main.py --headless --url "https://www.glassdoor.com/Overview/Working-at-Wells-Fargo-EI_IE8876.11,22.htm" --limit 1000 -f wells_fargo_reviews.csv
2020-04-25 18:40:18,833 INFO 366 :main.py(61155) - Configuring browser
2020-04-25 18:40:21,905 INFO 408 :main.py(61155) - Scraping up to 1000 reviews.
2020-04-25 18:40:21,979 INFO 347 :main.py(61155) - Signing in to f.guzmanii3469@student.leedsbeckett.ac.uk
Traceback (most recent call last):
File "main.py", line 450, in ...
Is there anything specific I should be looking for to try to fix the error?
Thanks for any help you can give!
The advice_to_mgmt and rating_overall fields are still not fetching data.
Hi, to get rating_overall, this is the code that works.
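import numpy as np

def scrape_overall_rating(review):
    # The overall rating is read from the title attribute of the nested 'rating' element.
    try:
        ratings = review.find_element_by_class_name('gdStars')
        ratings = ratings.find_element_by_class_name('commonStarStylesgdStars')
        overall = ratings.find_element_by_class_name('rating')
        res = overall.get_attribute('title')
    except Exception:
        # Fall back to NaN when any of the elements cannot be found.
        res = np.nan
    return res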
I am also facing an issue with advice_to_mgmt. Any help would be appreciated.
Hi, I got the following error with the latest code and was wondering how to fix it.
2020-04-09 23:37:11,677 INFO 416 :main.py(27691) - Configuring browser
2020-04-09 23:37:18,237 INFO 458 :main.py(27691) - Scraping up to 25 reviews.
2020-04-09 23:37:18,269 INFO 395 :main.py(27691) - Signing in to #########.com
2020-04-09 23:37:34,673 INFO 375 :main.py(27691) - Navigating to company reviews
2020-04-09 23:37:44,163 INFO 322 :main.py(27691) - Extracting reviews from page 1
2020-04-09 23:37:44,822 INFO 327 :main.py(27691) - Found 10 reviews on page 1
Traceback (most recent call last):
File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 500, in <module>
main()
File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 480, in main
reviews_df = extract_from_page()
File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 331, in extract_from_page
data = extract_review(review)
File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 317, in extract_review
res[field] = scrape(field, review, author)
File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 300, in scrape
return fdict[field](review)
File "/Users/millie/PycharmProjects/glassdoor-review-scraper/main.py", line 155, in scrape_years
res = review.find_element_by_class_name('commonEiReviewTextStylesallowLineBreaks').find_element_by_xpath('preceding-sibling::p').text
File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webelement.py", line 398, in find_element_by_class_name
return self.find_element(by=By.CLASS_NAME, value=name)
File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webelement.py", line 659, in find_element
{"using": by, "value": value})['value']
File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
return self._parent.execute(command, params)
File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/Users/millie/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".commonEiReviewTextStylesallowLineBreaks"}
(Session info: chrome=81.0.4044.92)
I've seen others post similar issues in the past. Were they caused by updates to the Glassdoor site? Any help would be much appreciated. Thanks.
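Whatever the cause of the missing element turns out to be, one way to keep a single failed lookup from aborting the whole run is to wrap each field scraper so that it returns a placeholder instead of raising. This is only a sketch under assumptions: `safe_field` is a hypothetical helper, not part of the scraper, and it relies on the fact that Selenium's `NoSuchElementException` derives from `Exception`.

```python
import math
from typing import Any, Callable


def safe_field(
    scraper: Callable[[Any], Any],
    review: Any,
    default: Any = math.nan,
) -> Any:
    """Run one field scraper and return `default` if it raises.

    `scraper` would be a function such as scrape_years from the traceback
    above; `review` is the element (or any object) that it expects.
    """
    try:
        return scraper(review)
    except Exception:
        # Broad on purpose: with Selenium this covers NoSuchElementException
        # when a class name no longer exists on the page. It also hides real
        # bugs, so logging the exception here may be worthwhile.
        return default
```

A call such as `safe_field(scrape_years, review)` would then leave that one column empty for the affected reviews while the remaining fields are still collected. It does not fix the selector itself, which would still need to be updated to match the current page markup.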