MatthewChatham / glassdoor-review-scraper

Scrape reviews from Glassdoor
BSD 2-Clause "Simplified" License

Fixed Pros, Cons and Advice to Management, Helpful count, Pagination error, Years at the company bug #27

Closed sachinchaturvedi93 closed 4 years ago

sachinchaturvedi93 commented 4 years ago

I also added a control statement for the case where the reviews run out before the requested number has been scraped, e.g., you asked for 1000 reviews but only 156 exist. This change stops the infinite loop inside the extract_from_page function.

if len(reviews) == 0:
    logger.info('No more Review!')
    date_limit_reached[0] = True
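
To illustrate why the empty-page check matters, here is a minimal, self-contained sketch of a pagination loop that stops once a page comes back empty. It is not the code from main.py: `collect_reviews`, `fake_fetch_page`, and the page size of 10 are hypothetical names and values chosen only for the example.

```python
import logging

logger = logging.getLogger(__name__)


def collect_reviews(fetch_page, limit):
    """Collect up to `limit` reviews, stopping early if a page is empty.

    `fetch_page` is a hypothetical callable: page number -> list of reviews.
    """
    collected = []
    page = 1
    while len(collected) < limit:
        reviews = fetch_page(page)
        if len(reviews) == 0:
            # Without this guard the loop would never end when the site
            # holds fewer reviews than were requested.
            logger.info('No more reviews!')
            break
        collected.extend(reviews)
        page += 1
    return collected[:limit]


def fake_fetch_page(page, total=156, per_page=10):
    """Stand-in for a real page fetch: serves `total` dummy reviews."""
    start = (page - 1) * per_page
    return list(range(start, min(start + per_page, total)))


# Ask for 1000 reviews when only 156 exist; the loop still terminates.
all_reviews = collect_reviews(fake_fetch_page, 1000)
first_25 = collect_reviews(fake_fetch_page, 25)
```

In the pull request the same effect is achieved by setting date_limit_reached[0] = True rather than using break; the sketch uses break only to stay self-contained.
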

GaglianoM commented 4 years ago

Hi Sachin,

Thanks for all the effort you've put in on these fixes. I'm attempting to run the script now, but I'm getting an error:

Traceback (most recent call last):
  ...
  File "main.py", line 418, in main
    reviews_exist = navigate_to_reviews()
  File "main.py", line 341, in navigate_to_reviews
    "a.eiCell.cell.reviews")
  File "C:\Users\tqs966\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 598, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "C:\Users\tqs966\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\tqs966\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\tqs966\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"a.eiCell.cell.reviews"}
  (Session info: headless chrome=78.0.3904.70)

The error occurs at what appears to be ~line 340, in the navigate_to_reviews function. I haven't been able to resolve it by tinkering with it. Would you be able to take a look at your code and advise if you've got some time?

Thank you!

sachinchaturvedi93 commented 4 years ago

Hi Gagliano,

reviews_cell = browser.find_element_by_xpath('//*[@id="EIProductHeaders"]/div/a[2]/span[2]')
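
Because Glassdoor's markup changes over time, one possible way to make this lookup less brittle is to try several locators in order and use the first one that resolves. The sketch below is only an illustration under that assumption: `find_first` and `fake_lookup` are hypothetical helpers, not part of main.py or Selenium, and a dictionary stands in for the browser so the example runs without one.

```python
from typing import Callable, Iterable, Optional, Type


def find_first(
    lookups: Iterable[Callable[[], object]],
    missing: Type[Exception] = LookupError,
) -> Optional[object]:
    """Call each lookup in turn; return the first result, or None.

    With Selenium, `missing` would be
    selenium.common.exceptions.NoSuchElementException and each lookup a
    lambda wrapping a browser.find_element_by_... call.
    """
    for lookup in lookups:
        try:
            return lookup()
        except missing:
            continue  # this locator no longer matches; try the next one
    return None


# Stand-in for a page where only the XPath locator still matches.
_fake_page = {'//*[@id="EIProductHeaders"]/div/a[2]/span[2]': 'reviews link'}


def fake_lookup(locator: str) -> str:
    # Raises KeyError (a LookupError subclass) when the locator is absent.
    return _fake_page[locator]


reviews_cell = find_first([
    lambda: fake_lookup('a.eiCell.cell.reviews'),
    lambda: fake_lookup('//*[@id="EIProductHeaders"]/div/a[2]/span[2]'),
])
```

Returning None instead of raising lets the caller decide whether a missing reviews link means "no reviews" or "selector out of date".
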

sachinchaturvedi93 commented 4 years ago

Closed the pull request by mistake.

sachinchaturvedi93 commented 4 years ago

Adding new updates