MatthewChatham / glassdoor-review-scraper

Scrape reviews from Glassdoor
BSD 2-Clause "Simplified" License
180 stars 252 forks

The scraper seems to only pull the top 10 reviews regardless of how many pages it attempts to navigate to #57

Open Drew-Smith-18 opened 3 years ago

Drew-Smith-18 commented 3 years ago

Does anyone know how to resolve this?

Drew-Smith-18 commented 3 years ago

Found a solution. It is pretty hacky, but if you replace go_to_next_page() with the code below, it will work:

def go_to_next_page():
    logger.info(f'Going to page {page[0] + 1}')

    current_url = browser.current_url
    print(f'old url: {current_url}')

    # Drop the '.htm' suffix and anything after the first underscore
    # (such as an existing '_P<n>' page marker), then append the
    # marker for the next page.
    base_url = current_url.split('.htm', 1)[0].split('_')[0]
    new_url = f'{base_url}_P{page[0] + 1}.htm'
    print(f'new url: {new_url}')

    browser.get(new_url)
    time.sleep(5)  # wait for ads to load
    page[0] = page[0] + 1

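
The URL rewrite in the snippet above can also be pulled out into a small pure function, which makes it easy to check without opening a browser. This is only a sketch under assumptions: the name `next_page_url` is hypothetical (it is not part of the scraper), the example URL is made up, and it assumes review pages follow the `<base>[_P<n>].htm` pattern that the workaround relies on. Unlike the workaround, it strips only a trailing `_P<n>` marker with a regex instead of cutting at the first underscore.

```python
import re


def next_page_url(current_url: str, next_page: int) -> str:
    """Build the URL for `next_page`, assuming a '<base>[_P<n>].htm' pattern.

    Hypothetical helper for illustration; not part of the original scraper.
    """
    # Keep everything before the first '.htm'; anything after it
    # (for example a query string) is dropped, as in the workaround.
    base = current_url.split('.htm', 1)[0]
    # Remove an existing trailing page marker such as '_P2', if present.
    base = re.sub(r'_P\d+$', '', base)
    return f'{base}_P{next_page}.htm'


if __name__ == '__main__':
    # Made-up URL, used only to show the shape of the rewrite.
    url = 'https://www.glassdoor.com/Reviews/Example-Reviews-E1234.htm'
    print(next_page_url(url, 2))
```

Inside go_to_next_page(), the call would then look like browser.get(next_page_url(browser.current_url, page[0] + 1)).
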
roscoe777 commented 3 years ago

@Drew-Smith-18 May I know if this scraper is still working? I tried example 1 but kept getting the errors shown below. The process stops after landing on the first reviews page. I would very much appreciate it if you could give some instructions. Thank you!

[Screenshot 2021-10-24 215142]