Open iopsych opened 5 years ago
Same here ... I guess Glassdoor changed parts of the CSS structure...
same here
Can someone help?
same issue :-|
Hi, just change 'reviewBodyCell' to 'hreview'; that's the new class name they're using. Besides this issue, you may have trouble with the navigation buttons. I'm tinkering with Matt's code to see if I can get everything updated, and I'll post when I'm done. If someone is quicker, please share.
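For anyone who would rather not hard-code a single class name, here is a minimal sketch of a fallback lookup that tries 'hreview' first and the old 'reviewBodyCell' second. The helper name `find_by_first_class` and the `missing_exc` parameter are my own, not part of the scraper. It assumes the element exposes Selenium's `find_element(by, value)`, where the string "class name" is the value of `By.CLASS_NAME`.

```python
# Candidate class names: the new one reported in this thread first,
# then the old one as a fallback in case the markup changes back.
CANDIDATE_CLASSES = ("hreview", "reviewBodyCell")


def find_by_first_class(element, class_names=CANDIDATE_CLASSES, missing_exc=Exception):
    """Return the first child found under any of the given class names.

    `element` is assumed to be a Selenium WebElement (or anything with a
    compatible find_element method). In real use, pass
    selenium.common.exceptions.NoSuchElementException as `missing_exc`
    so that unrelated errors are not swallowed.
    """
    for name in class_names:
        try:
            # "class name" is the string behind selenium's By.CLASS_NAME
            return element.find_element("class name", name)
        except missing_exc:
            continue  # this class name is absent, try the next one
    raise LookupError(f"none of {list(class_names)} matched")
```

In the scraper this would stand in for the direct `find_element_by_class_name('reviewBodyCell')` call, e.g. `find_by_first_class(review, missing_exc=NoSuchElementException).find_element_by_tag_name('p')`.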
Admittedly, I know nothing about Python, but I spent some time with it and got it all working. It required quite a few changes to the paging, the "show more" button, and a few other things, using XPath and CSS selector code.
I'm also a big newbie to using GitHub, but I uploaded the code under my repository. Apologies if this wasn't the right way to do this.
@iopsych do you find that your working code breaks at any point? I sometimes have an issue where the scraper runs for about 15,000 reviews before I get a no such element exception, even though I've revised the paging, etc. and it was working before.
@tsp2123 - I haven't run it for any large companies with so many ratings. I will give it a shot and see how it goes.
@iopsych thanks for the code! It works like a charm! It's my last day at my job, this was possibly the last thing they asked of me, and I delivered!!!
@iopsych THANK YOU you are a lifesaver!! @tsp2123 I will keep this thread posted if I run into the issue you are having.
@iopsych Thank you for fixing and posting the revised code. It works well except for the issue identified by @tsp2123: scrapes of firms with large numbers of reviews capture hundreds of pages successfully, then terminate without saving after hitting a no such element error. The error results from Glassdoor's Cloudflare DDoS protection popping up a dialog with a "Click here to reload." prompt, which the script isn't programmed to recognize and respond to.
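Until the script handles that dialog properly, one way to soften the damage is to write rows to disk page by page and to check for the prompt before extracting. This is only a rough sketch, not the scraper's actual code: `scrape_with_checkpoint`, `reload_if_blocked`, and the `extract_page` callable are hypothetical names. Whether a plain refresh actually clears the Cloudflare dialog is an assumption I haven't verified.

```python
import csv

# Text of the prompt as quoted in the comment above.
RELOAD_PROMPT = "Click here to reload."


def reload_if_blocked(driver, prompt=RELOAD_PROMPT):
    """Refresh the page if the reload prompt appears in the page source.

    `driver` is assumed to be a Selenium WebDriver (page_source, refresh()).
    Returns True when a refresh was triggered.
    """
    if prompt in driver.page_source:
        driver.refresh()
        return True
    return False


def scrape_with_checkpoint(extract_page, out_path, fieldnames, max_pages):
    """Write each page's rows to CSV as soon as they are extracted.

    `extract_page(page_number)` is a hypothetical callable returning a list
    of dicts. If it raises (e.g. a no such element error), the loop stops,
    but everything written so far stays in the file.
    """
    pages_done = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for page in range(1, max_pages + 1):
            try:
                rows = extract_page(page)
            except Exception as err:  # narrow this to NoSuchElementException in real use
                print(f"Stopping at page {page}: {err!r}")
                break
            writer.writerows(rows)
            fh.flush()  # push rows to disk so a later crash does not lose them
            pages_done += 1
    return pages_done
```

The idea is that a run which dies on page 300 still leaves the first 299 pages on disk, rather than holding everything in memory until the end.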
Hi there...
I am also getting a no such element exception. Mine is slightly different from what NKoenig06 reported in that it is "Unable to locate element: {"method": "css selector", "selector":".reviewBodyCell"}".
I've tinkered around but can't seem to fix it.
2019-06-27 14:09:01,138 INFO 377 :main.py(2268) - Configuring browser
2019-06-27 14:09:03,281 INFO 419 :main.py(2268) - Scraping up to 100 reviews.
2019-06-27 14:09:03,289 INFO 358 :main.py(2268) - Signing in to toxipator@gmail.com
2019-06-27 14:09:07,250 INFO 339 :main.py(2268) - Navigating to company reviews
2019-06-27 14:09:19,008 INFO 286 :main.py(2268) - Extracting reviews from page 1
2019-06-27 14:09:19,028 INFO 291 :main.py(2268) - Found 9 reviews on page 1
2019-06-27 14:09:19,042 INFO 300 :main.py(2268) - Discarding a featured review
Traceback (most recent call last):
  File "C:\Users\GBarnett\main.py", line 461, in
    main()
  File "C:\Users\GBarnett\main.py", line 441, in main
    reviews_df = extract_from_page()
  File "C:\Users\GBarnett\main.py", line 295, in extract_from_page
    data = extract_review(review)
  File "C:\Users\GBarnett\main.py", line 281, in extract_review
    res[field] = scrape(field, review, author)
  File "C:\Users\GBarnett\main.py", line 264, in scrape
    return fdictfield
  File "C:\Users\GBarnett\main.py", line 156, in scrape_years
    'reviewBodyCell').find_element_by_tag_name('p')
  File "C:\Users\GBarnett\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 398, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\GBarnett\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 659, in find_element
    {"using": by, "value": value})['value']
  File "C:\Users\GBarnett\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\GBarnett\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\GBarnett\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
NoSuchElementException: no such element: Unable to locate element: {"method":"css selector","selector":".reviewBodyCell"}
  (Session info: headless chrome=75.0.3770.100)