wangrunzu opened 5 years ago
I am suddenly getting the exact same exception using chromedriver 73.0.3683.6 on Mac OS X 10.13.6. The code was working perfectly a few weeks ago. I am looking into get_current_page(), as I'm curious whether the find_elements calls by class name or XPath might be the problem, but I am a total beginner with Selenium. Hoping the author can help.
Thanks folks, I may have time to look at this in the coming week. But if you're able to figure it out and open a PR with a fix, I'll merge it!
I'm seeing the exact same error as above. It would be great if this could be resolved.
Hi, is this resolved?
Replacing a few lines of code helped me.
Original (3 places in the code): `paging_control = browser.find_element_by_class_name('pagingControls')`
Updated: `paging_control = browser.find_element_by_css_selector('.eiReviewsEIReviewsPageContainerStylespagination.noTabover.mt')`
Original (2 places in the code): `next_ = paging_control.find_element_by_class_name('next')`
Updated: `next_ = paging_control.find_element_by_class_name('paginationPaginationStylenext')`
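Worth noting: class names like `eiReviewsEIReviewsPageContainerStylespagination` are build-generated and tend to change whenever Glassdoor redeploys, so hard-coded selectors rot. A partial-match attribute selector is one hedge; this is only a sketch, and the `pagination`/`next` substrings are assumptions read off the class names quoted above, so verify them against the live page first:

```python
# Sketch only, not the repo's official fix: match any element whose class
# attribute contains "pagination", then any child whose class contains
# "next". Both substrings are assumptions based on the class names above.
paging_control = browser.find_element_by_css_selector("[class*='pagination']")
next_ = paging_control.find_element_by_css_selector("[class*='next']")
```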
Hey, so does anyone have an issue where they fix the paging_control options but it breaks later on? I'm trying to scrape around 30k reviews' worth of data, and the code keeps breaking for me around page 176. I used the following for paging_control:
```python
def more_pages():
    paging_control = browser.find_element_by_css_selector(
        '.eiReviewsEIReviewsPageContainerStylespagination.noTabover.mt')
    next_ = paging_control.find_element_by_class_name('paginationPaginationStylenext')
    try:
        next_.find_element_by_tag_name('a')
        return True
    except selenium.common.exceptions.NoSuchElementException:
        return False

def go_to_next_page():
    logger.info(f'Going to page {page[0] + 1}')
    paging_control = browser.find_element_by_class_name('paginationPaginationStylepagination')
    next_ = paging_control.find_element_by_class_name(
        'paginationPaginationStylenext').find_element_by_tag_name('a')
    browser.get(next_.get_attribute('href'))
    time.sleep(1)
    page[0] = page[0] + 1
```
I'm experimenting with both versions to see what works, but my code keeps breaking before it gets even a quarter of the way through the scrape. Does anyone have a workaround?
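If the fix works on early pages but dies deep into a run, the lookup may simply be racing a slow page load. Below is a minimal sketch, not code from this repo, of wrapping the lookup in an explicit wait; `find_paging_control` is a hypothetical helper name and the 10-second timeout is an arbitrary choice:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def find_paging_control(browser, timeout=10):
    # Wait up to `timeout` seconds for the paging control to appear in the
    # DOM instead of failing immediately when a deep page loads slowly.
    return WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located((
            By.CSS_SELECTOR,
            '.eiReviewsEIReviewsPageContainerStylespagination.noTabover.mt')))
```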
Hi all, I've tried both suggestions and the code still breaks.
Any clue?
Traceback below:
```
Traceback (most recent call last):
  File "main.py", line 483, in <module>
```
Getting the latest code from MuhammadMehran's pull request fixed the issue.
@carlotorniai Could you post the code by any chance? I have been trying to fix the same issue as well. Thanks
@EdiLacic123 just grab main.py, test.py, and schema.py from this pull request: https://github.com/MatthewChatham/glassdoor-review-scraper/pull/37/files
I got the following error about the paging control when I try to scrape the data.
```
python.exe main.py --headless --url "https://www.glassdoor.com/Reviews/Walmart-Reviews-E715.htm" --limit 100 -f test.csv
2019-05-31 15:06:49,643 INFO 377 :main.py(17796) - Configuring browser

DevTools listening on ws://127.0.0.1:50831/devtools/browser/8c7890e8-fe24-41f7-b77f-d22dae3f6c3e
2019-05-31 15:06:51,700 INFO 419 :main.py(17796) - Scraping up to 100 reviews.
2019-05-31 15:06:51,717 INFO 358 :main.py(17796) - Signing in to **@ou.edu
2019-05-31 15:06:55,478 INFO 339 :main.py(17796) - Navigating to company reviews
2019-05-31 15:07:08,137 INFO 286 :main.py(17796) - Extracting reviews from page 1
2019-05-31 15:07:08,200 INFO 291 :main.py(17796) - Found 10 reviews on page 1
2019-05-31 15:07:08,677 INFO 297 :main.py(17796) - Scraped data for "The Best in Retail"(Thu May 30 2019 20:24:44 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:09,171 INFO 297 :main.py(17796) - Scraped data for "Walmart needs to bring worker dignity back into focus"(Wed May 29 2019 18:04:43 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:09,673 INFO 297 :main.py(17796) - Scraped data for "Great for college students"(Thu May 30 2019 12:25:57 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:10,042 INFO 297 :main.py(17796) - Scraped data for "Retail"(Thu May 30 2019 17:09:02 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:10,497 INFO 297 :main.py(17796) - Scraped data for "walmart"(Mon May 27 2019 17:17:41 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:10,966 INFO 297 :main.py(17796) - Scraped data for "Maintenance is well taken care of"(Tue May 28 2019 08:32:17 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:11,437 INFO 297 :main.py(17796) - Scraped data for "It was the best job that I had to be honest"(Wed May 29 2019 20:29:39 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:11,896 INFO 297 :main.py(17796) - Scraped data for "Great"(Wed May 29 2019 20:36:02 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:12,281 INFO 297 :main.py(17796) - Scraped data for "floater pharmacist"(Wed May 29 2019 21:10:58 GMT-0500 (Central Daylight Time))
2019-05-31 15:07:12,708 INFO 297 :main.py(17796) - Scraped data for "cashier"(Wed May 29 2019 23:11:49 GMT-0500 (Central Daylight Time))
Traceback (most recent call last):
  File "main.py", line 461, in <module>
    main()
  File "main.py", line 446, in main
    while more_pages() and\
  File "main.py", line 314, in more_pages
    paging_control = browser.find_element_by_class_name('pagingControls')
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 564, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\wang0040\AppData\Local\Continuum\miniconda3\envs\Default\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"pagingControls"}
  (Session info: headless chrome=74.0.3729.169)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Windows NT 6.1.7601 SP1 x86_64)
```
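One general way to keep a missing `pagingControls` element from killing the whole run: the plural `find_elements_*` methods return an empty list instead of raising `NoSuchElementException`. A sketch of `more_pages()` rewritten that way, assuming the module-level `browser` object that main.py uses:

```python
def more_pages():
    # find_elements (plural) returns [] instead of raising
    # NoSuchElementException, so a missing paging control ends
    # pagination cleanly rather than crashing the scrape.
    controls = browser.find_elements_by_class_name('pagingControls')
    if not controls:
        return False
    next_ = controls[0].find_elements_by_class_name('next')
    return bool(next_) and bool(next_[0].find_elements_by_tag_name('a'))
```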
I also got the NoSuchElementException error from #8, but I got past it by disabling the scrape_years part. I do not think that change caused the above issue, but I am not sure.