FeLoe / DataDonations2021

Material for Organising a Mobile Lab

WhatsApp scraper fails (on my data) after running for 10 min #44

Open damian0604 opened 3 years ago

damian0604 commented 3 years ago
(env) damian@damian-thinkpad:~/onderzoek-github/Lab2020/parse_scripts$ ./whatsapp_scrape.py
Traceback (most recent call last):
  File "./whatsapp_scrape.py", line 188, in <module>
    links_per_chat = myscraper.scrape_links()
  File "./whatsapp_scrape.py", line 76, in scrape_links
    c[0].click()
  File "/home/damian/onderzoek-github/Lab2020/env/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/home/damian/onderzoek-github/Lab2020/env/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/home/damian/onderzoek-github/Lab2020/env/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/damian/onderzoek-github/Lab2020/env/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Web element reference not seen before: {"element-6066-11e4-a52e-4f735466cecf":"abf38850-332d-4803-a6c3-dba2f285fd4a"}

TODO: I will change the function to yield the elements instead of returning them, think of some error handling, and dump to jsonlines immediately instead of writing one big JSON at the end, so that we do not lose all the data we already got.
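
A minimal sketch of what that TODO could look like, not the actual change to `whatsapp_scrape.py`. The names `iter_chat_links`, `dump_jsonlines`, and the `extract` callback are hypothetical stand-ins for the real scraping code, and the broad `except Exception` would probably be narrowed to selenium's exceptions in practice:

```python
import json
from typing import Callable, Iterable, Iterator


def iter_chat_links(chats: Iterable, extract: Callable) -> Iterator[dict]:
    """Yield one record per chat instead of returning one big list.

    `extract` is a hypothetical callback that does the clicking and
    link collection for a single chat.
    """
    for index, chat in enumerate(chats):
        try:
            yield {"chat": index, "links": extract(chat)}
        except Exception as err:  # assumption: narrow this in the real scraper
            # Record the failure and move on instead of losing the whole run.
            yield {"chat": index, "error": repr(err)}


def dump_jsonlines(records: Iterable[dict], path: str) -> int:
    """Append each record as one JSON line and flush it right away."""
    written = 0
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            fh.flush()  # whatever was scraped so far survives a later crash
            written += 1
    return written
```

Because the generator is consumed lazily, each chat is written to disk as soon as it is scraped, so a crash after 10 minutes would only cost the chat that was in progress.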

FeLoe commented 3 years ago

The error the scraper fails with indicates either that it scrolled too far or not far enough (I had that a lot when figuring out the timestamp issues), or that it went too fast and the page wasn't loaded yet. Maybe you can run it again to check when exactly it fails, so we can see what the reason was?
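
If it turns out to be the "too fast" case, one possible workaround is to look the element up again and retry the click after a short pause, rather than reusing the old reference. This is only a sketch under that assumption: `click_with_retry` and the `find_element` callback are hypothetical names, and in the scraper `retry_on` would presumably be selenium's `NoSuchElementException` and `StaleElementReferenceException`:

```python
import time


def click_with_retry(find_element, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Re-locate an element and click it, retrying a few times on failure.

    `find_element` is a hypothetical zero-argument callback that looks the
    element up fresh on every call. Returns the attempt number that worked.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            element = find_element()  # fresh lookup, no stale reference
            element.click()
            return attempt
        except retry_on as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)  # give the page time to finish loading
    raise last_error
```

The returned attempt number could be logged per chat, which would also help to see whether the failures are timing-related or always happen at the same scroll position.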