flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

Immoscout: List index out of range? #144

Closed: iwasherefirst2 closed this issue 2 years ago.

iwasherefirst2 commented 2 years ago

I have a daily cron job that runs flathunter for 12 hours. I found this in yesterday's log file:

Traceback (most recent call last):
  File "/home/adam/flathunter/flathunt.py", line 95, in <module>
    main()
  File "/home/adam/flathunter/flathunt.py", line 92, in main
    launch_flat_hunt(config)
  File "/home/adam/flathunter/flathunt.py", line 51, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/adam/flathunter/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/home/adam/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/home/adam/flathunter/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/home/adam/flathunter/flathunter/abstract_crawler.py", line 142, in crawl
    return self.get_results(url, max_pages)
  File "/home/adam/flathunter/flathunter/crawl_immobilienscout.py", line 60, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/home/adam/flathunter/flathunter/crawl_immobilienscout.py", line 120, in get_page
    return self.get_soup_from_url(search_url.format(page_no), driver=driver, captcha_api_key=self.captcha_api_key, checkbox=self.checkbox, afterlogin_string=self.afterlogin_string)
  File "/home/adam/flathunter/flathunter/abstract_crawler.py", line 79, in get_soup_from_url
    self.resolvegeetest(driver, captcha_api_key)
  File "/home/adam/flathunter/flathunter/abstract_crawler.py", line 185, in resolvegeetest
    recaptcha_answer = recaptcha_answer.split("|", 1)[1]
IndexError: list index out of range

Does this kill the process, or is it only logged? And what does it mean?
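
The last frame shows what fails: `recaptcha_answer.split("|", 1)[1]` assumes the reply from the captcha-solving service always contains a `|` separator (2captcha's plain-text success replies look like `OK|<answer>`). If the service presumably returns a status or error string without a `|` instead, for example `CAPCHA_NOT_READY` or `ERROR_CAPTCHA_UNSOLVABLE`, then `split` returns a one-element list and indexing `[1]` raises exactly this `IndexError`. Below is a minimal sketch of a guarded parse; `parse_captcha_reply` and `CaptchaError` are hypothetical names for illustration, not flathunter's code.

```python
class CaptchaError(RuntimeError):
    """Hypothetical error raised when the captcha reply is not a success reply."""


def parse_captcha_reply(reply: str) -> str:
    """Return the answer part of an 'OK|<answer>' reply, or raise CaptchaError.

    Assumes the plain-text 'OK|<answer>' success format; anything else
    (e.g. 'CAPCHA_NOT_READY' or 'ERROR_...') is treated as a failure.
    """
    status, sep, answer = reply.partition("|")
    if status != "OK" or not sep:
        # Here reply.split("|", 1)[1] would have raised IndexError instead.
        raise CaptchaError(f"unexpected captcha reply: {reply!r}")
    return answer


# The original failure reproduced: no '|' in the reply gives a one-element list.
print("CAPCHA_NOT_READY".split("|", 1))  # ['CAPCHA_NOT_READY']
print(parse_captcha_reply("OK|abc123"))  # abc123
```

Failing with a named error that includes the raw reply makes the log say what the service actually returned, rather than a bare "list index out of range".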

Also, I tested the ImmoScout URL from my config and noticed that sometimes the page doesn't load and just shows "Sorry there is an error, try again later". Will such a failure be logged? It would be useful to know how often this happens.
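
For a rough measure of how often the site serves its error page instead of results, one approach is to check each fetched page's HTML for the error message and log every hit. This is a minimal sketch only: the marker string is an assumption based on the message quoted above (the live site may word it differently, e.g. in German), and `ErrorPageCounter` is a hypothetical helper, not something flathunter provides.

```python
import logging
from collections import Counter

logger = logging.getLogger("error-page-counter")

# Assumed marker text, taken from the message quoted in this issue;
# adjust it to whatever the site actually displays.
ERROR_MARKER = "sorry there is an error"


class ErrorPageCounter:
    """Hypothetical helper that counts fetched pages that look like the error page."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def check(self, url: str, html: str) -> bool:
        """Return True, and log a warning, if html contains the error marker."""
        is_error = ERROR_MARKER in html.lower()
        self.counts["error" if is_error else "ok"] += 1
        if is_error:
            logger.warning("error page for %s (%d so far)", url, self.counts["error"])
        return is_error
```

Comparing `counts["error"]` with `counts["ok"]` at the end of a run gives the failure rate over that run.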

step21 commented 2 years ago

This might be the same issue as https://github.com/flathunters/flathunter/issues/138. If you want more logging, you can enable debug logging, but be prepared for a lot of output. If you just want to make sure the job keeps running, I would restart it via supervisor or a cron job. As you can see, the error is logged (as the traceback), just not very nicely.
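
An exception that propagates out of `main()` uncaught, like this `IndexError`, makes the Python interpreter exit with a non-zero status, so the hunt stops until something starts it again. As an in-process alternative to a supervisor or cron restart, a script could wrap the hunt in a loop that logs each failure with its traceback and retries. The sketch below assumes the hunt can be called as a zero-argument function; `run_with_restarts` and its parameters are hypothetical, not part of flathunter.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("hunt-wrapper")


def run_with_restarts(job: Callable[[], None], max_restarts: int = 5,
                      delay_seconds: float = 60.0) -> int:
    """Run job, restarting it after an exception up to max_restarts times.

    Returns the number of failures caught before job finished normally.
    Re-raises the last exception once max_restarts is exceeded.
    """
    failures = 0
    while True:
        try:
            job()
            return failures
        except Exception:
            failures += 1
            # logger.exception records the full traceback at ERROR level.
            logger.exception("hunt failed (failure %d)", failures)
            if failures > max_restarts:
                raise
            time.sleep(delay_seconds)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    attempts = {"n": 0}

    def flaky_hunt() -> None:
        # Stand-in for one hunt; fails twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise IndexError("list index out of range")

    print(run_with_restarts(flaky_hunt, delay_seconds=0))  # 2
```

A bounded number of restarts keeps a persistent failure (for example a broken captcha setup) from looping forever, and the logged failure count doubles as a rough record of how often errors happen.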

codders commented 2 years ago

This also looks like the report in #157. I think there must be something wrong with the GeeTest / 2captcha implementation, but the user in #157 reports that it is fixed now.
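
One plausible failure mode in such an implementation is reading the solver's result only once, or not handling error replies, so a reply without `|` reaches the `split`. A minimal sketch of a more tolerant polling loop follows, assuming 2captcha's plain-text reply format (`OK|<answer>` on success, `CAPCHA_NOT_READY` while solving, `ERROR_...` on failure). `fetch_reply` stands in for the HTTP request to the service and is passed in so the logic can be tried without network access; all names are hypothetical, and this is not flathunter's actual fix.

```python
import time
from typing import Callable


def poll_captcha_answer(fetch_reply: Callable[[], str], max_polls: int = 24,
                        interval_seconds: float = 5.0) -> str:
    """Poll until the service returns 'OK|<answer>', then return <answer>.

    Raises RuntimeError on an error reply, or TimeoutError after max_polls
    'not ready' replies, instead of failing later with IndexError.
    """
    for _ in range(max_polls):
        reply = fetch_reply().strip()
        if reply == "CAPCHA_NOT_READY":  # 2captcha's own spelling
            time.sleep(interval_seconds)
            continue
        status, sep, answer = reply.partition("|")
        if status == "OK" and sep:
            return answer
        raise RuntimeError(f"captcha service returned an error: {reply!r}")
    raise TimeoutError(f"no captcha answer after {max_polls} polls")


# Example with a fake service that needs two polls before answering.
fake_replies = iter(["CAPCHA_NOT_READY", "CAPCHA_NOT_READY", "OK|token-xyz"])
print(poll_captcha_answer(lambda: next(fake_replies), interval_seconds=0))  # token-xyz
```

Distinguishing "not ready yet" (wait and retry), "error" and "timed out" lets the caller decide whether to retry the whole page or give up, and the log message names the actual reply.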

iwasherefirst2 commented 2 years ago

Awesome, thank you!