flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

no_of_results although 2captcha setup #92

Closed lomoien closed 2 years ago

lomoien commented 3 years ago

Hello! First, thanks to all the amazing people contributing to this project!

When crawling ImmobilienScout24, I always run into the no_of_results error:

Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/Users/zoe/Desktop/flathunter-main/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/Users/zoe/Desktop/flathunter-main/flathunter/abstract_crawler.py", line 117, in crawl
    return self.get_results(url, max_pages)
  File "/Users/zoe/Desktop/flathunter-main/flathunter/crawl_immobilienscout.py", line 65, in get_results
    while len(entries) < min(no_of_results, self.RESULT_LIMIT) and \
UnboundLocalError: local variable 'no_of_results' referenced before assignment

So I tried making the changes to the files mentioned in pull request #61. But after changing the code, installing chromedriver and selenium, and paying for 2captcha, I still get the same error. Is there anything else I need to do? I added the 2captcha key and the chromedriver path to the config file. I don't know what else there is to do. Any help would be amazing.
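
For readers hitting the same traceback: an UnboundLocalError like this usually means the variable is assigned on only one code path. The sketch below is not the flathunter code. The names `broken_count` and `parse_result_count` and the `resultCount` pattern are invented for illustration. It shows how the error can arise when a page (for example a captcha page) lacks the expected marker, and one way to fail with a clearer message instead.

```python
import re

# Hypothetical pattern; the real crawler's parsing logic may differ.
RESULT_COUNT = re.compile(r'"resultCount":\s*(\d+)')


def broken_count(page_html):
    """Reproduces the bug: the name is bound only when the pattern matches."""
    match = RESULT_COUNT.search(page_html)
    if match:
        no_of_results = int(match.group(1))
    return no_of_results  # UnboundLocalError if there was no match


def parse_result_count(page_html):
    """Safer variant: raise a descriptive error when the count is missing."""
    match = RESULT_COUNT.search(page_html)
    if match is None:
        raise ValueError("no result count in page; possibly a captcha page")
    return int(match.group(1))
```

With a page that contains no count, `broken_count` raises UnboundLocalError, while `parse_result_count` raises a ValueError that points at the likely cause.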

pcace commented 3 years ago

Hey, I am having the same problem, but it doesn't seem to happen every time. This is what I get now, which seems to be the same problem as yours:

But it was working correctly 2 days ago, with nothing changed... :/

user@mail:~/flathunter$ python3 flathunt.py 
[2020/11/15 17:50:53|config.py         |INFO    ]: Using config /home/user/flathunter/config.yaml
[2020/11/15 17:50:55|flathunt.py       |DEBUG   ]: Settings from config: <flathunter.config.Config object at 0x7f597dba3c88>
[2020/11/15 17:50:55|crawl_immobilienscout.py|DEBUG   ]: Got search URL https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=c3hwX0l1emdwQXp1QHtLbGlBX2hBcGNAbXtCbmdAX05_cUB2WHhoQGN5QWlAYXVCe3VAcVZrTX1pQGptQWVfQGlAX11jWWl5QmFuQHpLa01nbkB6ZkF3fUNlVX1wQ2RmQH1nQW1aeXZAe3VAbWNAbVpmQ3VwQG1qQ2VmQGxFZWZAdmdAbWdAYkFlZkBwX0JvRXBfQn5cclZwRXJlQHNhQW55Q2F3QmxFaV5sY0B1Tn5nQXFBcEdtWnRyQWt4QHhlQXRbaHlCZ1FgYkRqWmRmQ3BwQGZ7QWx4QHtLamlBdlhoUXtpQHRsQGFO&numberofrooms=2.0-&price=-750.0&livingspace=75.0-&enteredFrom=result_list#/&pagenumber={0}
[2020/11/15 17:50:57|abstract_crawler.py|DEBUG   ]: Google site key: None
Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/user/flathunter/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/home/user/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/home/user/flathunter/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/home/user/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
    return self.get_results(url, max_pages)
  File "/home/user/flathunter/flathunter/crawl_immobilienscout.py", line 60, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/home/user/flathunter/flathunter/crawl_immobilienscout.py", line 120, in get_page
    return self.get_soup_from_url(search_url.format(page_no), driver=driver, captcha_api_key=self.captcha_api_key, checkbox=self.checkbox, afterlogin_string=self.afterlogin_string)
  File "/home/user/flathunter/flathunter/abstract_crawler.py", line 75, in get_soup_from_url
    self.resolvecaptcha(driver, checkbox, afterlogin_string, captcha_api_key)
  File "/home/user/flathunter/flathunter/abstract_crawler.py", line 153, in resolvecaptcha
    self._solve(driver, api_key)
  File "/home/user/flathunter/flathunter/abstract_crawler.py", line 164, in _solve
    google_site_key = google_site_key_temp.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
user@mail:~/flathunter$ git pull
Already up to date.

Any idea how to solve it?
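
The last frame of the traceback shows `re.search` returning `None`, so the call to `.group(1)` fails. A minimal sketch of guarding that case follows. `extract_site_key` is a hypothetical helper, not part of flathunter, and the pattern is only assumed to resemble the one the crawler uses.

```python
import re

# Assumed pattern for illustration only.
SITE_KEY = re.compile(r'data-sitekey="(.*?)"')


def extract_site_key(markup):
    """Return the reCAPTCHA site key found in markup, or None if absent."""
    match = SITE_KEY.search(markup)
    # re.search returns None when nothing matches, so check before .group()
    return match.group(1) if match else None
```

A caller can then test for `None` and log that no site key was found, rather than crashing with an AttributeError. This does not make the captcha solvable, it only makes the failure easier to read.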

dave291 commented 3 years ago

Hey, looks like immoscout doesn't implement the captcha via an iframe anymore. This could solve the problem (worked for me):

See #94

class Crawler:
     def _solve(self, driver, api_key):
-           src_iframe = driver.find_element_by_tag_name("iframe").get_attribute("src")
-           google_site_key_temp = re.search("data-sitekey=\"(.*?)\"", src_iframe)
-           google_site_key = google_site_key_temp.group(1)
-           self.__log__.debug("Google site key: %s", google_site_key_temp)
+           google_site_key = driver.find_element_by_class_name("g-recaptcha").get_attribute("data-sitekey")
+           self.__log__.debug("Google site key: %s", google_site_key)
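
To check offline whether a saved copy of the page still carries the key on an element with class `g-recaptcha`, a stdlib-only sketch like the following may help. `SiteKeyFinder` and `find_site_key` are hypothetical names, and this is not how flathunter itself reads the page.

```python
from html.parser import HTMLParser


class SiteKeyFinder(HTMLParser):
    """Collects data-sitekey from the first element with class g-recaptcha."""

    def __init__(self):
        super().__init__()
        self.site_key = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values can be None for valueless attributes.
        classes = (attributes.get("class") or "").split()
        if self.site_key is None and "g-recaptcha" in classes:
            self.site_key = attributes.get("data-sitekey")


def find_site_key(page_html):
    """Return the site key from page_html, or None if no such element exists."""
    finder = SiteKeyFinder()
    finder.feed(page_html)
    return finder.site_key
```

If `find_site_key` returns `None` for the saved page source, the captcha markup has probably changed again and the selector needs another look.
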

lomoien commented 3 years ago

Thank you. It worked for a little under an hour, but now I get the same error again. :/

codders commented 3 years ago

Seems to be working for me with the latest code from @dave291. @pcace, have you tried with Dave's code? @lomoien, what Immoscout URL are you using?

lomoien commented 3 years ago

@codders I got the following link running https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=YWFhZklnamB8QHRVZUV8Q0NsTWFCbEdpQnBZa05qQ3VCalR1S3BEeUN4RmJAclJRck1zSHZHdUd0RGdGYEdrTXZFbVNgR29eckFtWU9hdUBvQ3VTd0V3U3tJeVhpTmlUfVl9V2dFe0JnRWNBY1h5QHVLfkF3SmhFY0RmQ2lgQHdLcXRAaUt5Vj9rW2pBdVd_RXNKfkNnTHJGeUJ4RGlCfEZlQGxFRX5UXH5EY0JqRUh_VnZCfFVmRmxVYkBmQVNMWVZXVElOTVJHXD9SP04-WkBgQEBYRFpGVkZaRFhEYkBEYkBCaEBCaEBCbEA-ZkBAWEBYRFhKYkBGckBKeEBIfkBOekBKckBMeEBUYEFUZEFUakFUfkBSekBUeEBSckBMbkBKWERISkhQVlRgQFZmQFBiQExOSkRKP0o-Sj9QQlJEVE5cWFhCYEA-Xj9cP1w-Vj9SP1ZDUEFWR1ZFVElYRVhJXEVkQD9gQD9iQENcSWRAT2ZAS2ZATWJASWJAS15NXk9gQEtcU1ZNVE9OT05PTlNMS0ZPSFNKV0hbRFlEY0A-XT9jQD9jQD9VP1k-XUNbR2NARWdARVN0QnBBcFJVeEllQnVJbF5XekpiQGhedEB4TWxIcHNAdEBuRH5EeEpgRXREeEp6QnhIckN2S0I.&numberofrooms=2.0-&price=-1300.0&livingspace=70.0-&sorting=2&enteredFrom=result_list

The problem is that I don't really know whether 2captcha is working or not. But since new results come in at random intervals after starting the program (normally all the new results show up in the console immediately after startup), I guess that it's working? Is there another way to check that? I will leave the program running and see how long it works without any error.
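
One possible way to check is to watch the 2captcha account balance: if it drops between runs, captchas are being submitted and charged. The sketch below assumes that 2captcha's `res.php` endpoint accepts `action=getbalance` with `json=1` and answers with a `status`/`request` JSON object. Treat the endpoint, parameters, and response shape as assumptions to verify against the 2captcha documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the 2captcha API documentation.
API_URL = "https://2captcha.com/res.php"


def balance_url(api_key):
    """Build the balance query URL for the given API key."""
    query = urllib.parse.urlencode(
        {"key": api_key, "action": "getbalance", "json": 1}
    )
    return f"{API_URL}?{query}"


def parse_balance(body):
    """Return the balance as a float, or raise if the API reports an error."""
    reply = json.loads(body)
    if reply.get("status") != 1:
        raise RuntimeError(f"2captcha error: {reply.get('request')}")
    return float(reply["request"])


def fetch_balance(api_key):
    """Query the balance over the network (not exercised offline)."""
    with urllib.request.urlopen(balance_url(api_key), timeout=30) as response:
        return parse_balance(response.read().decode("utf-8"))
```

Calling `fetch_balance` with the key from the config file before and after a run gives a rough signal of whether the solver was used.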

lomoien commented 3 years ago

Again after ca. one hour, same error. :( And if I try to restart the program, I get the error immediately.

pcace commented 3 years ago

Again after ca. one hour, same error. :( And if I try to restart the program, I get the error immediately.

Yep, same problem here! :(

codders commented 2 years ago

I'm closing this because I see some other users having success with the 2captcha setup and I think more recent contributions have solved the issue. But please re-open it if it's still a problem.