lomoien closed this issue 2 years ago
Hey, I am having the same problem, but it doesn't seem to happen every time. It was working correctly 2 days ago, with nothing changed... :/
This is what I get now, which seems to be the same problem as yours:
user@mail:~/flathunter$ python3 flathunt.py
[2020/11/15 17:50:53|config.py |INFO ]: Using config /home/user/flathunter/config.yaml
[2020/11/15 17:50:55|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7f597dba3c88>
[2020/11/15 17:50:55|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=c3hwX0l1emdwQXp1QHtLbGlBX2hBcGNAbXtCbmdAX05_cUB2WHhoQGN5QWlAYXVCe3VAcVZrTX1pQGptQWVfQGlAX11jWWl5QmFuQHpLa01nbkB6ZkF3fUNlVX1wQ2RmQH1nQW1aeXZAe3VAbWNAbVpmQ3VwQG1qQ2VmQGxFZWZAdmdAbWdAYkFlZkBwX0JvRXBfQn5cclZwRXJlQHNhQW55Q2F3QmxFaV5sY0B1Tn5nQXFBcEdtWnRyQWt4QHhlQXRbaHlCZ1FgYkRqWmRmQ3BwQGZ7QWx4QHtLamlBdlhoUXtpQHRsQGFO&numberofrooms=2.0-&price=-750.0&livingspace=75.0-&enteredFrom=result_list#/&pagenumber={0}
[2020/11/15 17:50:57|abstract_crawler.py|DEBUG ]: Google site key: None
Traceback (most recent call last):
File "flathunt.py", line 89, in <module>
main()
File "flathunt.py", line 86, in main
launch_flat_hunt(config)
File "flathunt.py", line 46, in launch_flat_hunt
hunter.hunt_flats()
File "/home/user/flathunter/flathunter/hunter.py", line 42, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/home/user/flathunter/flathunter/hunter.py", line 22, in crawl_for_exposes
for searcher in self.config.searchers()
File "/home/user/flathunter/flathunter/hunter.py", line 23, in <listcomp>
for url in self.config.get('urls', list())])
File "/home/user/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
return self.get_results(url, max_pages)
File "/home/user/flathunter/flathunter/crawl_immobilienscout.py", line 60, in get_results
soup = self.get_page(search_url, self.driver, page_no)
File "/home/user/flathunter/flathunter/crawl_immobilienscout.py", line 120, in get_page
return self.get_soup_from_url(search_url.format(page_no), driver=driver, captcha_api_key=self.captcha_api_key, checkbox=self.checkbox, afterlogin_string=self.afterlogin_string)
File "/home/user/flathunter/flathunter/abstract_crawler.py", line 75, in get_soup_from_url
self.resolvecaptcha(driver, checkbox, afterlogin_string, captcha_api_key)
File "/home/user/flathunter/flathunter/abstract_crawler.py", line 153, in resolvecaptcha
self._solve(driver, api_key)
File "/home/user/flathunter/flathunter/abstract_crawler.py", line 164, in _solve
google_site_key = google_site_key_temp.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
user@mail:~/flathunter$ git pull
Already up to date.
Any idea how to solve it?
Hey, looks like immoscout doesn't implement the captcha via an iframe anymore. This could solve the problem (worked for me):
See #94
 class Crawler:
     def _solve(self, driver, api_key):
-        google_site_key_temp = re.search("data-sitekey=\"(.*?)\"", src_iframe)
-        src_iframe = driver.find_element_by_tag_name("iframe").get_attribute("src")
-        google_site_key = google_site_key_temp.group(1)
-        self.__log__.debug("Google site key: %s", google_site_key_temp)
+        google_site_key = driver.find_element_by_class_name("g-recaptcha").get_attribute("data-sitekey")
+        self.__log__.debug("Google site key: %s", google_site_key)
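To illustrate why the old lookup crashed: `re.search` returns `None` when nothing matches, and calling `.group(1)` on `None` raises the `AttributeError` from the traceback. Below is a minimal sketch of a lookup that tries both markup variants on the raw page source and returns `None` instead of crashing. The function name `extract_site_key`, the regular expressions, and the sample HTML are assumptions for illustration only, not code from the flathunter repository; the real page markup may differ.

```python
import re

# Hypothetical patterns (assumptions, not taken from flathunter):
# 1) a widget element carrying a data-sitekey attribute
# 2) a reCAPTCHA iframe whose src URL carries the key in the k= parameter
DATA_SITEKEY_RE = re.compile(r'data-sitekey="([^"]+)"')
IFRAME_KEY_RE = re.compile(r'<iframe[^>]+src="[^"]*[?&;]k=([^&"]+)')


def extract_site_key(page_source):
    """Return the reCAPTCHA site key found in page_source, or None.

    Every match is checked before .group() is called, so a page without
    a captcha yields None instead of an AttributeError.
    """
    for pattern in (DATA_SITEKEY_RE, IFRAME_KEY_RE):
        match = pattern.search(page_source)
        if match is not None:
            return match.group(1)
    return None
```

A caller could then log a clear message and skip the page when the result is `None`, rather than failing in the middle of a crawl.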
Thank you. It worked for a little under an hour, but now I get the same error again. :/
Seems to be working for me with the latest code from @dave291. @pcace, have you tried Dave's code? @lomoien, what Immoscout URL are you using?
@codders I got the following link running:
https://www.immobilienscout24.de/Suche/shape/wohnung-mieten?shape=YWFhZklnamB8QHRVZUV8Q0NsTWFCbEdpQnBZa05qQ3VCalR1S3BEeUN4RmJAclJRck1zSHZHdUd0RGdGYEdrTXZFbVNgR29eckFtWU9hdUBvQ3VTd0V3U3tJeVhpTmlUfVl9V2dFe0JnRWNBY1h5QHVLfkF3SmhFY0RmQ2lgQHdLcXRAaUt5Vj9rW2pBdVd_RXNKfkNnTHJGeUJ4RGlCfEZlQGxFRX5UXH5EY0JqRUh_VnZCfFVmRmxVYkBmQVNMWVZXVElOTVJHXD9SP04-WkBgQEBYRFpGVkZaRFhEYkBEYkBCaEBCaEBCbEA-ZkBAWEBYRFhKYkBGckBKeEBIfkBOekBKckBMeEBUYEFUZEFUakFUfkBSekBUeEBSckBMbkBKWERISkhQVlRgQFZmQFBiQExOSkRKP0o-Sj9QQlJEVE5cWFhCYEA-Xj9cP1w-Vj9SP1ZDUEFWR1ZFVElYRVhJXEVkQD9gQD9iQENcSWRAT2ZAS2ZATWJASWJAS15NXk9gQEtcU1ZNVE9OT05PTlNMS0ZPSFNKV0hbRFlEY0A-XT9jQD9jQD9VP1k-XUNbR2NARWdARVN0QnBBcFJVeEllQnVJbF5XekpiQGhedEB4TWxIcHNAdEBuRH5EeEpgRXREeEp6QnhIckN2S0I.&numberofrooms=2.0-&price=-1300.0&livingspace=70.0-&sorting=2&enteredFrom=result_list
The problem is that I don't really know whether 2captcha is working. But since new results now arrive after a random delay when starting the program (normally all new results show up in the console immediately after startup), I guess it's working? Is there another way to check that? I will leave the program running and see how long it works without an error.
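One possible way to check is to query the 2captcha account balance before and after a run: if the balance drops, captchas are being submitted and billed. The sketch below assumes 2captcha's `res.php` endpoint with `action=getbalance` and `json=1`, as I understand its public API documentation. The helper names are hypothetical and not part of flathunter, so please verify the endpoint and response shape against the current 2captcha docs.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint, based on 2captcha's public API docs; verify before relying on it.
BALANCE_ENDPOINT = "https://2captcha.com/res.php"


def build_balance_url(api_key):
    """Build the balance-query URL for the given API key."""
    query = urllib.parse.urlencode(
        {"key": api_key, "action": "getbalance", "json": 1}
    )
    return f"{BALANCE_ENDPOINT}?{query}"


def parse_balance(body):
    """Parse a JSON response body and return the balance as a float.

    Assumes the shape {"status": 1, "request": "<balance>"} on success and
    {"status": 0, "request": "<error code>"} on failure.
    """
    data = json.loads(body)
    if data.get("status") != 1:
        raise ValueError(f"2captcha returned an error: {data.get('request')}")
    return float(data["request"])


def fetch_balance(api_key, timeout=10):
    """Query the balance over the network (not exercised offline)."""
    with urllib.request.urlopen(build_balance_url(api_key), timeout=timeout) as resp:
        return parse_balance(resp.read().decode("utf-8"))
```

Calling `fetch_balance("<your key>")` once before starting the crawler and again after the first batch of results should show whether any captchas were actually billed.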
Again after about one hour, the same error. :( And if I try to restart the program, I get the error immediately.
Yep, same problem here! :(
I'm closing this because I see some other users having success with the 2captcha setup and I think more recent contributions have solved the issue. But please re-open it if it's still a problem.
Hello, first, thanks to all the amazing people contributing to this project!
When crawling ImmobilienScout24, I always get the no_of_results error.
So I tried to make the changes to the files as mentioned in pull request #61. But after changing the code, installing chromedriver and selenium, and paying for 2captcha, I still get the same error. Is there anything else I need to do? I added the 2captcha key and the chromedriver path to the config file. I don't know what else there is to do... Any help would be amazing.