flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

Error when adding the 2Captcha service #80

Closed. rusagent closed this issue 3 years ago

rusagent commented 3 years ago

I pulled the latest code and added the 2Captcha service, but now I get this:

  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/user/janhuntboi/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/home/user/janhuntboi/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/home/user/janhuntboi/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/home/user/janhuntboi/flathunter/abstract_crawler.py", line 136, in crawl
    return self.get_results(url, max_pages)
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 63, in get_results
    return self.get_entries_from_javascript()
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in get_entries_from_javascript
    return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in <listcomp>
    return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 111, in extract_entry_from_javascript
    'image': entry["resultlist.realEstate"]["galleryAttachments"]["attachment"][0]["@xlink.href"] if "galleryAttachments" in entry["resultlist.realEstate"] else "https://www.static-immobilienscout24.de/statpic/placeholder_house/496c95154de31a357afa978cdb7f15f0_placeholder_medium.png",

mordax7 commented 3 years ago

https://github.com/flathunters/flathunter/pull/81 just got merged. Can you please fetch the latest code and try again?

Elo1338 commented 3 years ago

Just tried it. Still the same error. I had just installed it, and it worked when I used only one Immowelt link. After I pasted in a few other links, it stopped working with the same error message, plus an additional

KeyError: 0

at the end.

Looks like only Immobilienscout is the problem. I deleted those links and it worked again.

codders commented 3 years ago

Hi @Elo1338 ,

Can you share which URLs you are using in your config? The error above is specifically related to Immobilienscout - Immowelt shouldn't be running the Immobilienscout crawler.
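One quick way to see which links in the config belong to which portal is to group them by hostname. This is only a minimal sketch: it assumes the links are available as a plain list of strings, `group_urls_by_host` is a hypothetical helper and not part of flathunter, and the example URLs are placeholders rather than the ones from this report.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_urls_by_host(urls):
    """Hypothetical helper: map each hostname to the config URLs that use it."""
    groups = defaultdict(list)
    for url in urls:
        # urlparse(...).hostname is lower-cased, or None if the URL has no host.
        host = urlparse(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    example_urls = [
        "https://www.immowelt.de/liste/example",
        "https://www.immobilienscout24.de/Suche/example",
    ]
    for host, links in group_urls_by_host(example_urls).items():
        print(host, len(links))
```

Removing one host's links at a time from the config then shows which portal's crawler raises the error.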

Elo1338 commented 3 years ago

Hey @codders, the Immowelt links worked fine; the Immobilienscout links are the problem. Sorry for the confusion.

rusagent commented 3 years ago

Any news?

codders commented 3 years ago

@rusagent Did you see what @mordax777 posted? There was some new code yesterday that might help. Does the latest work for you?

rusagent commented 3 years ago

Alright that works so far. Just started the bot though.

Would it be possible to use this add-on somehow? https://chrome.google.com/webstore/detail/buster-captcha-solver-for/mpbjkejclgfgadiemmefgebjfooflfhl?hl=en

rusagent commented 3 years ago

Well, I still get an error. But this happens even if I comment out the captcha config...

Traceback (most recent call last):
  File "flathunt.py", line 89, in <module>
    main()
  File "flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/user/janhuntboi/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/home/user/janhuntboi/flathunter/hunter.py", line 22, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/home/user/janhuntboi/flathunter/hunter.py", line 23, in <listcomp>
    for url in self.config.get('urls', list())])
  File "/home/user/janhuntboi/flathunter/abstract_crawler.py", line 136, in crawl
    return self.get_results(url, max_pages)
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 63, in get_results
    return self.get_entries_from_javascript()
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in get_entries_from_javascript
    return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 105, in <listcomp>
    return [ self.extract_entry_from_javascript(entry) for entry in entry_list ]
  File "/home/user/janhuntboi/flathunter/crawl_immobilienscout.py", line 111, in extract_entry_from_javascript
    'image': entry["resultlist.realEstate"]["galleryAttachments"]["attachment"][0]["@xlink.href"] if "galleryAttachments" in entry["resultlist.realEstate"] else "https://www.static-immobilienscout24.de/statpic/placeholder_house/496c95154de31a357afa978cdb7f15f0_placeholder_medium.png",
KeyError: 0

codders commented 3 years ago

@rusagent What URL are you using on ImmoScout? I can try and reproduce the issue here.

rusagent commented 3 years ago

urls:

codders commented 3 years ago

I'm taking a look at this this morning. The problem you are seeing occurs because the crawler expects a browse page rather than a search page. I can try to make the crawler more general, but I'm not sure I'll get that done today. Maybe over the weekend.
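For anyone reading the traceback above: the last frame indexes `["attachment"][0]`, and in Python a `KeyError: 0` at that point means the object being indexed is a dict rather than a list (an empty list would raise `IndexError` instead). A defensive lookup could be sketched as follows. This is only an illustration under that assumption; `first_image_url` is a hypothetical helper, and it is not necessarily how the crawler was eventually changed.

```python
PLACEHOLDER = (
    "https://www.static-immobilienscout24.de/statpic/placeholder_house/"
    "496c95154de31a357afa978cdb7f15f0_placeholder_medium.png"
)


def first_image_url(entry, placeholder=PLACEHOLDER):
    """Hypothetical helper: first gallery image URL of an entry, or the placeholder."""
    real_estate = entry.get("resultlist.realEstate") or {}
    gallery = real_estate.get("galleryAttachments") or {}
    attachment = gallery.get("attachment")
    # Assumption: a single attachment may arrive as a dict instead of a
    # one-element list. Indexing such a dict with [0] raises KeyError: 0.
    if isinstance(attachment, dict):
        attachment = [attachment]
    if not attachment:
        return placeholder
    return attachment[0].get("@xlink.href", placeholder)
```

Called with an entry whose `"attachment"` is a single dict, this returns that dict's `"@xlink.href"` value; with no gallery at all, it falls back to the placeholder image.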

Elo1338 commented 3 years ago

Sorry for the dumb question but what exactly is the difference between a browse page and a search page and how do I access a browse page in Immobilienscout?

rusagent commented 3 years ago

Good question - I have no answer :(

codders commented 3 years ago

@rusagent Okay. I think I fixed that in #84. Can you pull the latest code and retry?

rusagent commented 3 years ago

Seems like it is working (Y) Thanks man!

codders commented 3 years ago

Cool. Good to know.