flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

Issue with captcha/bot detection #296

Closed 4symmetry19 closed 1 year ago

4symmetry19 commented 1 year ago

Hi,

first of all, thanks for this great project! I'm running this on a mac in the local shell using Python 3.11. I configured everything for IS24, incl. 2captcha and the Telegram bot.

When I run flathunter.py though, I get output the first time; when it tries again after 10 minutes, it is apparently detected as a bot. Note: I turned off "headless" as that wasn't working at all; at least with that off it gets me the first batch of results.

This is the output I get after a 2nd run (verbose mode):

[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:22:59|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
[2023/01/21 13:22:59|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/[confidential but seems normal]
[2023/01/21 13:23:09|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:23:09|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:23:09|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
[2023/01/21 13:23:09|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/radius/wohnung-mieten?[confidential but seems normal]
[2023/01/21 13:23:20|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:23:20|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:23:20|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
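For triaging output like the log above, here is a minimal Python sketch that splits the entries and counts how often the bot-detection error appears. It assumes only the `[timestamp|module|LEVEL ]: message` layout visible in this log; the names `parse_entries` and `count_blocks` are made up for illustration and are not part of flathunter.

```python
import re

# One entry looks like "[2023/01/21 13:22:59|module.py|LEVEL ]: message".
# The lookahead ends a message at the next "[YYYY/MM/DD " prefix, so entries
# are separated correctly even when several of them share one line.
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\|"
    r"(?P<module>[^|\]]+)\|(?P<level>[A-Z]+)\s*\]: "
    r"(?P<message>.*?)(?=\s*\[\d{4}/\d{2}/\d{2} |\Z)",
    re.DOTALL,
)


def parse_entries(text):
    """Return one dict per log entry found in `text`."""
    return [
        {
            "ts": m["ts"],
            "module": m["module"].strip(),  # module names may be space-padded
            "level": m["level"],
            "message": m["message"].strip(),
        }
        for m in ENTRY.finditer(text)
    ]


def count_blocks(entries):
    """Count ERROR entries whose message mentions bot detection."""
    return sum(
        1
        for e in entries
        if e["level"] == "ERROR" and "bot detection" in e["message"]
    )


if __name__ == "__main__":
    sample = (
        "[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: Timeout waiting "
        "for iframe element - no captcha verification necessary? "
        "[2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot "
        "detection has identified our script as a bot - we've been blocked"
    )
    entries = parse_entries(sample)
    print(len(entries), "entries,", count_blocks(entries), "blocked")
```

Counting blocked runs per timestamp makes it easier to see whether every crawl attempt is rejected or only some of them.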

Another thing that stands out to me is that, according to 2captcha.com, I've only used 1 captcha so far. For a very long time, the use count was even at 0 despite me getting that first batch of results. The API key is correct though.

Any help would be appreciated!

Cheers, asymmetry

codders commented 1 year ago

Hey there,

You'll probably need to provide a few more arguments to the chrome driver. From the looks of your output, you might be hitting the bot detection. Try:

captcha:
  2captcha:
    api_key: 0...00
  driver_arguments:
    - "--no-sandbox"
    - "--headless"
    - "--disable-gpu"
    - "--remote-debugging-port=9222"
    - "--disable-dev-shm-usage"
    - "window-size=1024,768"
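Since this thread reports different results with and without `--headless`, a small Python sketch of toggling that flag before the arguments reach the browser may help when experimenting. It assumes a Selenium-style options object that exposes `add_argument`; `RecordingOptions` is a hypothetical stand-in so the snippet runs without a browser, and how flathunter itself consumes `driver_arguments` is not shown in this thread.

```python
from typing import Iterable, List


def without_flag(arguments: Iterable[str], flag: str) -> List[str]:
    """Return the arguments with `flag` (or a `flag=value` form) removed."""
    return [a for a in arguments if a != flag and not a.startswith(flag + "=")]


def apply_driver_arguments(options, arguments: Iterable[str]):
    """Pass each argument to an options object that has add_argument()."""
    for argument in arguments:
        options.add_argument(argument)
    return options


class RecordingOptions:
    """Hypothetical stand-in for a browser options object (illustration only)."""

    def __init__(self) -> None:
        self.arguments: List[str] = []

    def add_argument(self, argument: str) -> None:
        self.arguments.append(argument)


if __name__ == "__main__":
    # Same list as in the config above.
    configured = [
        "--no-sandbox",
        "--headless",
        "--disable-gpu",
        "--remote-debugging-port=9222",
        "--disable-dev-shm-usage",
        "window-size=1024,768",
    ]
    # Try a run with a visible browser window by dropping --headless.
    options = apply_driver_arguments(
        RecordingOptions(), without_flag(configured, "--headless")
    )
    print(options.arguments)
```

Keeping the flag removal separate from the config file makes it quick to compare a headless and a non-headless run without editing the YAML each time.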

4ndrew commented 1 year ago

Got the same issue with IS24, tried to add driver arguments but with no luck. I use flathunter with docker...

UPD: installed on a Mac instead of Linux; with headless, the result is the same. Without headless it works.

codders commented 1 year ago

I just deployed it to a PC in the cloud without docker, and it all works (with --headless), so I think it's maybe something about IP ranges or some other property that is triggering the bot detection.

codders commented 1 year ago

We've upgraded the undetected-chrome support in the latest software. Are you still seeing this issue with the newest version?

infctr commented 1 year ago

I've updated to the latest build, and crawling IS24 still doesn't work for me on a Google Cloud deployment.

4symmetry19 commented 1 year ago

We've upgraded the undetected-chrome support in the latest software. Are you still seeing this issue with the newest version?

It started working soon after I posted, so I guess you fixed it! Thanks so much :)

codders commented 1 year ago

Great to hear - thanks for the report!