flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

Immoscout24 issue - TypeError when using 2Captcha #321

Closed miklyx closed 1 year ago

miklyx commented 1 year ago

Error is:

    File ".../flathunt/flathunter/flathunter/chrome_wrapper.py", line 44, in get_chrome_driver
      driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member
    TypeError: __init__() got an unexpected keyword argument 'version_main'

undetected-chromedriver==3.2.1

Without 2Captcha I always get:

    [2023/02/21 22:07:13|crawl_immobilienscout.py|ERROR ]: Index error occurred in log

codders commented 1 year ago

Hi @miklyx ,

It's weird that you have the undetected-chromedriver version 3.2.1 there. The version in the Pipfile right now is 3.4.6. Can you try fetching the latest code from main and running pipenv install again?
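
To see which version is really installed in the active environment, a quick standard-library check can help. This is a minimal sketch, assuming Python 3.8+ and plain dotted version numbers; the helper names `version_tuple`, `meets_minimum` and `installed_version` are made up for illustration and are not part of flathunter.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn a plain dotted version such as '3.4.6' into (3, 4, 6).

    Assumption: the version has only numeric parts (no 'rc1', '.post1', ...).
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("undetected-chromedriver")
    if found is None:
        print("undetected-chromedriver is not installed in this environment")
    elif not meets_minimum(found, "3.4.6"):
        print(f"found {found}, but the Pipfile expects 3.4.6; try 'pipenv install' again")
    else:
        print(f"undetected-chromedriver {found} looks fine")
```

Running this inside the same pipenv shell that starts the bot matters, since a different interpreter may see a different set of packages.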

miklyx commented 1 year ago

Hi!

  1. Successfully uninstalled undetected-chromedriver-2.1.1
     Successfully installed undetected-chromedriver-3.4.6

  2. I also changed the import in chrome_wrapper.py from

       import undetected_chromedriver.v2 as uc

     to

       import undetected_chromedriver as uc

     because after the upgrade I got: No module named 'undetected_chromedriver.v2'

  3. Changed config.yaml: aligned the spaces (indentation) in the captcha block

  4. Now it started, but with immoscout24:

       [2023/02/22 12:57:45|chrome_wrapper.py |INFO ]: Initializing Chrome WebDriver for crawler...
       [2023/02/22 12:57:47|patcher.py |INFO ]: patching driver executable /Users/mrashkovskiy/Library/Application Support/undetected_chromedriver/undetected_chromedriver
       [2023/02/22 12:58:16|__init__.py |INFO ]: setting properties for headless
       [2023/02/22 12:58:17|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?price=-1100.0&livingspace=50.0-&exclusioncriteria=projectlisting,swapflat&pricetype=rentpermonth&geocodes=110000000801,110000000307,110000000703,110000000101,110000000201,110000000102,110000000202,110000000301,110000000104,110000000105,110000000106,110000000701&sorting=2&enteredFrom=result_list&pagenumber={0}
       [2023/02/22 12:58:30|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
       [2023/02/22 12:58:30|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
       [2023/02/22 12:58:30|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
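
For anyone who has to run the same local checkout against both an old and a new undetected-chromedriver release, the import change in step 2 could be written as a fallback instead of a hard edit. This is only a sketch: `import_first_available` is a hypothetical helper, and the module order is an assumption based on the error message above (the `.v2` submodule exists in the old install and is gone after the upgrade). Fetching the latest main, as suggested earlier in the thread, is the simpler fix.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that can be imported.

    ModuleNotFoundError is a subclass of ImportError, so a missing module
    or submodule simply moves on to the next candidate.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of these modules could be imported: {list(module_names)}")
```

With that helper, `uc = import_first_available(["undetected_chromedriver.v2", "undetected_chromedriver"])` would pick the `.v2` submodule where it still exists and fall back to the top-level package otherwise.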

miklyx commented 1 year ago

Next, some very interesting behaviour: it sent a couple of links from immoscout24 and then terminated:

    Error sending media group: {...}
    Traceback (most recent call last):
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunt.py", line 109, in <module>
        main()
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunt.py", line 105, in main
        launch_flat_hunt(config, heartbeat)
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunt.py", line 36, in launch_flat_hunt
        hunter.hunt_flats()
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/hunter.py", line 54, in hunt_flats
        for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 37, in process_expose
        self.__broadcast(
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 61, in __broadcast
        self.__send_images(chat_id=receiver, msg=msg, images=images)
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 124, in __send_images
        self.__handle_error(
      File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 159, in __handle_error
        raise StampedeProtectionException(
    flathunter.exceptions.StampedeProtectionException: Too many messages too fast - backoff 12 seconds
    [2023/02/22 13:10:13|__init__.py |INFO ]: ensuring close

codders commented 1 year ago

That looks like it's working. On the first run you might get that message, because it finds a lot of matches at the start. Afterwards it should calm down.
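
In the traceback the exception travels all the way up to main(), which is why the process ends instead of waiting. The general pattern of "wait for the requested time, then retry" can be sketched as below. Everything here is an assumption for illustration: `BackoffRequested` is a stand-in class, not flathunter's StampedeProtectionException (whose attributes are not shown in this thread), and `send_with_backoff` is a hypothetical wrapper.

```python
import time


class BackoffRequested(Exception):
    """Hypothetical stand-in for an error that says how long to wait."""

    def __init__(self, seconds: float):
        super().__init__(f"Too many messages too fast - backoff {seconds} seconds")
        self.seconds = seconds


def send_with_backoff(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send(); on BackoffRequested, wait the requested time and retry.

    The last failed attempt re-raises, so the caller still sees the error
    if the rate limit never clears.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except BackoffRequested as err:
            if attempt == max_attempts:
                raise
            sleep(err.seconds)
```

Passing `sleep` as a parameter keeps the wrapper easy to exercise without real waiting; in normal use the default `time.sleep` applies.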

codders commented 1 year ago

If you had to update the code to remove the v2, you have probably not been running the latest code...

miklyx commented 1 year ago

I've restarted it twice after the same termination, and now it's kind of stable. Thank you!