Closed miklyx closed 1 year ago
Hi @miklyx,
It's weird that you have undetected-chromedriver version 3.2.1 there. The version in the Pipfile right now is 3.4.6. Can you try fetching the latest code from main and running pipenv install again?
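One way to confirm which version actually ended up in the environment is to ask Python directly. This is only a minimal sketch: the helper names are made up for illustration, the expected version 3.4.6 is the Pipfile version mentioned above, and the simple parser assumes plain dotted numeric versions.

```python
from importlib import metadata

EXPECTED = "3.4.6"  # Pipfile version mentioned in this thread


def parse_version(text):
    """Turn '3.4.6' into (3, 4, 6) so versions compare numerically.

    Assumes plain dotted numbers; non-numeric parts are ignored.
    """
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_outdated(installed, expected=EXPECTED):
    """True if nothing is installed or the installed version is older."""
    return installed is None or parse_version(installed) < parse_version(expected)


found = installed_version("undetected-chromedriver")
print("installed:", found, "| outdated:", is_outdated(found))
```

Run it with the interpreter from the pipenv environment (for example via pipenv run python) so that it inspects the same packages the crawler uses.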
Hi!
Successfully uninstalled undetected-chromedriver-2.1.1
Successfully installed undetected-chromedriver-3.4.6
I also changed the import in chrome_wrapper.py from
import undetected_chromedriver.v2 as uc
to
import undetected_chromedriver as uc
because after the upgrade I got:
No module named 'undetected_chromedriver.v2'
I changed config.yaml to align the indentation in the captcha block.
Now it starts, but with immoscout24 I get:
[2023/02/22 12:57:45|chrome_wrapper.py |INFO ]: Initializing Chrome WebDriver for crawler...
[2023/02/22 12:57:47|patcher.py |INFO ]: patching driver executable /Users/mrashkovskiy/Library/Application Support/undetected_chromedriver/undetected_chromedriver
[2023/02/22 12:58:16|__init__.py |INFO ]: setting properties for headless
[2023/02/22 12:58:17|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?price=-1100.0&livingspace=50.0-&exclusioncriteria=projectlisting,swapflat&pricetype=rentpermonth&geocodes=110000000801,110000000307,110000000703,110000000101,110000000201,110000000102,110000000202,110000000301,110000000104,110000000105,110000000106,110000000701&sorting=2&enteredFrom=result_list&pagenumber={0}
[2023/02/22 12:58:30|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/02/22 12:58:30|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/02/22 12:58:30|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
Next came some very interesting behaviour: it sent a couple of links from immoscout24 and then terminated:
Error sending media group: {...}
Traceback (most recent call last):
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunt.py", line 109, in <module>
    main()
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunt.py", line 105, in main
    launch_flat_hunt(config, heartbeat)
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunt.py", line 36, in launch_flat_hunt
    hunter.hunt_flats()
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/hunter.py", line 54, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 37, in process_expose
    self.__broadcast(
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 61, in __broadcast
    self.__send_images(chat_id=receiver, msg=msg, images=images)
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 124, in __send_images
    self.__handle_error(
  File "/Users/mrashkovskiy/Documents/py/flathunt/flathunter/flathunter/sender_telegram.py", line 159, in __handle_error
    raise StampedeProtectionException(
flathunter.exceptions.StampedeProtectionException: Too many messages too fast - backoff 12 seconds
[2023/02/22 13:10:13|__init__.py |INFO ]: ensuring close
That looks like it's working. On the first run you might get that message because it finds a lot of matches at the start. Afterwards it should calm down.
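For anyone who would rather wait out the backoff than restart by hand, the general pattern is to catch the rate-limit error, sleep for the requested time, and try again. The sketch below is an illustration only and is not how flathunter handles it: BackoffRequested and send_with_backoff are hypothetical names, and the real StampedeProtectionException may not expose the delay as an attribute.

```python
import time


class BackoffRequested(Exception):
    """Hypothetical stand-in for an error that says how long to wait."""

    def __init__(self, seconds):
        super().__init__(f"Too many messages too fast - backoff {seconds} seconds")
        self.seconds = seconds


def send_with_backoff(send, payload, max_attempts=3, sleep=time.sleep):
    """Call send(payload); on BackoffRequested, wait and retry.

    Re-raises the error once max_attempts calls have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except BackoffRequested as err:
            if attempt == max_attempts:
                raise
            # Wait for the period the sender asked for before retrying.
            sleep(err.seconds)
```

The sleep parameter is injectable so the retry logic can be exercised without actually waiting.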
If you had to update the code to remove the v2, you have probably not been running the latest code...
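Updating to the latest code is the fix suggested here. Purely as an illustration of how a script could tolerate both package layouts, the sketch below tries a list of module names in order and returns the first one that imports. The helper import_first_available is a made-up name, and the ordering assumes that older releases provide the API under the .v2 submodule, as the original import in this thread implies.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that imports successfully."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as err:  # also covers ModuleNotFoundError
            last_error = err
    raise ImportError(
        f"none of {list(module_names)} could be imported"
    ) from last_error


# Possible usage (requires the package to be installed):
# uc = import_first_available(
#     ["undetected_chromedriver.v2", "undetected_chromedriver"]
# )
```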
I've restarted it twice after the same termination, and now it's fairly stable. Thank you!
The error is:
File ".../flathunt/flathunter/flathunter/chrome_wrapper.py", line 44, in get_chrome_driver
    driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member
TypeError: __init__() got an unexpected keyword argument 'version_main'
undetected-chromedriver==3.2.1
Without 2captcha, the log always shows:
[2023/02/21 22:07:13|crawl_immobilienscout.py|ERROR ]: Index error occurred