flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

CrawlImmobilienscout crashing #199

Status: Closed (timsegger closed this issue 2 years ago)

timsegger commented 2 years ago

On running the flathunter.service I get this error:

Aug 20 17:03:12 tsegger systemd[1]: Started Flathunter Python Script.
Aug 20 17:03:16 tsegger flathunter[19722]: [2022/08/20 17:03:16|config.py |INFO ]: Using config /opt/flathunter/config.yaml
Aug 20 17:03:16 tsegger flathunter[19722]: [2022/08/20 17:03:16|abstract_crawler.py |INFO ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
Aug 20 17:03:17 tsegger flathunter[19722]: Traceback (most recent call last):
Aug 20 17:03:17 tsegger flathunter[19722]:   File "flathunt.py", line 105, in <module>
Aug 20 17:03:17 tsegger flathunter[19722]:     main()
Aug 20 17:03:17 tsegger flathunter[19722]:   File "flathunt.py", line 76, in main
Aug 20 17:03:17 tsegger flathunter[19722]:     config.init_searchers()
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/opt/flathunter/flathunter/config.py", line 44, in init_searchers
Aug 20 17:03:17 tsegger flathunter[19722]:     CrawlImmobilienscout(self),
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/opt/flathunter/flathunter/crawl_immobilienscout.py", line 38, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]:     self.driver = self.configure_driver(driver_arguments)
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/opt/flathunter/flathunter/abstract_crawler.py", line 62, in configure_driver
Aug 20 17:03:17 tsegger flathunter[19722]:     options=chrome_options
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]:     service_log_path, service, keep_alive)
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]:     options=options)
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]:     self.start_session(capabilities, browser_profile)
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
Aug 20 17:03:17 tsegger flathunter[19722]:     response = self.execute(Command.NEW_SESSION, parameters)
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
Aug 20 17:03:17 tsegger flathunter[19722]:     self.error_handler.check_response(response)
Aug 20 17:03:17 tsegger flathunter[19722]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
Aug 20 17:03:17 tsegger flathunter[19722]:     raise exception_class(message, screen, stacktrace)
Aug 20 17:03:17 tsegger flathunter[19722]: selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
Aug 20 17:03:17 tsegger flathunter[19722]:   (unknown error: DevToolsActivePort file doesn't exist)
Aug 20 17:03:17 tsegger flathunter[19722]:   (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Aug 20 17:03:17 tsegger flathunter[19722]: Stacktrace:
Aug 20 17:03:17 tsegger flathunter[19722]: #0 0x55d6d37a92d3
Aug 20 17:03:17 tsegger flathunter[19722]: #1 0x55d6d35b33fa
Aug 20 17:03:17 tsegger flathunter[19722]: #2 0x55d6d35d87da
Aug 20 17:03:17 tsegger flathunter[19722]: #3 0x55d6d35d3ae4
Aug 20 17:03:17 tsegger flathunter[19722]: #4 0x55d6d360f1f3
Aug 20 17:03:17 tsegger flathunter[19722]: #5 0x55d6d3608fe3
Aug 20 17:03:17 tsegger flathunter[19722]: #6 0x55d6d35dee33
Aug 20 17:03:17 tsegger flathunter[19722]: #7 0x55d6d35e0015
Aug 20 17:03:17 tsegger flathunter[19722]: #8 0x55d6d37f53fd
Aug 20 17:03:17 tsegger flathunter[19722]: #9 0x55d6d37f899c
Aug 20 17:03:17 tsegger flathunter[19722]: #10 0x55d6d37dc39e
Aug 20 17:03:17 tsegger flathunter[19722]: #11 0x55d6d37f95d3
Aug 20 17:03:17 tsegger flathunter[19722]: #12 0x55d6d37d028f
Aug 20 17:03:17 tsegger flathunter[19722]: #13 0x55d6d3817728
Aug 20 17:03:17 tsegger flathunter[19722]: #14 0x55d6d38178d2
Aug 20 17:03:17 tsegger flathunter[19722]: #15 0x55d6d383199f
Aug 20 17:03:17 tsegger flathunter[19722]: #16 0x7fb6fc844fa3
Aug 20 17:03:17 tsegger systemd[1]: flathunter.service: Main process exited, code=exited, status=1/FAILURE
Aug 20 17:03:17 tsegger systemd[1]: flathunter.service: Failed with result 'exit-code'.

When I remove CrawlImmobilienscout(self) from config.py, everything works perfectly.

alexanderroidl commented 2 years ago

Hi @TimS-Official, can you try adding the arguments --no-sandbox and/or --remote-debugging-port=9222 under captcha/driver_arguments in your main configuration file, config.yaml?
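
For anyone unsure where those flags end up, here is a minimal sketch of the nesting that captcha/driver_arguments implies, using a plain Python dict as a stand-in for the parsed config.yaml. The helper name with_driver_arguments and the sample config dict are hypothetical and not part of flathunter; only the key names and the two flags come from this thread.

```python
from copy import deepcopy
from typing import Any, Dict, List


def with_driver_arguments(config: Dict[str, Any], extra: List[str]) -> Dict[str, Any]:
    """Return a copy of config with extra Chrome flags under captcha/driver_arguments.

    Hypothetical helper: it only mirrors the nesting named in this thread.
    """
    updated = deepcopy(config)  # leave the caller's dict untouched
    captcha = updated.setdefault("captcha", {})
    arguments = captcha.setdefault("driver_arguments", [])
    for flag in extra:
        if flag not in arguments:  # avoid listing the same flag twice
            arguments.append(flag)
    return updated


# Assumed shape of what a YAML loader would return for config.yaml.
config = {"captcha": {"driver_arguments": ["--headless"]}}
patched = with_driver_arguments(
    config, ["--no-sandbox", "--remote-debugging-port=9222"]
)
print(patched["captcha"]["driver_arguments"])
```

In the YAML file itself this would presumably be a list of quoted strings under the driver_arguments key.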

timsegger commented 2 years ago

When only adding --no-sandbox, the error is the same. When adding both, the "DevToolsActivePort file doesn't exist" error is replaced by "chrome not reachable".

timsegger commented 2 years ago

I tried following the installation guide again and set verbose: true. I then saw that CrawlImmobilienscout uses my preinstalled google-chrome driver instead of the new way, so I simply removed my preinstalled google-chrome. Now the error is different:

Aug 21 17:42:05 tsegger systemd[1]: Started Flathunter Python Script.
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|config.py |INFO ]: Using config /opt/flathunter/config.yaml
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7f0a9100b0b8>
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|abstract_crawler.py |INFO ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09| |DEBUG ]: ====== WebDriver manager ======
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09| |DEBUG ]: Get LATEST chromedriver version for google-chrome None
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09| |DEBUG ]: Driver [/home/flathunter/.wdm/drivers/chromedriver/linux64/104.0.5112/chromedriver] found in cache
Aug 21 17:42:09 tsegger flathunter[15266]: Traceback (most recent call last):
Aug 21 17:42:09 tsegger flathunter[15266]:   File "flathunt.py", line 105, in <module>
Aug 21 17:42:09 tsegger flathunter[15266]:     main()
Aug 21 17:42:09 tsegger flathunter[15266]:   File "flathunt.py", line 76, in main
Aug 21 17:42:09 tsegger flathunter[15266]:     config.init_searchers()
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/opt/flathunter/flathunter/config.py", line 44, in init_searchers
Aug 21 17:42:09 tsegger flathunter[15266]:     CrawlImmobilienscout(self),
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/opt/flathunter/flathunter/crawl_immobilienscout.py", line 38, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]:     self.driver = self.configure_driver(driver_arguments)
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/opt/flathunter/flathunter/abstract_crawler.py", line 62, in configure_driver
Aug 21 17:42:09 tsegger flathunter[15266]:     options=chrome_options
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]:     service_log_path, service, keep_alive)
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]:     options=options)
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]:     self.start_session(capabilities, browser_profile)
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
Aug 21 17:42:09 tsegger flathunter[15266]:     response = self.execute(Command.NEW_SESSION, parameters)
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
Aug 21 17:42:09 tsegger flathunter[15266]:     self.error_handler.check_response(response)
Aug 21 17:42:09 tsegger flathunter[15266]:   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
Aug 21 17:42:09 tsegger flathunter[15266]:     raise exception_class(message, screen, stacktrace)
Aug 21 17:42:09 tsegger flathunter[15266]: selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Aug 21 17:42:09 tsegger flathunter[15266]: Stacktrace:
Aug 21 17:42:09 tsegger flathunter[15266]: #0 0x55b24afd7403
Aug 21 17:42:09 tsegger flathunter[15266]: #1 0x55b24addd778
Aug 21 17:42:09 tsegger flathunter[15266]: #2 0x55b24adff916
Aug 21 17:42:09 tsegger flathunter[15266]: #3 0x55b24adfd12b
Aug 21 17:42:09 tsegger flathunter[15266]: #4 0x55b24ae3883a
Aug 21 17:42:09 tsegger flathunter[15266]: #5 0x55b24ae328f3
Aug 21 17:42:09 tsegger flathunter[15266]: #6 0x55b24ae080d8
Aug 21 17:42:09 tsegger flathunter[15266]: #7 0x55b24ae09205
Aug 21 17:42:09 tsegger flathunter[15266]: #8 0x55b24b01ee3d
Aug 21 17:42:09 tsegger flathunter[15266]: #9 0x55b24b021db6
Aug 21 17:42:09 tsegger flathunter[15266]: #10 0x55b24b00813e
Aug 21 17:42:09 tsegger flathunter[15266]: #11 0x55b24b0229b5
Aug 21 17:42:09 tsegger flathunter[15266]: #12 0x55b24affc970
Aug 21 17:42:09 tsegger flathunter[15266]: #13 0x55b24b03f228
Aug 21 17:42:09 tsegger flathunter[15266]: #14 0x55b24b03f3bf
Aug 21 17:42:09 tsegger flathunter[15266]: #15 0x55b24b059abe
Aug 21 17:42:09 tsegger flathunter[15266]: #16 0x7f9c77812fa3
Aug 21 17:42:09 tsegger systemd[1]: flathunter.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 17:42:09 tsegger systemd[1]: flathunter.service: Failed with result 'exit-code'.

This error is apparently independent of driver_arguments. But again, when I remove CrawlImmobilienscout(self) from config.py, everything works.

codders commented 2 years ago

Do you have an exact version match between your webdriver version and your chrome version? I see the error cannot find Chrome binary - where is chrome located on the target system?
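
Both questions can be checked with a few lines of Python. This is only a sketch: the function names are made up, and the version strings are assumed to look like the output of `google-chrome --version` and `chromedriver --version`. It compares major versions only, on the assumption that ChromeDriver generally targets the Chrome release with the same major version.

```python
import re
import shutil
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Pull the major version out of text such as 'Google Chrome 104.0.5112.101'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when both strings contain a version and the major numbers agree."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver


# Where is the browser binary? None means nothing by that name is on PATH.
print(shutil.which("google-chrome"))

# Version numbers taken from logs in this thread; the surrounding text is assumed.
print(versions_match("Google Chrome 104.0.5112.101", "ChromeDriver 104.0.5112.79"))
```

If `shutil.which` prints None, that would be consistent with the "cannot find Chrome binary" message.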

timsegger commented 2 years ago

I followed the first part of this tutorial: https://yizeng.me/2014/04/20/install-chromedriver-and-phantomjs-on-linux-mint/

Therefore I installed the newest version, which was something like 105.x. Which version would the webdriver be?

squilaSC commented 2 years ago

Running in docker I needed to add these two arguments for it to run.

- "--headless"
- "--no-sandbox"

abuchmueller commented 2 years ago

> Running in docker I needed to add these two arguments for it to run.
>
> - "--headless"
> - "--no-sandbox"

@squilaSC thanks, this makes the docker image run but I experienced a crash. After rebooting it seems to work now, though. Maybe resolving the captcha failed?

Edit: I've now run the container overnight, and it crashed again after a couple of hours.

[2022/08/23 22:12:55|config.py               |INFO    ]: Using config /config.yaml
[2022/08/23 22:12:55|flathunt.py             |DEBUG   ]: Settings from config: <flathunter.config.Config object at 0x7fada60c5bd0>
[2022/08/23 22:12:55|abstract_crawler.py     |INFO    ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...

[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: ====== WebDriver manager ======
[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: Get LATEST chromedriver version for google-chrome 104.0.5112
[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: There is no [linux64] chromedriver for browser 104.0.5112 in cache
[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: About to download new driver from https://chromedriver.storage.googleapis.com/104.0.5112.79/chromedriver_linux64.zip
[2022/08/23 22:12:56|<WebDriverManager>      |DEBUG   ]: Driver has been saved in cache [/root/.wdm/drivers/chromedriver/linux64/104.0.5112]
[2022/08/23 22:12:57|crawl_immobilienscout.py|DEBUG   ]: Got search URL https://www.immobilienscout24.de/Suche/de/XXXX
[2022/08/23 22:13:00|twocaptcha_solver.py    |INFO    ]: Trying to solve geetest.
[2022/08/23 22:13:00|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/in: OK|71318916888
[2022/08/23 22:13:00|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:00|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:05|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:05|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:10|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:10|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:15|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:15|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:20|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:20|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:25|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:25|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:30|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:30|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:35|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:35|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:41|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:41|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:46|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: OK|{"geetest_challenge":"013d7f12095c46075d2bed1d1f2e1736","geetest_validate":"xxx","geetest_seccode":"xxx|jordan"}
Traceback (most recent call last):
  File "flathunt.py", line 105, in <module>
    main()
  File "flathunt.py", line 101, in main
    launch_flat_hunt(config, heartbeat)
  File "flathunt.py", line 31, in launch_flat_hunt
    hunter.hunt_flats()
  File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
    for url in self.config.get('urls', [])])
  File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
    return self.get_results(url, max_pages)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 55, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 128, in get_page
    afterlogin_string=self.afterlogin_string
  File "/usr/src/app/flathunter/abstract_crawler.py", line 93, in get_soup_from_url
    return BeautifulSoup(driver.page_source, 'html.parser')
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 541, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=104.0.5112.101)
Stacktrace:
#0 0x55f5fd451403 <unknown>
#1 0x55f5fd25764b <unknown>
#2 0x55f5fd24475a <unknown>
#3 0x55f5fd24365b <unknown>
#4 0x55f5fd243c1c <unknown>
#5 0x55f5fd24fc3f <unknown>
#6 0x55f5fd2507a2 <unknown>
#7 0x55f5fd25edad <unknown>
#8 0x55f5fd262c6a <unknown>
#9 0x55f5fd244046 <unknown>
#10 0x55f5fd25e951 <unknown>
#11 0x55f5fd2bfb53 <unknown>
#12 0x55f5fd2ac8f3 <unknown>
#13 0x55f5fd2820d8 <unknown>
#14 0x55f5fd283205 <unknown>
#15 0x55f5fd498e3d <unknown>
#16 0x55f5fd49bdb6 <unknown>
#17 0x55f5fd48213e <unknown>
#18 0x55f5fd49c9b5 <unknown>
#19 0x55f5fd476970 <unknown>
#20 0x55f5fd4b9228 <unknown>
#21 0x55f5fd4b93bf <unknown>
#22 0x55f5fd4d3abe <unknown>
#23 0x7f19cccf6ea7 <unknown>

Log 2

[2022/08/24 07:40:02|hunter.py               |INFO    ]: New offer: beliebte 2 Zimmer Wohnung
[2022/08/24 07:40:02|idmaintainer.py         |DEBUG   ]: is_processed(7917747028695013)
[2022/08/24 07:40:02|idmaintainer.py         |DEBUG   ]: is_processed(7044444671130838)
[2022/08/24 07:40:02|idmaintainer.py         |DEBUG   ]: is_processed(8824293547546021)
[2022/08/24 07:50:02|crawl_immobilienscout.py|DEBUG   ]: Got search URL https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/wohnung-mieten?numberofrooms=1.5-&price=-800.0&livingspace=30.0-&pricetype=rentpermonth&geocodes=0200000006057,0200000006058,0200000006059,0200000007070,0200000006084,0200000006073,0200000007076,0200000006075,0200000005054,0200000006055&sorting=2&pagenumber={0}
Traceback (most recent call last):
  File "flathunt.py", line 105, in <module>
    main()
  File "flathunt.py", line 101, in main
    launch_flat_hunt(config, heartbeat)
  File "flathunt.py", line 38, in launch_flat_hunt
    hunter.hunt_flats()
  File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
    for url in self.config.get('urls', [])])
  File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
    return self.get_results(url, max_pages)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 55, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 128, in get_page
    afterlogin_string=self.afterlogin_string
  File "/usr/src/app/flathunter/abstract_crawler.py", line 88, in get_soup_from_url
    driver.get(url)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 447, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from tab crashed
  (Session info: headless chrome=104.0.5112.101)
Stacktrace:
#0 0x560b2c5ff403 <unknown>
#1 0x560b2c40564b <unknown>
#2 0x560b2c3f3b2d <unknown>
#3 0x560b2c3f3545 <unknown>
#4 0x560b2c3f2995 <unknown>
#5 0x560b2c3f20d4 <unknown>
#6 0x560b2c40c951 <unknown>
#7 0x560b2c46e078 <unknown>
#8 0x560b2c45a8f3 <unknown>
#9 0x560b2c4300d8 <unknown>
#10 0x560b2c431205 <unknown>
#11 0x560b2c646e3d <unknown>
#12 0x560b2c649db6 <unknown>
#13 0x560b2c63013e <unknown>
#14 0x560b2c64a9b5 <unknown>
#15 0x560b2c624970 <unknown>
#16 0x560b2c667228 <unknown>
#17 0x560b2c6673bf <unknown>
#18 0x560b2c681abe <unknown>
#19 0x7f2b269b1ea7 <unknown>

timsegger commented 2 years ago

Can confirm @abuchmueller's experience. I switched to docker using @squilaSC's driver arguments and get the same errors at very irregular intervals (sometimes after 30 min, sometimes after 3 hours).

Thanks to docker's "restart unless stopped"-policy it just restarts the process and I can actually use it.

> After rebooting it seems to work now, though. Maybe resolving the captcha failed?

Looking into the logs (docker logs -t <name>) I can say that those crashes do not happen due to the captcha itself, but multiple minutes or hours later. So I guess they happen before the next captcha needs to be solved or something similar?

Edit:

2022-08-24T09:29:15.712743300Z [2022/08/24 09:29:15|twocaptcha_solver.py    |INFO    ]: Trying to solve geetest.
2022-08-24T09:29:15.879073118Z [2022/08/24 09:29:15|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:20.963026882Z [2022/08/24 09:29:20|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:26.045437341Z [2022/08/24 09:29:26|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:31.120585833Z [2022/08/24 09:29:31|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:36.218154553Z [2022/08/24 09:29:36|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:41.302687462Z [2022/08/24 09:29:41|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:46.389405839Z [2022/08/24 09:29:46|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:51.462905713Z [2022/08/24 09:29:51|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:56.536812746Z [2022/08/24 09:29:56|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:30:01.609837660Z [2022/08/24 09:30:01|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:50:20.670315785Z [2022/08/24 09:50:20|hunter.py               |INFO    ]: New offer: redacted
2022-08-24T09:50:20.929963807Z [2022/08/24 09:50:20|hunter.py               |INFO    ]: New offer: redacted
2022-08-24T09:50:21.163025163Z [2022/08/24 09:50:21|hunter.py               |INFO    ]: New offer: redacted
2022-08-24T10:00:23.587249546Z Traceback (most recent call last):
2022-08-24T10:00:23.587383206Z   File "flathunt.py", line 105, in <module>
2022-08-24T10:00:23.587791048Z     main()
2022-08-24T10:00:23.587881667Z   File "flathunt.py", line 101, in main
2022-08-24T10:00:23.588307985Z     launch_flat_hunt(config, heartbeat)
2022-08-24T10:00:23.588382834Z   File "flathunt.py", line 38, in launch_flat_hunt
2022-08-24T10:00:23.588717841Z     hunter.hunt_flats()
2022-08-24T10:00:23.588813640Z   File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
2022-08-24T10:00:23.589107038Z     for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
2022-08-24T10:00:23.589193119Z   File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
2022-08-24T10:00:23.589481068Z     for searcher in self.config.searchers()
2022-08-24T10:00:23.589555297Z   File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
2022-08-24T10:00:23.589852712Z     for url in self.config.get('urls', [])])
2022-08-24T10:00:23.589924226Z   File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
2022-08-24T10:00:23.590209891Z     return searcher.crawl(url, max_pages)
2022-08-24T10:00:23.590293677Z   File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
2022-08-24T10:00:23.590632441Z     return self.get_results(url, max_pages)
2022-08-24T10:00:23.590717780Z   File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 59, in get_results
2022-08-24T10:00:23.591014675Z     return self.get_entries_from_javascript()
2022-08-24T10:00:23.591089234Z   File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 90, in get_entries_from_javascript
2022-08-24T10:00:23.591414473Z     result_json = self.driver.execute_script('return window.IS24.resultList;')
2022-08-24T10:00:23.591544866Z   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 495, in execute_script
2022-08-24T10:00:23.591856990Z     'args': converted_args})['value']
2022-08-24T10:00:23.591868120Z   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
2022-08-24T10:00:23.592033049Z     self.error_handler.check_response(response)
2022-08-24T10:00:23.592067974Z   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
2022-08-24T10:00:23.592252650Z     raise exception_class(message, screen, stacktrace)
2022-08-24T10:00:23.592322380Z selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
2022-08-24T10:00:23.592326828Z from unknown error: cannot determine loading status
2022-08-24T10:00:23.592329052Z from tab crashed
2022-08-24T10:00:23.592331166Z   (Session info: headless chrome=104.0.5112.101)
2022-08-24T10:00:23.592333320Z Stacktrace:
2022-08-24T10:00:23.592335504Z #0 0x561c6c683403 <unknown>
2022-08-24T10:00:23.592337709Z #1 0x561c6c48964b <unknown>
2022-08-24T10:00:23.592339812Z #2 0x561c6c47675a <unknown>
2022-08-24T10:00:23.592341846Z #3 0x561c6c47565b <unknown>
2022-08-24T10:00:23.592349471Z #4 0x561c6c475c1c <unknown>
2022-08-24T10:00:23.592351635Z #5 0x561c6c481c3f <unknown>
2022-08-24T10:00:23.592353668Z #6 0x561c6c4827a2 <unknown>
2022-08-24T10:00:23.592355712Z #7 0x561c6c491076 <unknown>
2022-08-24T10:00:23.592357806Z #8 0x561c6c4f1ee1 <unknown>
2022-08-24T10:00:23.592359830Z #9 0x561c6c4de8f3 <unknown>
2022-08-24T10:00:23.592361894Z #10 0x561c6c4b40d8 <unknown>
2022-08-24T10:00:23.592363938Z #11 0x561c6c4b5205 <unknown>
2022-08-24T10:00:23.592365972Z #12 0x561c6c6cae3d <unknown>
2022-08-24T10:00:23.592367955Z #13 0x561c6c6cddb6 <unknown>
2022-08-24T10:00:23.592369909Z #14 0x561c6c6b413e <unknown>
2022-08-24T10:00:23.592371903Z #15 0x561c6c6ce9b5 <unknown>
2022-08-24T10:00:23.592374007Z #16 0x561c6c6a8970 <unknown>
2022-08-24T10:00:23.592376662Z #17 0x561c6c6eb228 <unknown>
2022-08-24T10:00:23.592379567Z #18 0x561c6c6eb3bf <unknown>
2022-08-24T10:00:23.592381962Z #19 0x561c6c705abe <unknown>
2022-08-24T10:00:23.592383955Z #20 0x7f49ec9ccea7 <unknown>

After taking a second look at the timestamps I saw that the crash came almost exactly 10 min after the last successful search. Thus the crash seems to occur during crawling, and for this particular occurrence no captcha was even required. I did not get those crashes when I removed the Immoscout24 crawler, so I assume it is the cause.
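
The gap can be measured directly from the `docker logs -t` prefixes. A small sketch, with a made-up helper name, that takes the last "New offer" line and the first traceback line from the log above. Docker prints nanoseconds, which Python's datetime cannot hold, so the fraction is cut to microseconds first.

```python
from datetime import datetime


def parse_docker_timestamp(stamp: str) -> datetime:
    """Parse a 'docker logs -t' prefix such as 2022-08-24T10:00:23.587249546Z.

    The nanosecond fraction is truncated to microseconds (datetime's limit).
    """
    date_part, _, fraction = stamp.rstrip("Z").partition(".")
    micros = (fraction + "000000")[:6]  # pad or truncate to exactly 6 digits
    return datetime.strptime(f"{date_part}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")


# Timestamps copied from the log above.
last_success = parse_docker_timestamp("2022-08-24T09:50:21.163025163Z")
crash = parse_docker_timestamp("2022-08-24T10:00:23.587249546Z")

gap = crash - last_success
print(f"Crash came {gap.total_seconds() / 60:.1f} minutes after the last offer")
```

A gap this close to a round ten minutes would fit a crash at the start of the next crawl rather than during captcha solving, though that reading is an inference from the timestamps only.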

Edit2:

According to Stack Overflow (https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot), the extra parameter --disable-dev-shm-usage or increasing Docker's shm-size should solve this issue. Will try this later today.

Under https://github.com/docker/cli/issues/1278 you can find a way to persistently increase Docker's ShmSize. I will try this approach today.
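
To see how much shared memory the container actually has, something like the following could be run inside it. This is a sketch under assumptions: the function names are hypothetical, and the 512 MiB threshold is an arbitrary illustration rather than a documented Chrome requirement. For reference, Docker's default /dev/shm size is 64 MB.

```python
import os
import shutil

MIB = 1024 * 1024


def shm_is_small(total_bytes: int, minimum_mib: int = 512) -> bool:
    """True when the shared-memory mount is below the chosen threshold.

    The 512 MiB default is an arbitrary example value, not a known requirement.
    """
    return total_bytes < minimum_mib * MIB


def report_shm(path: str = "/dev/shm") -> str:
    """Describe the size of the shared-memory mount, if it exists."""
    if not os.path.isdir(path):
        return f"{path} not found (probably not a Linux system)"
    total = shutil.disk_usage(path).total
    verdict = "small" if shm_is_small(total) else "ok"
    return f"{path}: {total // MIB} MiB ({verdict})"


print(report_shm())
```

If the reported size is still the default after changing the shm size, the setting probably did not reach the container.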

Edit3:

Runs for 24+ hours without crashing now. Seems to work :)

alexanderroidl commented 2 years ago

> Edit3: Runs for 24+ hours without crashing now. Seems to work :)

 🎉