Closed timsegger closed 2 years ago
Hi @TimS-Official, can you try adding the arguments --no-sandbox and/or --remote-debugging-port=9222 to your main configuration config.yaml at captcha/driver_arguments?
When adding only --no-sandbox, the error is the same.
When adding both, the error "DevToolsActivePort file doesn't exist" is replaced by "chrome not reachable".
I went through the installation guide again and set verbose: true.
I then saw that CrawlImmobilienscout uses my preinstalled google-chrome instead of the new way.
So I simply removed my preinstalled google-chrome.
Now the error is different:
Aug 21 17:42:05 tsegger systemd[1]: Started Flathunter Python Script.
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|config.py |INFO ]: Using config /opt/flathunter/config.yaml
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7f0a9100b0b8>
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|abstract_crawler.py |INFO ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09| |DEBUG ]: ====== WebDriver manager ======
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09| |DEBUG ]: Get LATEST chromedriver version for google-chrome None
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09| |DEBUG ]: Driver [/home/flathunter/.wdm/drivers/chromedriver/linux64/104.0.5112/chromedriver] found in cache
Aug 21 17:42:09 tsegger flathunter[15266]: Traceback (most recent call last):
Aug 21 17:42:09 tsegger flathunter[15266]: File "flathunt.py", line 105, in
Aug 21 17:42:09 tsegger flathunter[15266]: main()
Aug 21 17:42:09 tsegger flathunter[15266]: File "flathunt.py", line 76, in main
Aug 21 17:42:09 tsegger flathunter[15266]: config.init_searchers()
Aug 21 17:42:09 tsegger flathunter[15266]: File "/opt/flathunter/flathunter/config.py", line 44, in init_searchers
Aug 21 17:42:09 tsegger flathunter[15266]: CrawlImmobilienscout(self),
Aug 21 17:42:09 tsegger flathunter[15266]: File "/opt/flathunter/flathunter/crawl_immobilienscout.py", line 38, in init
Aug 21 17:42:09 tsegger flathunter[15266]: self.driver = self.configure_driver(driver_arguments)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/opt/flathunter/flathunter/abstract_crawler.py", line 62, in configure_driver
Aug 21 17:42:09 tsegger flathunter[15266]: options=chrome_options
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in init
Aug 21 17:42:09 tsegger flathunter[15266]: service_log_path, service, keep_alive)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in init
Aug 21 17:42:09 tsegger flathunter[15266]: options=options)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 277, in init
Aug 21 17:42:09 tsegger flathunter[15266]: self.start_session(capabilities, browser_profile)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
Aug 21 17:42:09 tsegger flathunter[15266]: response = self.execute(Command.NEW_SESSION, parameters)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
Aug 21 17:42:09 tsegger flathunter[15266]: self.error_handler.check_response(response)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
Aug 21 17:42:09 tsegger flathunter[15266]: raise exception_class(message, screen, stacktrace)
Aug 21 17:42:09 tsegger flathunter[15266]: selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Aug 21 17:42:09 tsegger flathunter[15266]: Stacktrace:
Aug 21 17:42:09 tsegger flathunter[15266]: #0 0x55b24afd7403
Aug 21 17:42:09 tsegger flathunter[15266]: #1 0x55b24addd778
Aug 21 17:42:09 tsegger flathunter[15266]: #2 0x55b24adff916
Aug 21 17:42:09 tsegger flathunter[15266]: #3 0x55b24adfd12b
Aug 21 17:42:09 tsegger flathunter[15266]: #4 0x55b24ae3883a
Aug 21 17:42:09 tsegger flathunter[15266]: #5 0x55b24ae328f3
Aug 21 17:42:09 tsegger flathunter[15266]: #6 0x55b24ae080d8
Aug 21 17:42:09 tsegger flathunter[15266]: #7 0x55b24ae09205
Aug 21 17:42:09 tsegger flathunter[15266]: #8 0x55b24b01ee3d
Aug 21 17:42:09 tsegger flathunter[15266]: #9 0x55b24b021db6
Aug 21 17:42:09 tsegger flathunter[15266]: #10 0x55b24b00813e
Aug 21 17:42:09 tsegger flathunter[15266]: #11 0x55b24b0229b5
Aug 21 17:42:09 tsegger flathunter[15266]: #12 0x55b24affc970
Aug 21 17:42:09 tsegger flathunter[15266]: #13 0x55b24b03f228
Aug 21 17:42:09 tsegger flathunter[15266]: #14 0x55b24b03f3bf
Aug 21 17:42:09 tsegger flathunter[15266]: #15 0x55b24b059abe
Aug 21 17:42:09 tsegger flathunter[15266]: #16 0x7f9c77812fa3
Aug 21 17:42:09 tsegger systemd[1]: flathunter.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 17:42:09 tsegger systemd[1]: flathunter.service: Failed with result 'exit-code'.
This error is apparently independent of driver_arguments.
But again, when I remove CrawlImmobilienscout(self) from config.py everything works.
Do you have an exact version match between your webdriver version and your chrome version? I see the error "cannot find Chrome binary" - where is chrome located on the target system?
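One way to answer the "where is chrome located" question is to check which Chrome-like executables are visible on PATH. The following is only a sketch: the candidate names are assumptions (distributions name the binary differently), and `find_chrome_binary` is a hypothetical helper, not part of flathunter. It should be run as the same user that runs the service, since a systemd unit may see a different PATH than an interactive shell.

```python
import shutil
from typing import Iterable, Optional

# Assumed candidate names; adjust to whatever your distribution installs.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome_binary(names: Iterable[str] = CANDIDATE_NAMES) -> Optional[str]:
    """Return the absolute path of the first candidate found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    print(find_chrome_binary() or "no Chrome binary found on PATH")
```

If this prints no path, the "cannot find Chrome binary" error in the log above is what one would expect, because the driver has no browser to launch.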
I followed the first part of this tutorial: https://yizeng.me/2014/04/20/install-chromedriver-and-phantomjs-on-linux-mint/
So I installed the newest version, which was 105.x something. Which version would the webdriver be?
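To make the version question concrete: ChromeDriver is generally expected to share its major version with the browser it drives. The sketch below compares only the major component. The helper names are hypothetical, "105.0.0.0" is a placeholder standing in for the "105.x" mentioned above, and the 104 strings are taken from the logs in this thread.

```python
def major_version(version: str) -> int:
    """Extract the major component from a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major(browser_version: str, driver_version: str) -> bool:
    """True if browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


if __name__ == "__main__":
    # Browser and driver versions as they appear in the Docker logs.
    print(same_major("104.0.5112.101", "104.0.5112.79"))  # True
    # Placeholder 105.x browser against the cached 104 driver.
    print(same_major("105.0.0.0", "104.0.5112"))  # False
```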
Running in docker I needed to add these two arguments for it to run.
- "--headless"
- "--no-sandbox"
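For anyone scripting their configuration, here is a small sketch of adding these two flags to an existing argument list without duplicating entries. The dictionary mirrors the captcha/driver_arguments location mentioned earlier in the thread; `with_required_arguments` is a hypothetical helper, and how flathunter itself consumes the list is not shown here.

```python
from typing import List, Sequence

# The two flags reported above as necessary when running in Docker.
DOCKER_ARGUMENTS = ("--headless", "--no-sandbox")


def with_required_arguments(
    existing: Sequence[str],
    required: Sequence[str] = DOCKER_ARGUMENTS,
) -> List[str]:
    """Return existing arguments plus any required ones that are missing."""
    merged = list(existing)
    for argument in required:
        if argument not in merged:
            merged.append(argument)
    return merged


if __name__ == "__main__":
    # Assumed in-memory shape of the relevant part of config.yaml.
    config = {"captcha": {"driver_arguments": ["--no-sandbox"]}}
    config["captcha"]["driver_arguments"] = with_required_arguments(
        config["captcha"]["driver_arguments"]
    )
    print(config["captcha"]["driver_arguments"])
```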
@squilaSC thanks, this makes the docker image run, but I experienced a crash. After rebooting it seems to work now, though. Maybe resolving the captcha failed?
Edit: I've run the container overnight now; it crashed again after a couple of hours.
[2022/08/23 22:12:55|config.py |INFO ]: Using config /config.yaml
[2022/08/23 22:12:55|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7fada60c5bd0>
[2022/08/23 22:12:55|abstract_crawler.py |INFO ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
[2022/08/23 22:12:55|<WebDriverManager> |DEBUG ]: ====== WebDriver manager ======
[2022/08/23 22:12:55|<WebDriverManager> |DEBUG ]: Get LATEST chromedriver version for google-chrome 104.0.5112
[2022/08/23 22:12:55|<WebDriverManager> |DEBUG ]: There is no [linux64] chromedriver for browser 104.0.5112 in cache
[2022/08/23 22:12:55|<WebDriverManager> |DEBUG ]: About to download new driver from https://chromedriver.storage.googleapis.com/104.0.5112.79/chromedriver_linux64.zip
[2022/08/23 22:12:56|<WebDriverManager> |DEBUG ]: Driver has been saved in cache [/root/.wdm/drivers/chromedriver/linux64/104.0.5112]
[2022/08/23 22:12:57|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/XXXX
[2022/08/23 22:13:00|twocaptcha_solver.py |INFO ]: Trying to solve geetest.
[2022/08/23 22:13:00|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/in: OK|71318916888
[2022/08/23 22:13:00|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:00|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:05|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:05|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:10|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:10|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:15|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:15|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:20|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:20|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:25|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:25|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:30|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:30|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:35|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:35|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:41|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:41|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:46|twocaptcha_solver.py |DEBUG ]: Got response from 2captcha/res: OK|{"geetest_challenge":"013d7f12095c46075d2bed1d1f2e1736","geetest_validate":"xxx","geetest_seccode":"xxx|jordan"}
Traceback (most recent call last):
File "flathunt.py", line 105, in <module>
main()
File "flathunt.py", line 101, in main
launch_flat_hunt(config, heartbeat)
File "flathunt.py", line 31, in launch_flat_hunt
hunter.hunt_flats()
File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
for searcher in self.config.searchers()
File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
for url in self.config.get('urls', [])])
File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
return searcher.crawl(url, max_pages)
File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
return self.get_results(url, max_pages)
File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 55, in get_results
soup = self.get_page(search_url, self.driver, page_no)
File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 128, in get_page
afterlogin_string=self.afterlogin_string
File "/usr/src/app/flathunter/abstract_crawler.py", line 93, in get_soup_from_url
return BeautifulSoup(driver.page_source, 'html.parser')
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 541, in page_source
return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=104.0.5112.101)
Stacktrace:
#0 0x55f5fd451403 <unknown>
#1 0x55f5fd25764b <unknown>
#2 0x55f5fd24475a <unknown>
#3 0x55f5fd24365b <unknown>
#4 0x55f5fd243c1c <unknown>
#5 0x55f5fd24fc3f <unknown>
#6 0x55f5fd2507a2 <unknown>
#7 0x55f5fd25edad <unknown>
#8 0x55f5fd262c6a <unknown>
#9 0x55f5fd244046 <unknown>
#10 0x55f5fd25e951 <unknown>
#11 0x55f5fd2bfb53 <unknown>
#12 0x55f5fd2ac8f3 <unknown>
#13 0x55f5fd2820d8 <unknown>
#14 0x55f5fd283205 <unknown>
#15 0x55f5fd498e3d <unknown>
#16 0x55f5fd49bdb6 <unknown>
#17 0x55f5fd48213e <unknown>
#18 0x55f5fd49c9b5 <unknown>
#19 0x55f5fd476970 <unknown>
#20 0x55f5fd4b9228 <unknown>
#21 0x55f5fd4b93bf <unknown>
#22 0x55f5fd4d3abe <unknown>
#23 0x7f19cccf6ea7 <unknown>
Log 2
[2022/08/24 07:40:02|hunter.py |INFO ]: New offer: beliebte 2 Zimmer Wohnung
[2022/08/24 07:40:02|idmaintainer.py |DEBUG ]: is_processed(7917747028695013)
[2022/08/24 07:40:02|idmaintainer.py |DEBUG ]: is_processed(7044444671130838)
[2022/08/24 07:40:02|idmaintainer.py |DEBUG ]: is_processed(8824293547546021)
[2022/08/24 07:50:02|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/wohnung-mieten?numberofrooms=1.5-&price=-800.0&livingspace=30.0-&pricetype=rentpermonth&geocodes=0200000006057,0200000006058,0200000006059,0200000007070,0200000006084,0200000006073,0200000007076,0200000006075,0200000005054,0200000006055&sorting=2&pagenumber={0}
Traceback (most recent call last):
File "flathunt.py", line 105, in <module>
main()
File "flathunt.py", line 101, in main
launch_flat_hunt(config, heartbeat)
File "flathunt.py", line 38, in launch_flat_hunt
hunter.hunt_flats()
File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
for searcher in self.config.searchers()
File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
for url in self.config.get('urls', [])])
File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
return searcher.crawl(url, max_pages)
File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
return self.get_results(url, max_pages)
File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 55, in get_results
soup = self.get_page(search_url, self.driver, page_no)
File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 128, in get_page
afterlogin_string=self.afterlogin_string
File "/usr/src/app/flathunter/abstract_crawler.py", line 88, in get_soup_from_url
driver.get(url)
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 447, in get
self.execute(Command.GET, {'url': url})
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from tab crashed
(Session info: headless chrome=104.0.5112.101)
Stacktrace:
#0 0x560b2c5ff403 <unknown>
#1 0x560b2c40564b <unknown>
#2 0x560b2c3f3b2d <unknown>
#3 0x560b2c3f3545 <unknown>
#4 0x560b2c3f2995 <unknown>
#5 0x560b2c3f20d4 <unknown>
#6 0x560b2c40c951 <unknown>
#7 0x560b2c46e078 <unknown>
#8 0x560b2c45a8f3 <unknown>
#9 0x560b2c4300d8 <unknown>
#10 0x560b2c431205 <unknown>
#11 0x560b2c646e3d <unknown>
#12 0x560b2c649db6 <unknown>
#13 0x560b2c63013e <unknown>
#14 0x560b2c64a9b5 <unknown>
#15 0x560b2c624970 <unknown>
#16 0x560b2c667228 <unknown>
#17 0x560b2c6673bf <unknown>
#18 0x560b2c681abe <unknown>
#19 0x7f2b269b1ea7 <unknown>
Can confirm @abuchmueller's experience. I switched to docker using @squilaSC's driver arguments and get the same errors at very irregular intervals (sometimes after 30 min, sometimes after 3 hours).
Thanks to docker's "restart unless stopped" policy it just restarts the process, so I can actually use it.
After rebooting it seems to work now, though. Maybe resolving the captcha failed?
Looking into the logs (docker logs -t <name>) I can say that those crashes are not caused by the captcha itself; they happen several minutes or hours later. So I guess they happen before the next captcha needs to be solved, or something similar?
Edit:
2022-08-24T09:29:15.712743300Z [2022/08/24 09:29:15|twocaptcha_solver.py |INFO ]: Trying to solve geetest.
2022-08-24T09:29:15.879073118Z [2022/08/24 09:29:15|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:20.963026882Z [2022/08/24 09:29:20|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:26.045437341Z [2022/08/24 09:29:26|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:31.120585833Z [2022/08/24 09:29:31|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:36.218154553Z [2022/08/24 09:29:36|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:41.302687462Z [2022/08/24 09:29:41|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:46.389405839Z [2022/08/24 09:29:46|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:51.462905713Z [2022/08/24 09:29:51|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:56.536812746Z [2022/08/24 09:29:56|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:30:01.609837660Z [2022/08/24 09:30:01|twocaptcha_solver.py |INFO ]: Captcha is not ready yet, waiting...
2022-08-24T09:50:20.670315785Z [2022/08/24 09:50:20|hunter.py |INFO ]: New offer: redacted
2022-08-24T09:50:20.929963807Z [2022/08/24 09:50:20|hunter.py |INFO ]: New offer: redacted
2022-08-24T09:50:21.163025163Z [2022/08/24 09:50:21|hunter.py |INFO ]: New offer: redacted
2022-08-24T10:00:23.587249546Z Traceback (most recent call last):
2022-08-24T10:00:23.587383206Z File "flathunt.py", line 105, in <module>
2022-08-24T10:00:23.587791048Z main()
2022-08-24T10:00:23.587881667Z File "flathunt.py", line 101, in main
2022-08-24T10:00:23.588307985Z launch_flat_hunt(config, heartbeat)
2022-08-24T10:00:23.588382834Z File "flathunt.py", line 38, in launch_flat_hunt
2022-08-24T10:00:23.588717841Z hunter.hunt_flats()
2022-08-24T10:00:23.588813640Z File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
2022-08-24T10:00:23.589107038Z for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
2022-08-24T10:00:23.589193119Z File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
2022-08-24T10:00:23.589481068Z for searcher in self.config.searchers()
2022-08-24T10:00:23.589555297Z File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
2022-08-24T10:00:23.589852712Z for url in self.config.get('urls', [])])
2022-08-24T10:00:23.589924226Z File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
2022-08-24T10:00:23.590209891Z return searcher.crawl(url, max_pages)
2022-08-24T10:00:23.590293677Z File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
2022-08-24T10:00:23.590632441Z return self.get_results(url, max_pages)
2022-08-24T10:00:23.590717780Z File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 59, in get_results
2022-08-24T10:00:23.591014675Z return self.get_entries_from_javascript()
2022-08-24T10:00:23.591089234Z File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 90, in get_entries_from_javascript
2022-08-24T10:00:23.591414473Z result_json = self.driver.execute_script('return window.IS24.resultList;')
2022-08-24T10:00:23.591544866Z File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 495, in execute_script
2022-08-24T10:00:23.591856990Z 'args': converted_args})['value']
2022-08-24T10:00:23.591868120Z File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
2022-08-24T10:00:23.592033049Z self.error_handler.check_response(response)
2022-08-24T10:00:23.592067974Z File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
2022-08-24T10:00:23.592252650Z raise exception_class(message, screen, stacktrace)
2022-08-24T10:00:23.592322380Z selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
2022-08-24T10:00:23.592326828Z from unknown error: cannot determine loading status
2022-08-24T10:00:23.592329052Z from tab crashed
2022-08-24T10:00:23.592331166Z (Session info: headless chrome=104.0.5112.101)
2022-08-24T10:00:23.592333320Z Stacktrace:
2022-08-24T10:00:23.592335504Z #0 0x561c6c683403 <unknown>
2022-08-24T10:00:23.592337709Z #1 0x561c6c48964b <unknown>
2022-08-24T10:00:23.592339812Z #2 0x561c6c47675a <unknown>
2022-08-24T10:00:23.592341846Z #3 0x561c6c47565b <unknown>
2022-08-24T10:00:23.592349471Z #4 0x561c6c475c1c <unknown>
2022-08-24T10:00:23.592351635Z #5 0x561c6c481c3f <unknown>
2022-08-24T10:00:23.592353668Z #6 0x561c6c4827a2 <unknown>
2022-08-24T10:00:23.592355712Z #7 0x561c6c491076 <unknown>
2022-08-24T10:00:23.592357806Z #8 0x561c6c4f1ee1 <unknown>
2022-08-24T10:00:23.592359830Z #9 0x561c6c4de8f3 <unknown>
2022-08-24T10:00:23.592361894Z #10 0x561c6c4b40d8 <unknown>
2022-08-24T10:00:23.592363938Z #11 0x561c6c4b5205 <unknown>
2022-08-24T10:00:23.592365972Z #12 0x561c6c6cae3d <unknown>
2022-08-24T10:00:23.592367955Z #13 0x561c6c6cddb6 <unknown>
2022-08-24T10:00:23.592369909Z #14 0x561c6c6b413e <unknown>
2022-08-24T10:00:23.592371903Z #15 0x561c6c6ce9b5 <unknown>
2022-08-24T10:00:23.592374007Z #16 0x561c6c6a8970 <unknown>
2022-08-24T10:00:23.592376662Z #17 0x561c6c6eb228 <unknown>
2022-08-24T10:00:23.592379567Z #18 0x561c6c6eb3bf <unknown>
2022-08-24T10:00:23.592381962Z #19 0x561c6c705abe <unknown>
2022-08-24T10:00:23.592383955Z #20 0x7f49ec9ccea7 <unknown>
After taking a second look at the timestamps I saw that the crash comes almost exactly 10 min after the last successful search. So the crash seems to occur during crawling, and in this particular occurrence no captcha was even required. I did not get those crashes when I removed the Immoscout24 crawler, so I assume that crawler is the cause.
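The 10-minute gap can be checked directly from the timestamps that docker logs -t prepends. A minimal sketch follows, using two lines copied from the log above; `parse_docker_timestamp` is a hypothetical helper that drops the sub-second digits before parsing.

```python
from datetime import datetime, timedelta


def parse_docker_timestamp(line: str) -> datetime:
    """Parse the leading `docker logs -t` timestamp, ignoring sub-second digits."""
    stamp = line.split(" ", 1)[0]
    # Keep only YYYY-MM-DDTHH:MM:SS; the nanosecond part is not needed here.
    return datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")


last_offer = (
    "2022-08-24T09:50:21.163025163Z "
    "[2022/08/24 09:50:21|hunter.py |INFO ]: New offer: redacted"
)
crash = "2022-08-24T10:00:23.587249546Z Traceback (most recent call last):"

gap: timedelta = parse_docker_timestamp(crash) - parse_docker_timestamp(last_offer)
print(gap)  # 0:10:02
```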
Edit2:
According to Stack Overflow (https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot), the extra parameter --disable-dev-shm-usage or increasing Docker's shm-size should solve this issue. Will try this later today.
https://github.com/docker/cli/issues/1278 describes a way to persistently increase Docker's ShmSize. I will try this approach today.
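To see whether shared memory is plausibly the bottleneck, one can check the size of /dev/shm from inside the container (Docker's default is 64 MB). This is only a sketch: `shm_total_mib` is a hypothetical helper, and it reports the size of the mount, not how much Chrome actually needs.

```python
import os
import shutil
from typing import Optional


def shm_total_mib(path: str = "/dev/shm") -> Optional[float]:
    """Return the total size of the shared-memory mount in MiB, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


if __name__ == "__main__":
    size = shm_total_mib()
    if size is None:
        print("/dev/shm not available on this system")
    else:
        print(f"/dev/shm total: {size:.0f} MiB")
```

Running it before and after changing the shm size shows whether the new setting actually reached the container.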
Edit3:
Runs for 24+ hours without crashing now. Seems to work :)
🎉
On running the flathunter.service I get this error:
When I remove the CrawlImmobilienscout(self) from config.py everything works perfectly.