flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖

gCloud app not sending notifications #202

Closed · abuchmueller closed this issue 1 year ago

abuchmueller commented 2 years ago

On gcloud the bot runs, but sends no notifications.

I have

Logs:

2022-08-22 20:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 20:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 21:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 21:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 21:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 21:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 21:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 21:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 22:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 22:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 22:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 22:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 22:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 22:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 23:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 23:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 23:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 23:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 23:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-22 23:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 00:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 00:09:16 default[20220822t161725]  [2022-08-23 00:09:16 +0000] [10] [INFO] Handling signal: term
2022-08-23 00:09:17 default[20220822t161725]  [2022-08-23 00:09:17 +0000] [21] [INFO] Worker exiting (pid: 21)
2022-08-23 00:09:17 default[20220822t161725]  [2022-08-23 00:09:17 +0000] [24] [INFO] Worker exiting (pid: 24)
2022-08-23 00:09:18 default[20220822t161725]  [2022-08-23 00:09:18 +0000] [10] [INFO] Shutting down: Master
2022-08-23 00:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 00:10:01 default[20220822t161725]  [2022-08-23 00:10:01 +0000] [10] [INFO] Starting gunicorn 20.1.0
2022-08-23 00:10:01 default[20220822t161725]  [2022-08-23 00:10:01 +0000] [10] [INFO] Listening at: http://0.0.0.0:8081 (10)
2022-08-23 00:10:01 default[20220822t161725]  [2022-08-23 00:10:01 +0000] [10] [INFO] Using worker: gthread
2022-08-23 00:10:01 default[20220822t161725]  [2022-08-23 00:10:01 +0000] [20] [INFO] Booting worker with pid: 20
2022-08-23 00:10:01 default[20220822t161725]  [2022-08-23 00:10:01 +0000] [23] [INFO] Booting worker with pid: 23
2022-08-23 00:10:05 default[20220822t161725]  [2022/08/23 00:10:05|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-23 00:10:05 default[20220822t161725]  [2022/08/23 00:10:05|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-23 00:10:05 default[20220822t161725]  [2022/08/23 00:10:05|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-23 00:10:05 default[20220822t161725]  [2022/08/23 00:10:05|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-23 00:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 00:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 00:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 00:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 01:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 01:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 01:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 01:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 01:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 01:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 02:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 02:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 02:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 02:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 02:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 02:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 03:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 03:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 03:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 03:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 03:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 03:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 04:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 04:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 04:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 04:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 04:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 04:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 05:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 05:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 05:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 05:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 05:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 05:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 06:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 06:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 06:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 06:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 06:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 06:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 07:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 07:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 07:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 07:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 07:40:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 07:50:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 08:00:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 08:02:16 default[20220822t161725]  "GET / HTTP/1.1" 200
2022-08-23 08:02:16 default[20220822t161725]  "GET /static/style.css HTTP/1.1" 304
2022-08-23 08:02:16 default[20220822t161725]  "GET /static/GitHub-Mark-32px.png HTTP/1.1" 304
2022-08-23 08:02:16 default[20220822t161725]  "GET /static/site.webmanifest HTTP/1.1" 304
2022-08-23 08:10:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 08:20:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201
2022-08-23 08:30:00 default[20220822t161725]  "GET /hunt HTTP/1.1" 201

I use the same config locally, and there it works without issues. The logger is set to Debug, but this is all I can get from gcloud app logs read.

codders commented 2 years ago

OK, I took a look at this. I've created a fix in #204 that I will merge in a bit.

codders commented 2 years ago

Please try again with the latest code on master. Debug logging should also be working now.

abuchmueller commented 2 years ago

I've pulled from master, but this update broke the app completely. I can't reach it anymore; I'm getting 502 Bad Gateway.

2022-08-26 13:35:19 default[20220826t152650]  "GET / HTTP/1.1" 502
2022-08-26 13:35:19 default[20220826t152650]  [2022-08-26 13:35:19 +0000] [11] [INFO] Starting gunicorn 20.1.0
2022-08-26 13:35:19 default[20220826t152650]  [2022-08-26 13:35:19 +0000] [11] [INFO] Listening at: http://0.0.0.0:8081 (11)
2022-08-26 13:35:19 default[20220826t152650]  [2022-08-26 13:35:19 +0000] [11] [INFO] Using worker: gthread
2022-08-26 13:35:19 default[20220826t152650]  [2022-08-26 13:35:19 +0000] [20] [INFO] Booting worker with pid: 20
2022-08-26 13:35:19 default[20220826t152650]  [2022-08-26 13:35:19 +0000] [22] [INFO] Booting worker with pid: 22
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|config.py               |INFO    ]: Using config /srv/flathunter/../config.yaml
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|abstract_crawler.py     |INFO    ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|abstract_crawler.py     |INFO    ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|<WebDriverManager>      |DEBUG   ]: ====== WebDriver manager ======
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|<WebDriverManager>      |DEBUG   ]: ====== WebDriver manager ======
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|<WebDriverManager>      |DEBUG   ]: Get LATEST chromedriver version for google-chrome None
2022-08-26 13:35:23 default[20220826t152650]  [2022/08/26 13:35:23|<WebDriverManager>      |DEBUG   ]: Get LATEST chromedriver version for google-chrome None
2022-08-26 13:35:24 default[20220826t152650]  [2022/08/26 13:35:24|<WebDriverManager>      |DEBUG   ]: There is no [linux64] chromedriver for browser None in cache
2022-08-26 13:35:24 default[20220826t152650]  [2022/08/26 13:35:24|<WebDriverManager>      |DEBUG   ]: About to download new driver from https://chromedriver.storage.googleapis.com/104.0.5112.79/chromedriver_linux64.zip
2022-08-26 13:35:24 default[20220826t152650]  [2022/08/26 13:35:24|<WebDriverManager>      |DEBUG   ]: There is no [linux64] chromedriver for browser None in cache
2022-08-26 13:35:24 default[20220826t152650]  [2022/08/26 13:35:24|<WebDriverManager>      |DEBUG   ]: About to download new driver from https://chromedriver.storage.googleapis.com/104.0.5112.79/chromedriver_linux64.zip
2022-08-26 13:35:24 default[20220826t152650]
2022-08-26 13:35:24 default[20220826t152650]  [WDM] - Downloading:   0%|          | 0.00/6.74M [00:00<?, ?B/s]
2022-08-26 13:35:24 default[20220826t152650]  [WDM] - Downloading:   0%|          | 0.00/6.74M [00:00<?, ?B/s]
2022-08-26 13:35:24 default[20220826t152650]  [WDM] - Downloading:  94%|█████████▍| 6.35M/6.74M [00:00<00:00, 66.4MB/s]
2022-08-26 13:35:24 default[20220826t152650]  [WDM] - Downloading:  42%|████▏     | 2.85M/6.74M [00:00<00:00, 29.7MB/s]
2022-08-26 13:35:24 default[20220826t152650]  [WDM] - Downloading: 100%|██████████| 6.74M/6.74M [00:00<00:00, 58.0MB/s]
2022-08-26 13:35:24 default[20220826t152650]
2022-08-26 13:35:24 default[20220826t152650]  [WDM] - Downloading: 100%|██████████| 6.74M/6.74M [00:00<00:00, 35.8MB/s]
2022-08-26 13:35:24 default[20220826t152650]  [2022/08/26 13:35:24|<WebDriverManager>      |DEBUG   ]: Driver has been saved in cache [/root/.wdm/drivers/chromedriver/linux64/104.0.5112]
2022-08-26 13:35:24 default[20220826t152650]  [2022/08/26 13:35:24|<WebDriverManager>      |DEBUG   ]: Driver has been saved in cache [/root/.wdm/drivers/chromedriver/linux64/104.0.5112]
2022-08-26 13:35:24 default[20220826t152650]  [2022-08-26 13:35:24 +0000] [20] [ERROR] Exception in worker process
2022-08-26 13:35:24 default[20220826t152650]  Traceback (most recent call last):    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker      worker.init_process()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 92, in init_process      super().init_process()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process      self.load_wsgi()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi      self.wsgi = self.app.wsgi()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi      self.callable = self.load()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load      return self.load_wsgiapp()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp      return util.import_app(self.app_uri)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app      mod = importlib.import_module(module)    File "/opt/python3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module      return _bootstrap._gcd_import(name[level:], package, level)    File "<frozen importlib._bootstrap>", line 1006, in _gcd_import    File "<frozen importlib._bootstrap>", line 983, in _find_and_load    File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked    File "<frozen importlib._bootstrap>", line 677, in _load_unlocked    File "<frozen importlib._bootstrap_external>", line 728, in exec_module    File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed    File "/srv/main.py", line 28, in <module>      config.init_searchers()    File "/srv/flathunter/config.py", line 44, in init_searchers      CrawlImmobilienscout(self),    File "/srv/flathunter/crawl_immobilienscout.py", line 38, in __init__      self.driver = self.configure_driver(driver_arguments)    File "/srv/flathunter/abstract_crawler.py", line 61, in configure_driver      options=chrome_options    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in __init__      service_log_path, service, keep_alive)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in __init__      options=options)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 270, in __init__      self.start_session(capabilities, browser_profile)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 363, in start_session      response = self.execute(Command.NEW_SESSION, parameters)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 428, in execute      self.error_handler.check_response(response)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response      raise exception_class(message, screen, stacktrace)  selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
2022-08-26 13:35:24 default[20220826t152650]  Stacktrace:
2022-08-26 13:35:24 default[20220826t152650]  #0 0x2a452f440403 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #1 0x2a452f246778 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #2 0x2a452f268916 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #3 0x2a452f26612b <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #4 0x2a452f2a183a <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #5 0x2a452f29b8f3 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #6 0x2a452f2710d8 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #7 0x2a452f272205 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #8 0x2a452f487e3d <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #9 0x2a452f48adb6 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #10 0x2a452f47113e <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #11 0x2a452f48b9b5 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #12 0x2a452f465970 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #13 0x2a452f4a8228 <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #14 0x2a452f4a83bf <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #15 0x2a452f4c2abe <unknown>
2022-08-26 13:35:24 default[20220826t152650]  #16 0x3ea915e076db <unknown>
2022-08-26 13:35:24 default[20220826t152650]
2022-08-26 13:35:24 default[20220826t152650]  [2022-08-26 13:35:24 +0000] [20] [INFO] Worker exiting (pid: 20)
2022-08-26 13:35:25 default[20220826t152650]  [2022-08-26 13:35:25 +0000] [22] [ERROR] Exception in worker process
2022-08-26 13:35:25 default[20220826t152650]  Traceback (most recent call last):    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker      worker.init_process()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 92, in init_process      super().init_process()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process      self.load_wsgi()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi      self.wsgi = self.app.wsgi()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi      self.callable = self.load()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load      return self.load_wsgiapp()    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp      return util.import_app(self.app_uri)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app      mod = importlib.import_module(module)    File "/opt/python3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module      return _bootstrap._gcd_import(name[level:], package, level)    File "<frozen importlib._bootstrap>", line 1006, in _gcd_import    File "<frozen importlib._bootstrap>", line 983, in _find_and_load    File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked    File "<frozen importlib._bootstrap>", line 677, in _load_unlocked    File "<frozen importlib._bootstrap_external>", line 728, in exec_module    File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed    File "/srv/main.py", line 28, in <module>      config.init_searchers()    File "/srv/flathunter/config.py", line 44, in init_searchers      CrawlImmobilienscout(self),    File "/srv/flathunter/crawl_immobilienscout.py", line 38, in __init__      self.driver = self.configure_driver(driver_arguments)    File "/srv/flathunter/abstract_crawler.py", line 61, in configure_driver      options=chrome_options    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in __init__      service_log_path, service, keep_alive)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in __init__      options=options)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 270, in __init__      self.start_session(capabilities, browser_profile)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 363, in start_session      response = self.execute(Command.NEW_SESSION, parameters)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 428, in execute      self.error_handler.check_response(response)    File "/layers/google.python.pip/pip/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response      raise exception_class(message, screen, stacktrace)  selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
2022-08-26 13:35:25 default[20220826t152650]  Stacktrace:
2022-08-26 13:35:25 default[20220826t152650]  #0 0x2a6a2d151403 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #1 0x2a6a2cf57778 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #2 0x2a6a2cf79916 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #3 0x2a6a2cf7712b <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #4 0x2a6a2cfb283a <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #5 0x2a6a2cfac8f3 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #6 0x2a6a2cf820d8 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #7 0x2a6a2cf83205 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #8 0x2a6a2d198e3d <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #9 0x2a6a2d19bdb6 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #10 0x2a6a2d18213e <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #11 0x2a6a2d19c9b5 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #12 0x2a6a2d176970 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #13 0x2a6a2d1b9228 <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #14 0x2a6a2d1b93bf <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #15 0x2a6a2d1d3abe <unknown>
2022-08-26 13:35:25 default[20220826t152650]  #16 0x3e4630a076db <unknown>
2022-08-26 13:35:25 default[20220826t152650]
2022-08-26 13:35:25 default[20220826t152650]  [2022-08-26 13:35:25 +0000] [22] [INFO] Worker exiting (pid: 22)
2022-08-26 13:35:25 default[20220826t152650]  [2022-08-26 13:35:25 +0000] [11] [WARNING] Worker with pid 22 was terminated due to signal 15
2022-08-26 13:35:25 default[20220826t152650]  [2022-08-26 13:35:25 +0000] [11] [INFO] Shutting down: Master
2022-08-26 13:35:25 default[20220826t152650]  [2022-08-26 13:35:25 +0000] [11] [INFO] Reason: Worker failed to boot.

The exception is actually quite similar to #199, with the Immobilienscout crawler crashing.
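For reference, the "cannot find Chrome binary" message comes from Selenium: chromedriver was downloaded fine, but there is no Chrome installation for it to drive. Below is a minimal, stand-alone sketch of the kind of driver setup that works in containers, assuming Chrome actually is installed (e.g. at /usr/bin/google-chrome); the exact arguments flathunter passes may differ.

```python
# Minimal sketch, not flathunter's actual configure_driver(): start headless
# Chrome in a container. Assumes a Chrome binary at /usr/bin/google-chrome.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.binary_location = "/usr/bin/google-chrome"  # the missing piece on App Engine
options.add_argument("--headless")
options.add_argument("--no-sandbox")              # usually required inside containers
options.add_argument("--disable-dev-shm-usage")   # avoid the tiny /dev/shm in containers
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.immobilienscout24.de")
    print(driver.title)
finally:
    driver.quit()
```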

codders commented 2 years ago

Are you deploying with Google App Engine? Or Google Cloud Run?

abuchmueller commented 2 years ago

Google App Engine

codders commented 2 years ago

Okay, so the thing is: Google App Engine just deploys the Python libraries, and it's not possible (as far as I've seen) to get Google Chrome installed as part of that process. So you can deploy to Google App Engine as long as you don't want to use Immoscout or 2captcha/imagetyperz. I'm currently (as in, this second) working on getting a deployment working with Google Cloud Run, which deploys a Docker container that includes Google Chrome. This seems to work better - I'll post some updates when I have them. But of course this might turn out to be more expensive than App Engine.
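One way to make a pure App Engine deployment at least degrade gracefully (hypothetical - this is not something flathunter does today) would be to only register the Selenium-based Immobilienscout crawler when a Chrome binary can actually be found:

```python
# Hypothetical sketch, not current flathunter behaviour: detect whether a
# Chrome/Chromium binary exists before registering Selenium-based crawlers.
import shutil

def chrome_available() -> bool:
    """Return True if a Chrome/Chromium binary is on the PATH."""
    candidates = ("google-chrome", "google-chrome-stable",
                  "chromium", "chromium-browser")
    return any(shutil.which(name) for name in candidates)

if __name__ == "__main__":
    if chrome_available():
        print("Chrome found - Selenium crawlers (Immoscout) can be enabled")
    else:
        print("No Chrome binary - skip Immoscout instead of crashing the worker")
```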

abuchmueller commented 2 years ago

GAE is cool, but since it works flawlessly in Docker (after #199), I thought it might be better to focus on that approach, since the web interface isn't that useful anyway: you still need to configure quite a bit in the config.yaml and elsewhere.

Deploying the Docker container to a VPS works flawlessly, as far as I know. Currently I'm trying to deploy it as a Google Cloud Run job myself, but I'm a GCP noob. I looked at it yesterday but couldn't figure it all out yet. Apart from a way to mount the config.yaml (which can be circumvented if you remove it from the .dockerignore and just ship it with the image), I have yet to find a way to persist data in Google Cloud Run…

codders commented 2 years ago

For me, the webinterface is critical, as it makes it possible for non-developers to use the service. I have a public instance at https://flathunter.codders.io that hundreds of strangers have been using over the years (and there are about 30 people receiving notifications from it right now), and I'm trying to get it updated to also scrape immoscout.

The flathunter software already supports persisting data to Firebase, and I'm kinda hoping that will still work in GCR, but right now I'm also stuck just getting the thing to behave well after it launches. My hope is that GCR turns out cheaper than a VPS. But if I don't get this working in the next few days, then I guess I'll go that route.
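For anyone curious what "persisting to Firebase" means in practice, here is a minimal stand-alone sketch using the google-cloud-firestore client - not flathunter's actual id-maintainer code, and the collection and field names are made up:

```python
# Minimal sketch (not flathunter's code): remember which exposes were already
# processed in Firestore, so a stateless Cloud Run container doesn't re-notify
# after every restart. Collection and field names are illustrative only.
from google.cloud import firestore

db = firestore.Client()  # uses the service account of the Cloud Run / GAE instance

def already_processed(expose_id: str) -> bool:
    """Check whether a listing ID has been seen before."""
    return db.collection("processed_ids").document(expose_id).get().exists

def mark_processed(expose_id: str, title: str) -> None:
    """Record a listing ID so later runs skip it."""
    db.collection("processed_ids").document(expose_id).set({
        "title": title,
        "crawled_at": firestore.SERVER_TIMESTAMP,
    })
```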

abuchmueller commented 2 years ago

> For me, the webinterface is critical, as it makes it possible for non-developers to use the service. I have a public instance at flathunter.codders.io that hundreds of strangers have been using over the years (and there are about 30 people receiving notifications from it right now), and I'm trying to get it updated to also scrape immoscout.

Ah, OK, I see - you limited the search URLs to Berlin. I was also thinking about providing this as a service for friends and family, but I found the web interface not so useful for my purpose, because you can't configure the search URLs through it per Telegram receiver ID, so I would need to use the config.yaml anyway and deploy multiple services. FYI, locally the web interface works well. I just tried it on localhost: it crawls Immobilienscout and sends notifications. So this is more of an App Engine issue.

> So you can deploy to Google App Engine, as long as you don't want to use Immoscout or 2captcha/imagetyperz. I'm currently (as in this second) working on getting a deployment working with Google Cloud Run, which will deploy a docker container that includes Google Chrome.

Maybe the solution to this issue is to mention the limitations of the GAE approach in the README and to work on alternative deployment options like Cloud Run or AWS EC2 (which has more hours in the free tier than GCP).

Thanks for all the hard work!

codders commented 2 years ago

Okay. I got this kinda running now on GCR. The nice thing about GCR is that I'm only paying while the site is running - about 2 minutes in every 10 - which I'm hoping means that I'll stay in the free tier in Google Cloud (or at least it'll be very cheap).

But what I've learned so far is that it doesn't run with less than 1 GB of RAM, and I hit the problems you already identified around shipping the config.yaml in the Docker image (plus some other GCR issues).

I'll open a branch shortly with the changes and then we (@abuchmueller, @codders, @alexanderroidl ?) can collaborate on how to merge it in a way that isn't total trash.

See you tomorrow!

alexanderroidl commented 2 years ago

@codders Thank you for the hard work! It's amazing that you got it running on GCR. By the way, what do you mean by "kinda" having it running?

I'd be happy to collaborate on that.

codders commented 2 years ago

Kinda - I mean, it runs okay, crawls Immoscout (and the rest), puts the data in Firebase, and the website loads. But there are some things that are not ideal.

  1. I turned the concurrency right down to try and debug what was happening, so I have gunicorn with 1 worker and 1 thread right now.

  2. Shipping the config file with the Docker container sucks. It would be great if we could extend config.py so that we can pass all the important variables through the environment. And maybe also integrate dotenv for development / testing purposes.

  3. I added a bunch of driver arguments to make Chrome run headless without crashing. I don't know which ones are important and which aren't.

But if you already want to take a look at dotenv support, that would be super helpful. I'm on the road today so I can't push my changes, but it was just a couple of lines in the Dockerfile and the driver_arguments in the config file so far (plus the GCR config).
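Roughly what I have in mind for the environment/dotenv side, as a sketch only (the FLATHUNTER_TELEGRAM_* names below are just examples, not the final variable names):

```python
# Sketch of env-based config with a dotenv fallback for local development.
# Not the final config.py refactor; the Telegram variable names are examples.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads a local .env file if present; harmless no-op in production

def env(name: str, default: str = "") -> str:
    """Prefer environment variables over values from config.yaml."""
    return os.environ.get(name, default)

telegram_token = env("FLATHUNTER_TELEGRAM_BOT_TOKEN")
receiver_ids = [r for r in env("FLATHUNTER_TELEGRAM_RECEIVER_IDS").split(",") if r]
loop_period_seconds = int(env("FLATHUNTER_LOOP_PERIOD_SECONDS", "600"))
```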

codders commented 2 years ago

Okay - I added a work-in-progress PR, #208, so you can see where I'm headed. There's a refactor of the config processing to handle passing config through the environment, and I'm splitting the 'hunt' / crawl activity from the web-serving activity. There's no reason the webserver has to have 1 GB of memory and Google Chrome embedded just to let people set up and configure their Telegram notifications.
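To make the split concrete, a very rough sketch of the shape I'm going for (module and function names here are illustrative; the real layout is in the PR):

```python
# Illustrative sketch of the split, not the actual code in the PR.
# Web entrypoint: a lightweight Flask app that never imports Selenium/Chrome,
# so it can run in a small, cheap instance.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    return "flathunter web interface"

@app.route("/status")
def status():
    return jsonify({"ok": True})

# Hunt entrypoint (separate process/container, triggered by Cloud Scheduler):
def run_hunt_once() -> None:
    """Initialize the crawlers, run one crawl-and-notify pass, then exit."""
    # The heavy imports (Selenium, Chrome) would live only in this code path.
    ...
```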

Let me know what you think!

codders commented 2 years ago

fyi - that's merged to main now. You can just pull the latest code and read the docs to try that out.

abuchmueller commented 2 years ago

> fyi - that's merged to main now. You can just pull the latest code and read the docs to try that out.

Thanks a ton! I'll try it out on Saturday and report back! I haven't had the time lately to dive deeper into this.

abuchmueller commented 2 years ago

> fyi - that's merged to main now. You can just pull the latest code and read the docs to try that out.

Very nice! As promised, I'm reporting back. I tried it out and got it running. However, I noticed the following:

I'm also a bit confused about the timer: what does FLATHUNTER_LOOP_PERIOD_SECONDS actually do if the job is invoked via Cloud Scheduler? Does it just wait if it is invoked too early?

abuchmueller commented 1 year ago

Update: I reverted the Google App Engine deployment to an older build (pre #208), and it now works in conjunction with the Cloud Run job, so the UI is updated once the job runs. If you ask me, the decoupling of crawling and webserver is a really nice addition; I would love to see the job be able to calculate distances via the Distance Matrix API. It's cost-efficient too: right now the billing forecast lies around $0.25 for the month, which is definitely cheaper than a VPS ($3-4/month).
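(For context, the distance calculation I mean is just the Google Distance Matrix API - something like the minimal sketch below. The GOOGLE_MAPS_API_KEY variable name is just an example, and flathunter's built-in integration may look different.)

```python
# Minimal sketch of a Distance Matrix lookup (not flathunter's code).
# Assumes a GOOGLE_MAPS_API_KEY environment variable - a hypothetical name.
import os
import requests

def commute_minutes(origin: str, destination: str) -> float:
    """Return the transit travel time between two addresses, in minutes."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/distancematrix/json",
        params={
            "origins": origin,
            "destinations": destination,
            "mode": "transit",
            "key": os.environ["GOOGLE_MAPS_API_KEY"],
        },
        timeout=10,
    )
    resp.raise_for_status()
    element = resp.json()["rows"][0]["elements"][0]
    return element["duration"]["value"] / 60  # API returns seconds

# e.g. commute_minutes("Alexanderplatz, Berlin", "Hauptbahnhof, Berlin")
```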

codders commented 1 year ago

@abuchmueller - thanks for the report!

abuchmueller commented 1 year ago

> @abuchmueller - thanks for the report!
>
> • Re FLATHUNTER_MESSAGE_FORMAT, there is a default value, so you don't need to specify it unless you want to change the format. But sure, we could add some better documentation for that.
>
> • I haven't used the gcloud distance matrix API. It shouldn't be hard to make it run, but you would need to add the environment variable configuration to pass in the API key. I didn't know anybody was using that, so I haven't looked at / tested it.
>
> • Good point with the scheduling - I'll add a note about that.
>
> • For GAE - I'll try redeploying that for myself when I get a chance and see what's up there. Maybe I broke something.
>
> • FLATHUNTER_LOOP_PERIOD_SECONDS will not do anything if you're running the web interface or the gcloud job. You don't need to specify a value for it - just leave it blank.
>
> How is your python coding? Do you want to have a go at getting the distance matrix API code running?

I love the distance matrix feature - it's the only reason I still run Docker locally. I'll take a look at your PR and have a go at it!

codders commented 1 year ago

Closing this ticket as the original issue is resolved. Please open a new ticket for the other items if you still want to make changes there. Thanks!