arachnys / cabot

Self-hosted, easily-deployable monitoring and alerts service - like a lightweight PagerDuty
MIT License
5.59k stars 593 forks

OperationalError: could not connect to server: Operation timed out #663

Closed liangerg closed 5 years ago

liangerg commented 5 years ago

After Cabot has been running for a day or two, the following error occurs. Cabot then stops working and has to be restarted.

INFO 2019-03-13 14:24:27,639 basehttp 22 140457922218672 "GET /static/CACHE/css/base.d33ed15511ca.css HTTP/1.1" 200 644
ERROR 2019-03-13 14:26:37,237 exception 22 140457778928304 Internal Server Error: /
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/code/cabot/urls.py", line 49, in home_authentication_switcher
    if cabot_needs_setup():
  File "/code/cabot/cabotapp/utils.py", line 5, in cabot_needs_setup
    return not get_user_model().objects.all().exists()
  File "/usr/lib/python2.7/site-packages/django/db/models/query.py", line 670, in exists
    return self.query.has_results(using=self.db)
  File "/usr/lib/python2.7/site-packages/django/db/models/sql/query.py", line 517, in has_results
    return compiler.has_results()
  File "/usr/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 858, in has_results
    return bool(self.execute_sql(SINGLE))
  File "/usr/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 887, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/lib/python2.7/site-packages/django/db/backends/base/base.py", line 254, in cursor
    return self._cursor()
  File "/usr/lib/python2.7/site-packages/django/db/backends/base/base.py", line 229, in _cursor
    self.ensure_connection()
  File "/usr/lib/python2.7/site-packages/django/db/backends/base/base.py", line 213, in ensure_connection
    self.connect()
  File "/usr/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python2.7/site-packages/django/db/backends/base/base.py", line 213, in ensure_connection
    self.connect()
  File "/usr/lib/python2.7/site-packages/django/db/backends/base/base.py", line 189, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
    connection = Database.connect(conn_params)
  File "/usr/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
OperationalError: could not connect to server: Operation timed out
    Is the server running on host "db" (172.18.0.2) and accepting TCP/IP connections on port 5432?

Can someone help?

Thank you!

dbuxton commented 5 years ago

Has your database crashed, or is it not accepting new connections?

liangerg commented 5 years ago

The db has not crashed, but cabot_beat is down after 21 hours:

rundeck@rundeck:~$ docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED             STATUS             PORTS                         NAMES
1dc2472aee6e   cabot:web         "./docker-entrypoint…"   21 hours ago        Up 21 hours        10.209.71.97:5050->5001/tcp   cabot_web_1
cfaea01d75b0   cabot:web         "./docker-entrypoint…"   21 hours ago        Up 21 hours                                      cabot_worker_1
a9e153e33776   postgres:alpine   "docker-entrypoint.s…"   21 hours ago        Up 21 hours        5432/tcp                      cabot_db_1
56ffbe60752c   redis:alpine      "docker-entrypoint.s…"   21 hours ago        Up 21 hours        6379/tcp                      cabot_redis_1
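A container missing from the listing above can be detected automatically. The following is a minimal sketch, not part of Cabot: it compares the names printed by `docker ps --format '{{.Names}}'` against a list of expected containers. The name `cabot_beat_1` is an assumption based on the naming pattern of the other containers, and `missing_containers` is a hypothetical helper.

```python
def missing_containers(names_output, expected):
    """Return the expected container names that do not appear in the
    output of `docker ps --format '{{.Names}}'` (one name per line)."""
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return sorted(set(expected) - running)


# The first four names come from the `docker ps` listing in this comment;
# cabot_beat_1 is an ASSUMED name that follows the same pattern.
EXPECTED = [
    "cabot_web_1",
    "cabot_worker_1",
    "cabot_beat_1",
    "cabot_db_1",
    "cabot_redis_1",
]

# Names of the containers that were actually shown as running.
SAMPLE_OUTPUT = "cabot_web_1\ncabot_worker_1\ncabot_db_1\ncabot_redis_1\n"

print(missing_containers(SAMPLE_OUTPUT, EXPECTED))  # prints ['cabot_beat_1']
```

In practice the text passed as `names_output` would come from running the `docker ps` command on the host, for example from a cron job that raises an alert when the returned list is not empty.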

liangerg commented 5 years ago

rundeck@rundeck:~$ date
Thu Mar 14 11:48:00 CDT 2019
rundeck@rundeck:~$
rundeck@rundeck:~$ # list running docker containers
rundeck@rundeck:~$ docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED             STATUS             PORTS                         NAMES
1dc2472aee6e   cabot:web         "./docker-entrypoint…"   About an hour ago   Up About an hour   10.209.71.97:5050->5001/tcp   cabot_web_1
cfaea01d75b0   cabot:web         "./docker-entrypoint…"   About an hour ago   Up About an hour                                 cabot_worker_1
a9e153e33776   postgres:alpine   "docker-entrypoint.s…"   About an hour ago   Up About an hour   5432/tcp                      cabot_db_1
56ffbe60752c   redis:alpine      "docker-entrypoint.s…"   About an hour ago   Up About an hour   6379/tcp                      cabot_redis_1
rundeck@rundeck:~$
rundeck@rundeck:~$ # shell in to cabot_web_1 container
rundeck@rundeck:~$ docker exec -i -t cabot_web_1 /bin/bash
bash-4.3#
bash-4.3# # attempt connection to postgres running in cabot_db_1 container
bash-4.3# # psql is not installed but telnet is
bash-4.3# telnet db 5432
Connection closed by foreign host
bash-4.3# # seems that the connection was made and eventually closed by postgres once it did not hear again for a while
bash-4.3#
bash-4.3# # wait for external indication that postgres is refusing connections
rundeck@rundeck:~$ # next day
rundeck@rundeck:~$ docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED             STATUS             PORTS                         NAMES
1dc2472aee6e   cabot:web         "./docker-entrypoint…"   21 hours ago        Up 21 hours        10.209.71.97:5050->5001/tcp   cabot_web_1
cfaea01d75b0   cabot:web         "./docker-entrypoint…"   21 hours ago        Up 21 hours                                      cabot_worker_1
a9e153e33776   postgres:alpine   "docker-entrypoint.s…"   21 hours ago        Up 21 hours        5432/tcp                      cabot_db_1
56ffbe60752c   redis:alpine      "docker-entrypoint.s…"   21 hours ago        Up 21 hours        6379/tcp                      cabot_redis_1
rundeck@rundeck:~$
rundeck@rundeck:~$ date
Fri Mar 15 08:51:00 CDT 2019
rundeck@rundeck:~$
rundeck@rundeck:~$ # shell in to cabot_web_1 container
rundeck@rundeck:~$ docker exec -i -t cabot_web_1 /bin/bash
bash-4.3#
bash-4.3# # attempt connection to postgres running in cabot_db_1 container
bash-4.3# telnet db 5432
telnet: can't connect to remote host (172.18.0.3): Operation timed out
bash-4.3#
rundeck@rundeck:~$ # shell in to cabot_db_1 container
rundeck@rundeck:~$ docker exec -i -t cabot_db_1 /bin/bash
bash-4.4#
bash-4.4# # telnet is not installed but psql is
bash-4.4# psql -h db -p 5432 -U postgres
psql (11.2)
Type "help" for help.
postgres=# # so postgres accepts connections on 5432 here but not from cabot_web_1
postgres-# # whereas ~21 hrs ago cabot_web_1 could connect
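The manual telnet check above can also be scripted. Since psql is not installed in the web container but Python is (the traceback shows Python 2.7), a small TCP probe is one option. This is a sketch under that assumption, not Cabot code, and `can_connect` is a hypothetical helper name. It only tests whether a TCP connection can be opened, not whether Postgres authenticates or answers queries.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, error_text) if the
    connection is refused, times out, or the host cannot be resolved.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout) as exc:
        return False, str(exc)
    sock.close()
    return True, None
```

Calling `can_connect("db", 5432)` from inside cabot_web_1 at regular intervals and logging the result with a timestamp would show when the web container loses its route to the database, which narrows down whether the failure lines up with some other event on the Docker host.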

liangerg commented 5 years ago

Here is my docker-compose.yml:

rundeck@rundeck:/tmp/docker-cabot$ cat docker-compose.yml
version: "2.1"

services:
  web:
    extends:
      file: docker-compose-base.yml
      service: base
    command: sh -c "cabot migrate && gunicorn cabot.wsgi:application -b 0.0.0.0:5000 --workers=5"
    ports:

volumes:
  data:

Should cabot_beat be running in Docker?