CodeForPhilly / paws-data-pipeline

PAWS Data Pipeline Project
MIT License
17 stars 9 forks source link

Intermittent db error - "server closed the connection unexpectedly" #571

Open carrollsa opened 1 year ago

carrollsa commented 1 year ago

There is an intermittent error being thrown on the call to api/get_last_runs.

paws-compose-server  |   File "/app/./api/admin_api.py", line 241, in get_run_logs
paws-compose-server  |     result = connection.execute(q)
...
paws-compose-server  | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpected
paws-compose-server  |  This probably means the server terminated abnormally
paws-compose-server  |  before or while processing the request.
paws-compose-server  |
paws-compose-server  | [SQL: select keycol,valcol from kv_unique where keycol like '%%_update'; ]
paws-compose-server  | (Background on this error at: http://sqlalche.me/e/14/e3q8)

@c-simpson Does the above sound plausible? If so, I don't think this is much of an issue, but I wanted to track it.

c-simpson commented 1 year ago

We sleep the server at startup so the DB has time to be up before we touch it. Do you see this in your server logs? The "X already present" lines show successful selects against the DB.

------------STARTING Wed Sep  6 00:51:17 UTC 2023 ------------------
SLEEPING.. WAITING FOR DB
WAKING
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
d80cb6df0fa2 (head)
DB SETUP
         ...
{"event": "3 roles already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3440847, "filename": "base_users.py", "func_name": "create_base_roles", "lineno": 46} 
{"event": "4 users already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3868604, "filename": "base_users.py", "func_name": "create_base_users", "lineno": 91} 
{"event": "5 event types already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3897622, "filename": "base_users.py", "func_name": "populate_sl_event_types", "lineno": 156}
carrollsa commented 1 year ago

Alright, it appears the issue is worth keeping around, at least, but it seems pretty low priority given that a refresh is enough to fix it and that it won't be used much.

carrollsa commented 12 months ago

We discovered a few things last night:

Currently testing a temporary solution of adding pool_pre_ping=True as an arg to sqlalchemy.create_engine().

Relevant logs:

....container startup logs

paws-compose-db      | 2023-09-12 23:50:17.730 UTC [32] LOG:  statement: BEGIN
paws-compose-db      | 2023-09-12 23:50:17.730 UTC [32] LOG:  statement: select valcol from kv_unique where keycol = 'last_execution_time';
paws-compose-db      | 2023-09-12 23:50:17.733 UTC [32] LOG:  statement: ROLLBACK
paws-compose-server  | [pid: 18|app: 0|req: 2/2] 172.30.0.4 () {54 vars in 1748 bytes} [Tue Sep 12 23:50:17 2023] GET /api/statistics => generated 2 bytes in 10 msecs (HTTP/1.0 200) 3 headers in 110 bytes (1 switches on core 1)
paws-compose-client  | 172.30.0.1 - - [12/Sep/2023:23:50:17 +0000] "GET /api/statistics HTTP/1.1" 200 2 "http://localhost/admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "-"
paws-compose-client  | 172.30.0.1 - - [12/Sep/2023:23:50:18 +0000] "GET /api/admin/user/get_users HTTP/1.1" 502 559 "http://localhost/users" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "-"
paws-compose-server  | Traceback (most recent call last):
paws-compose-client  | 2023/09/12 23:50:18 [error] 36#36: *1 upstream prematurely closed connection while reading response header from upstream, client: 172.30.0.1, server: localhost, request: "GET /api/admin/user/get_users HTTP/1.1", upstream: "http://172.30.0.3:5000/api/admin/user/get_users", host: "localhost", referrer: "http://localhost/users"
paws-compose-server  |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1770, in _execute_context
paws-compose-server  |     self.dialect.do_execute(
paws-compose-server  |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
paws-compose-server  |     cursor.execute(statement, parameters)
paws-compose-server  | psycopg2.OperationalError: server closed the connection unexpectedly
paws-compose-server  |  This probably means the server terminated abnormally
paws-compose-server  |  before or while processing the request.
paws-compose-server  |
paws-compose-server  |
paws-compose-server  | The above exception was the direct cause of the following exception:
paws-compose-server  |
paws-compose-server  | Traceback (most recent call last):
paws-compose-server  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__

...Flask call stack

/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
paws-compose-server  |     cursor.execute(statement, parameters)
paws-compose-server  | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
paws-compose-server  |  This probably means the server terminated abnormally
paws-compose-server  |  before or while processing the request.

# DB startup queries begin again here
paws-compose-server  | [SQL:  select username, full_name, active, pr.role
paws-compose-server  |             from pdp_users as pu
paws-compose-server  |             left join pdp_user_roles as pr on pu.role = pr._id
paws-compose-server  |             order by username ]
paws-compose-server  | (Background on this error at: http://sqlalche.me/e/14/e3q8)
paws-compose-server  | [pid: 16|app: 0|req: 1/3] 172.30.0.4 () {54 vars in 1768 bytes} [Tue Sep 12 23:50:18 2023] GET /api/admin/user/get_users => generated 0 bytes in 13 msecs (HTTP/1.0 500) 0 headers in 0 bytes (0 switches on core 0)
paws-compose-db      | 2023-09-12 23:54:07.864 UTC [32] LOG:  statement: BEGIN
paws-compose-db      | 2023-09-12 23:54:07.864 UTC [32] LOG:  statement: select active from pdp_users where username='base_admin'
paws-compose-db      | 2023-09-12 23:54:07.873 UTC [41] LOG:  statement: BEGIN
...

Not sure if the truncated version above is too helpful. All of the logs from the point of starting the container are here.

c-simpson commented 8 months ago

@carrollsa Close this one given #573?

carrollsa commented 8 months ago

@carrollsa Close this one given #573? Sounds good! Haven't encountered the error locally since upgrading.

carrollsa commented 2 months ago

Reopening this issue as the error is still present.

carrollsa commented 2 months ago

I found this stackoverflow post referencing our problem, but it is using some older versions of what we're using. I think we can do the same thing just by adding pool_pre_ping=True to our create_engine call in line 70 of config.py.

engine = db.create_engine(DB, pool_pre_ping=True)

I haven't been able to reproduce the error at all today without this addition, but it feels like it may be what we need.