Open carrollsa opened 1 year ago
We sleep the server at startup so the DB has time to come up before we touch it. Do you see this in your server logs? The "X already present" lines show successful selects against the DB.
------------STARTING Wed Sep 6 00:51:17 UTC 2023 ------------------
SLEEPING.. WAITING FOR DB
WAKING
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
d80cb6df0fa2 (head)
DB SETUP
...
{"event": "3 roles already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3440847, "filename": "base_users.py", "func_name": "create_base_roles", "lineno": 46}
{"event": "4 users already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3868604, "filename": "base_users.py", "func_name": "create_base_users", "lineno": 91}
{"event": "5 event types already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3897622, "filename": "base_users.py", "func_name": "populate_sl_event_types", "lineno": 156}
Alright, it appears the issue is worth keeping around, at least, but it seems pretty low priority given that a refresh is enough to fix it and that it won't be used much.
We discovered a few things last night: the intermittent error is being thrown on the get_last_runs endpoint. Currently testing a temporary solution of adding pool_pre_ping=True as an arg to sqlalchemy.create_engine().
Relevant logs:
....container startup logs
paws-compose-db | 2023-09-12 23:50:17.730 UTC [32] LOG: statement: BEGIN
paws-compose-db | 2023-09-12 23:50:17.730 UTC [32] LOG: statement: select valcol from kv_unique where keycol = 'last_execution_time';
paws-compose-db | 2023-09-12 23:50:17.733 UTC [32] LOG: statement: ROLLBACK
paws-compose-server | [pid: 18|app: 0|req: 2/2] 172.30.0.4 () {54 vars in 1748 bytes} [Tue Sep 12 23:50:17 2023] GET /api/statistics => generated 2 bytes in 10 msecs (HTTP/1.0 200) 3 headers in 110 bytes (1 switches on core 1)
paws-compose-client | 172.30.0.1 - - [12/Sep/2023:23:50:17 +0000] "GET /api/statistics HTTP/1.1" 200 2 "http://localhost/admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "-"
paws-compose-client | 172.30.0.1 - - [12/Sep/2023:23:50:18 +0000] "GET /api/admin/user/get_users HTTP/1.1" 502 559 "http://localhost/users" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "-"
paws-compose-server | Traceback (most recent call last):
paws-compose-client | 2023/09/12 23:50:18 [error] 36#36: *1 upstream prematurely closed connection while reading response header from upstream, client: 172.30.0.1, server: localhost, request: "GET /api/admin/user/get_users HTTP/1.1", upstream: "http://172.30.0.3:5000/api/admin/user/get_users", host: "localhost", referrer: "http://localhost/users"
paws-compose-server | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1770, in _execute_context
paws-compose-server | self.dialect.do_execute(
paws-compose-server | File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
paws-compose-server | cursor.execute(statement, parameters)
paws-compose-server | psycopg2.OperationalError: server closed the connection unexpectedly
paws-compose-server | This probably means the server terminated abnormally
paws-compose-server | before or while processing the request.
paws-compose-server |
paws-compose-server |
paws-compose-server | The above exception was the direct cause of the following exception:
paws-compose-server |
paws-compose-server | Traceback (most recent call last):
paws-compose-server | File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
...Flask call stack
/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
paws-compose-server | cursor.execute(statement, parameters)
paws-compose-server | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
paws-compose-server | This probably means the server terminated abnormally
paws-compose-server | before or while processing the request.
# DB startup queries begin again here
paws-compose-server | [SQL: select username, full_name, active, pr.role
paws-compose-server | from pdp_users as pu
paws-compose-server | left join pdp_user_roles as pr on pu.role = pr._id
paws-compose-server | order by username ]
paws-compose-server | (Background on this error at: http://sqlalche.me/e/14/e3q8)
paws-compose-server | [pid: 16|app: 0|req: 1/3] 172.30.0.4 () {54 vars in 1768 bytes} [Tue Sep 12 23:50:18 2023] GET /api/admin/user/get_users => generated 0 bytes in 13 msecs (HTTP/1.0 500) 0 headers in 0 bytes (0 switches on core 0)
paws-compose-db | 2023-09-12 23:54:07.864 UTC [32] LOG: statement: BEGIN
paws-compose-db | 2023-09-12 23:54:07.864 UTC [32] LOG: statement: select active from pdp_users where username='base_admin'
paws-compose-db | 2023-09-12 23:54:07.873 UTC [41] LOG: statement: BEGIN
...
Not sure how helpful the truncated version above is. All of the logs from the point of starting the container are here.
@carrollsa Close this one given #573?
> @carrollsa Close this one given #573?

Sounds good! Haven't encountered the error locally since upgrading.
Reopening this issue as the error is still present.
I found this stackoverflow post referencing our problem, but it uses older versions of the libraries we're on. I think we can do the same thing just by adding pool_pre_ping=True to our create_engine call on line 70 of config.py:
engine = db.create_engine(DB, pool_pre_ping=True)
I haven't been able to reproduce the error at all today without this addition, but it feels like it may be what we need.
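For reference, pool_pre_ping makes the pool issue a cheap liveness test on each connection checkout and transparently discard a connection the server has since closed, which is exactly the "server closed the connection unexpectedly" failure mode in the logs above. A self-contained sketch of the idea (the Conn/Pool classes are illustrative toys, not SQLAlchemy internals):

```python
class StaleConnectionError(Exception):
    pass

class Conn:
    """Toy connection that the server can close out from under us."""
    def __init__(self):
        self.alive = True
    def ping(self):
        if not self.alive:
            raise StaleConnectionError("server closed the connection")

class Pool:
    """Toy pool mimicking the effect of SQLAlchemy's pool_pre_ping."""
    def __init__(self, pre_ping=True):
        self.pre_ping = pre_ping
        self._idle = []
    def checkout(self):
        while self._idle:
            conn = self._idle.pop()
            if not self.pre_ping:
                return conn       # may hand back a dead connection
            try:
                conn.ping()       # cheap "SELECT 1"-style liveness test
                return conn
            except StaleConnectionError:
                continue          # discard the stale conn, try the next
        return Conn()             # no usable idle conn: open a fresh one
    def checkin(self, conn):
        self._idle.append(conn)
```

With pre-ping enabled, a Postgres restart between requests is detected at checkout and the dead connection is silently replaced, instead of surfacing as a psycopg2.OperationalError on the first query after the restart.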
There is an intermittent error being thrown on the call to api/get_last_runs.

@c-simpson Does the above sound plausible? If so, I don't think this is much of an issue, but I wanted to track it.