att-comdev / deckhand

This project has moved to OpenStack.
https://www.airshipit.org/
Apache License 2.0
9 stars 5 forks source link

Database connection closed issues when using multiprocessing #20

Open fmontei opened 6 years ago

fmontei commented 6 years ago

There is a bug in Deckhand with connection being closed when uwsgi workers > 1:

[SQL: 'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines Traceback (most recent call last):
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     connection.scalar(select([1]))
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 880, in scalar
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     return self.execute(object, *multiparams, **params).scalar()
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     return meth(self, multiparams, params)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     return connection._execute_clauseelement(self, multiparams, params)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     compiled_sql, distilled_params
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     context)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     util.raise_from_cause(newraise, exc_info)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     reraise(type(exception), exception, tb=exc_tb, cause=cause)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     context)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines   File "/opt/stack/ucp/deckhand/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines     cursor.execute(statement, parameters)
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines DBConnectionError: (psycopg2.OperationalError) server closed the connection unexpectedly
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines  This probably means the server terminated abnormally
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines  before or while processing the request.
2018-03-30 18:19:12.897 21802 ERROR oslo_db.sqlalchemy.engines  [SQL: 'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)

The DB error from connection being reset has to do to with the fact that the DB connection pool is being shared across multiple processes (workers) when uwsgi workers > 1. There are ways to fix that but still don't know the fix without reading a bunch of long sqlalchemy documentation. And BTW, this issue would manifest itself with or without sqlalchemy. One way is to use a NullPool which doesn't pool any connection strings but I'm guessing has a slight impact on DB performance and the other is to use sqlalchemy event listeners and process PIDs. According to the docs anyway. (edited) The native OpenStack pattern is to only provide thread locking support, not multiprocess support.

More info: http://docs.sqlalchemy.org/en/rel_1_0/faq/connections.html#how-do-i-use-engines-connections-sessions-with-python-multiprocessing-or-os-fork http://docs.sqlalchemy.org/en/rel_1_0/core/pooling.html#using-connection-pools-with-multiprocessing