geosolutions-it / C195-azure-workspace

1 stars 2 forks source link

ckan and Postgresql connection issues - enabling sqlalchemy connection pooling #3

Closed lpasquali closed 3 years ago

lpasquali commented 3 years ago

Even if we migrated to postgres managed by azure, the problem in connections between postgres persists. As discurred ans seen with @etj we can inject pure sqlalchemy config options in production.ini as documented here then rebuild the docker image and push it to the crea registry in azure.

This is a race condition, not easily reproducible: after some time it has been used and running ckan stops connecting to postgres:

ckan          | 2021-02-25 10:50:17,320 DEBUG [repoze.who] identifier plugins registered: [<FriendlyFormPlugin 140272518243552>, <CkanAuthTktCookiePlugin 140272518242488>]
ckan          | 2021-02-25 10:50:17,321 DEBUG [repoze.who] identifier plugins matched for classification "browser": [<FriendlyFormPlugin 140272518243552>, <CkanAuthTktCookiePlugin 140272518242488>]
ckan          | 2021-02-25 10:50:17,321 DEBUG [repoze.who] no identity returned from <FriendlyFormPlugin 140272518243552> (None)
ckan          | 2021-02-25 10:50:17,321 DEBUG [repoze.who] identity returned from <CkanAuthTktCookiePlugin 140272518242488>: {'timestamp': 1613643505, 'repoze.who.plugins.auth_tkt.userid': 'admin', 'tokens': [''], 'userdata': {}}
ckan          | 2021-02-25 10:50:17,322 DEBUG [repoze.who] identities found: [(<CkanAuthTktCookiePlugin 140272518242488>, {'timestamp': 1613643505, 'repoze.who.plugins.auth_tkt.userid': 'admin', 'tokens': [''], 'userdata': {}})]
ckan          | 2021-02-25 10:50:17,322 DEBUG [repoze.who] authenticator plugins registered: [<CkanAuthTktCookiePlugin 140272518242488>, <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7f93bd98b3c8>]
ckan          | 2021-02-25 10:50:17,322 DEBUG [repoze.who] authenticator plugins matched for classification "browser": [<CkanAuthTktCookiePlugin 140272518242488>, <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7f93bd98b3c8>]
ckan          | 2021-02-25 10:50:17,322 DEBUG [repoze.who] userid returned from <CkanAuthTktCookiePlugin 140272518242488>: "admin"
ckan          | 2021-02-25 10:50:17,322 DEBUG [repoze.who] no userid returned from <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7f93bd98b3c8>: (None)
ckan          | 2021-02-25 10:50:17,323 DEBUG [repoze.who] identities authenticated: [((0, 0), <CkanAuthTktCookiePlugin 140272518242488>, <CkanAuthTktCookiePlugin 140272518242488>, {'timestamp': 1613643505, 'repoze.who.plugins.auth_tkt.userid': 'admin', 'tokens': [''], 'userdata': {}, 'repoze.who.userid': 'admin'}, 'admin')]
ckan          | Traceback (most recent call last):
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1177, in _execute_context
ckan          |     conn = self._revalidate_connection()
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 463, in _revalidate_connection
ckan          |     "Can't reconnect until invalid "
ckan          | sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back
ckan          | 
ckan          | The above exception was the direct cause of the following exception:
ckan          | 
ckan          | Traceback (most recent call last):
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
ckan          |     response = self.handle_exception(e)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
ckan          |     reraise(exc_type, exc_value, tb)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
ckan          |     raise value
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
ckan          |     response = self.full_dispatch_request()
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
ckan          |     rv = self.handle_user_exception(e)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
ckan          |     reraise(exc_type, exc_value, tb)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
ckan          |     raise value
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 1947, in full_dispatch_request
ckan          |     rv = self.preprocess_request()
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/flask/app.py", line 2241, in preprocess_request
ckan          |     rv = func()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/middleware/flask_app.py", line 369, in ckan_before_request
ckan          |     app_globals.app_globals._check_uptodate()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/lib/app_globals.py", line 202, in _check_uptodate
ckan          |     value = model.get_system_info('ckan.config_update')
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/model/system_info.py", line 47, in get_system_info
ckan          |     obj = meta.Session.query(SystemInfo).filter_by(key=key).first()
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3222, in first
ckan          |     ret = list(self[0:1])
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3012, in __getitem__
ckan          |     return list(res)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3324, in __iter__
ckan          |     return self._execute_and_instances(context)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3349, in _execute_and_instances
ckan          |     result = conn.execute(querycontext.statement, self._params)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
ckan          |     return meth(self, multiparams, params)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
ckan          |     return connection._execute_clauseelement(self, multiparams, params)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
ckan          |     distilled_params,
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
ckan          |     e, util.text_type(statement), parameters, None, None
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
ckan          |     util.raise_from_cause(sqlalchemy_exception, exc_info)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 399, in raise_from_cause
ckan          |     reraise(type(exception), exception, tb=exc_tb, cause=cause)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
ckan          |     raise value.with_traceback(tb)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1177, in _execute_context
ckan          |     conn = self._revalidate_connection()
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 463, in _revalidate_connection
ckan          |     "Can't reconnect until invalid "
ckan          | sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
ckan          | [SQL: SELECT system_info.id AS system_info_id, system_info.key AS system_info_key, system_info.value AS system_info_value, system_info.state AS system_info_state 
ckan          | FROM system_info 
ckan          | WHERE system_info.key = %(key_1)s 
ckan          |  LIMIT %(param_1)s]
ckan          | [parameters: [{}]]

Useful links

https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine.params.pool_reset_on_return https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine.params.echo_pool

lpasquali commented 3 years ago

I activated ckan database pooling @etj, let's hope it will do the necessary work:

 27358 | ckan      | 336396 |    27356 | ckan            |                          | 10.0.1.4      |                 |        7492 | 2021-03-02 15:24:21.966344+00 |                               | 2021-03-02 15:24:22.310077+00 | 2021-03-02 15:24:22.310077+00 | Client          | ClientRead       | idle   |             |              | COMMIT                   | client backend
 27358 | ckan      | 336456 |    27356 | ckan            |                          | 10.0.1.4      |                 |       17537 | 2021-03-02 15:24:22.122605+00 |                               | 2021-03-02 15:24:22.294412+00 | 2021-03-02 15:24:22.294412+00 | Client          | ClientRead       | idle   |             |              | ROLLBACK                 | client backend
 27358 | ckan      | 336468 |    27356 | ckan            |                          | 10.0.1.4      |                 |       23108 | 2021-03-02 15:24:24.310132+00 |                               | 2021-03-02 15:24:24.638258+00 | 2021-03-02 15:24:24.638258+00 | Client          | ClientRead       | idle   |             |              | COMMIT                   | client backend
 27358 | ckan      | 336476 |    27356 | ckan            |                          | 10.0.1.4      |                 |        3394 | 2021-03-02 15:24:24.466362+00 |                               | 2021-03-02 15:24:24.638258+00 | 2021-03-02 15:24:24.638258+00 | Client          | ClientRead       | idle   |             |              | ROLLBACK                 | client backend
 27358 | ckan      | 336504 |    27356 | ckan            | psql                     | 79.135.50.243 |                 |       23105 | 2021-03-02 15:31:15.676229+00 | 2021-03-02 15:31:55.973791+00 | 2021-03-02 15:31:55.973791+00 | 2021-03-02 15:31:55.973791+00 |                 |                  | active |             |         4669 | select *                +| client backend

I started putting pool size of 5, and it looks like it was felt from ckan production.ini as expected as there are 5 connections as above.

lpasquali commented 3 years ago

there are still issues in sql connections possibly because pure sql alchemy configuration options aren't passed from production.ini

lpasquali commented 3 years ago

@etj great hint! now we've got pooling work

2021-03-04 13:30:04,204 INFO sqlalchemy.pool.impl.QueuePool Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,204 INFO  [sqlalchemy.pool.impl.QueuePool] Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,205 INFO sqlalchemy.pool.impl.QueuePool Pool recreating
2021-03-04 13:30:04,205 INFO  [sqlalchemy.pool.impl.QueuePool] Pool recreating
2021-03-04 13:30:04,206 DEBUG [ckan.plugins.core] Loading the synchronous search plugin
2021-03-04 13:30:04,213 DEBUG [ckan.lib.webassets_tools] Base path /usr/lib/ckan/venv/src/ckan/ckan/public/base
2021-03-04 13:30:04,377 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/venv/src/ckan/ckan/templates
2021-03-04 13:30:04,379 DEBUG [ckan.logic] check access OK - get_site_user user=None
2021-03-04 13:30:04,609 INFO sqlalchemy.pool.impl.QueuePool Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,609 INFO  [sqlalchemy.pool.impl.QueuePool] Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,609 INFO sqlalchemy.pool.impl.QueuePool Pool recreating
2021-03-04 13:30:04,609 INFO  [sqlalchemy.pool.impl.QueuePool] Pool recreating
randomorder commented 3 years ago

can we close this one @lpasquali ?

lpasquali commented 3 years ago

Hello @etj after the weekend the ckan application was stuck again with no relevant log (it just stopped around 20:40 on last Friday) by chance I came on this old post in the ckan ml: https://lists-archive.okfn.org/pipermail/ckan-dev/2017-May/022314.html it is advised to enable keepalive on postgres, which wasn't at least now on the postgresql azure db instance, I tried to enable it and tune to 30 seconds: image

let's hope this fixes, finger crossed..

lpasquali commented 3 years ago

@etj tcp keepalive on postgres side did not solve the problem but it probably made sqlalchemy to "print it"

sqlalchemy.exc.StatementError
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
[SQL: SELECT system_info.id AS system_info_id, system_info.key AS system_info_key, system_info.value AS system_info_value, system_info.state AS system_info_state 
FROM system_info 
WHERE system_info.key = %(key_1)s 
 LIMIT %(param_1)s]
[parameters: [{}]]

Screenshot from 2021-03-09 15-21-51

etj commented 3 years ago

Working on a fix CKAN side. Opened this issue: https://github.com/ckan/ckan/issues/5953

etj commented 3 years ago

Tentative fix on branch c195 in repo https://github.com/geosolutions-it/ckan

lpasquali commented 3 years ago

Tentative fix on branch c195 in repo https://github.com/geosolutions-it/ckan

@etj new docker image from such repository is up

lpasquali commented 3 years ago

@etj can we close this?