Closed lpasquali closed 3 years ago
@etj I enabled database pooling in CKAN, let's hope it does the job:
27358 | ckan | 336396 | 27356 | ckan | | 10.0.1.4 | | 7492 | 2021-03-02 15:24:21.966344+00 | | 2021-03-02 15:24:22.310077+00 | 2021-03-02 15:24:22.310077+00 | Client | ClientRead | idle | | | COMMIT | client backend
27358 | ckan | 336456 | 27356 | ckan | | 10.0.1.4 | | 17537 | 2021-03-02 15:24:22.122605+00 | | 2021-03-02 15:24:22.294412+00 | 2021-03-02 15:24:22.294412+00 | Client | ClientRead | idle | | | ROLLBACK | client backend
27358 | ckan | 336468 | 27356 | ckan | | 10.0.1.4 | | 23108 | 2021-03-02 15:24:24.310132+00 | | 2021-03-02 15:24:24.638258+00 | 2021-03-02 15:24:24.638258+00 | Client | ClientRead | idle | | | COMMIT | client backend
27358 | ckan | 336476 | 27356 | ckan | | 10.0.1.4 | | 3394 | 2021-03-02 15:24:24.466362+00 | | 2021-03-02 15:24:24.638258+00 | 2021-03-02 15:24:24.638258+00 | Client | ClientRead | idle | | | ROLLBACK | client backend
27358 | ckan | 336504 | 27356 | ckan | psql | 79.135.50.243 | | 23105 | 2021-03-02 15:31:15.676229+00 | 2021-03-02 15:31:55.973791+00 | 2021-03-02 15:31:55.973791+00 | 2021-03-02 15:31:55.973791+00 | | | active | | 4669 | select * +| client backend
I started with a pool size of 5, and it looks like it was read from CKAN's production.ini as expected, since there are 5 connections as shown above.
There are still issues with the SQL connections, possibly because pure SQLAlchemy configuration options aren't passed through from production.ini.
@etj great hint! Now we've got pooling working:
2021-03-04 13:30:04,204 INFO sqlalchemy.pool.impl.QueuePool Pool disposed. Pool size: 10 Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,204 INFO [sqlalchemy.pool.impl.QueuePool] Pool disposed. Pool size: 10 Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,205 INFO sqlalchemy.pool.impl.QueuePool Pool recreating
2021-03-04 13:30:04,205 INFO [sqlalchemy.pool.impl.QueuePool] Pool recreating
2021-03-04 13:30:04,206 DEBUG [ckan.plugins.core] Loading the synchronous search plugin
2021-03-04 13:30:04,213 DEBUG [ckan.lib.webassets_tools] Base path /usr/lib/ckan/venv/src/ckan/ckan/public/base
2021-03-04 13:30:04,377 INFO [ckan.config.environment] Loading templates from /usr/lib/ckan/venv/src/ckan/ckan/templates
2021-03-04 13:30:04,379 DEBUG [ckan.logic] check access OK - get_site_user user=None
2021-03-04 13:30:04,609 INFO sqlalchemy.pool.impl.QueuePool Pool disposed. Pool size: 10 Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,609 INFO [sqlalchemy.pool.impl.QueuePool] Pool disposed. Pool size: 10 Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
2021-03-04 13:30:04,609 INFO sqlalchemy.pool.impl.QueuePool Pool recreating
2021-03-04 13:30:04,609 INFO [sqlalchemy.pool.impl.QueuePool] Pool recreating
can we close this one @lpasquali ?
Hello @etj, after the weekend the CKAN application was stuck again with no relevant log (it just stopped around 20:40 last Friday). By chance I came across this old post on the ckan-dev mailing list: https://lists-archive.okfn.org/pipermail/ckan-dev/2017-May/022314.html. It advises enabling keepalive on postgres, which was not enabled until now on the Azure PostgreSQL instance. I enabled it and tuned it to 30 seconds.
Let's hope this fixes it, fingers crossed.
@etj TCP keepalive on the postgres side did not solve the problem, but it probably made SQLAlchemy "print it":
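For reference, keepalive can also be requested from the client side through the libpq connection parameters `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count`. The snippet below is only a sketch: the helper name `with_keepalives`, the host, the credentials and the interval/count values are made up for illustration, and it assumes the driver forwards URL query parameters to libpq.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keepalives(url, idle=30, interval=10, count=3):
    """Return `url` with libpq TCP keepalive parameters added to its query string.

    Hypothetical helper: only the four libpq parameter names are real,
    the default interval/count values are arbitrary.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query.update({
        "keepalives": "1",                      # turn client-side keepalive on
        "keepalives_idle": str(idle),           # seconds of inactivity before the first probe
        "keepalives_interval": str(interval),   # seconds between probes
        "keepalives_count": str(count),         # lost probes before the connection is dropped
    })
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL, not the real production one.
url = with_keepalives("postgresql://ckan:secret@db.example.com/ckan?sslmode=require")
print(url)
```

The resulting string is what would go into `sqlalchemy.url` in production.ini, if this route is tried at all.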
sqlalchemy.exc.StatementError
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
[SQL: SELECT system_info.id AS system_info_id, system_info.key AS system_info_key, system_info.value AS system_info_value, system_info.state AS system_info_state
FROM system_info
WHERE system_info.key = %(key_1)s
LIMIT %(param_1)s]
[parameters: [{}]]
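The error above means the session's transaction was invalidated (typically by a dropped connection) and nothing rolled it back before the next query. A common pattern to avoid this is to roll back on any failure, sketched below. This is not the fix on the CKAN side, just an illustration: `session_scope` and `FakeSession` are hypothetical names, and `FakeSession` only stands in for a real SQLAlchemy session so the snippet runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session):
    """Commit on success; roll back on any error so the session stays usable."""
    try:
        yield session
        session.commit()
    except Exception:
        # Without this rollback the next query on the same session would fail
        # with "Can't reconnect until invalid transaction is rolled back".
        session.rollback()
        raise
    finally:
        session.close()


class FakeSession:
    """Stand-in for a SQLAlchemy session; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


ok = FakeSession()
with session_scope(ok):
    pass  # work that succeeds

failed = FakeSession()
try:
    with session_scope(failed):
        raise ConnectionError("server closed the connection unexpectedly")
except ConnectionError:
    pass

print(ok.calls, failed.calls)
```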
Working on a fix CKAN side. Opened this issue: https://github.com/ckan/ckan/issues/5953
Tentative fix on branch c195 in repo https://github.com/geosolutions-it/ckan
@etj a new docker image built from that repository is up
@etj can we close this?
Even though we migrated to postgres managed by Azure, the connection problem between CKAN and postgres persists. As discussed and seen with @etj, we can inject pure SQLAlchemy config options in production.ini as documented here, then rebuild the docker image and push it to the crea registry in Azure.
This is a race condition and not easily reproducible: after CKAN has been running and in use for some time, it stops connecting to postgres.
Useful links
https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine.params.pool_reset_on_return
https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine.params.echo_pool
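To check which pure SQLAlchemy options would actually be picked up from production.ini, a small sketch like the one below can help. It reads the `sqlalchemy.`-prefixed keys from the `[app:main]` section and strips the prefix, which is roughly what `sqlalchemy.engine_from_config` does with its `prefix` argument. The ini content, the URL and the helper name `sqlalchemy_options` are assumptions for illustration, and whether CKAN forwards every such key to the engine is not confirmed here.

```python
import configparser

# Hypothetical excerpt of a production.ini; values are placeholders.
SAMPLE_INI = """
[app:main]
sqlalchemy.url = postgresql://ckan:secret@db.example.com/ckan
sqlalchemy.pool_size = 5
sqlalchemy.pool_reset_on_return = rollback
sqlalchemy.echo_pool = debug
ckan.site_url = http://localhost:5000
"""


def sqlalchemy_options(ini_text, section="app:main", prefix="sqlalchemy."):
    """Return the options in `section` that start with `prefix`, prefix removed."""
    parser = configparser.ConfigParser(interpolation=None)  # leave '%' in values untouched
    parser.read_string(ini_text)
    return {
        key[len(prefix):]: value
        for key, value in parser[section].items()
        if key.startswith(prefix)
    }


options = sqlalchemy_options(SAMPLE_INI)
for name, value in options.items():
    print(name, "=", value)
```

All values come back as strings here; any type conversion (for example `pool_size` to an integer) is left to whatever builds the engine.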