digglife / ibm-db

Automatically exported from code.google.com/p/ibm-db
0 stars 0 forks source link

ibm_db driver should return some relevant error message in case of closed connection. #167

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
During some testing, I found that (in this case) Neutron could end up with a 
"poison" database connection in its pool and not realize that this connection 
was bad. It would constantly try using that connection for all requests, which 
would fail, thus rendering that Neutron "bad." Still, HAProxy would send 
requests to that Neutron.
Something probably needs to be done to fix (upstream?) how Neutron (and likely 
other similar OpenStack SQLAlchemy code) deals with this type of DB2 error. 
This was caused by (on the DB2 server side) issuing "db2 force application 
(12345)".
AND/OR something needs to be done to detect when a given Neutron server (and 
likely other similar OpenStack components) is CONTINUALLY failing to handle 
every single request it picks up. I believe "STONITH" is the correct response 
here. Something should kill and restart the service.

Details:

neutron sqlalchemy pool problem when FORCE APPLICATION from db2-side.
It is trying to ping the DB using:
ping_sql = 'select 1 from (values (1)) AS t1'
and getting back an exception which is not removing the connection from the 
pool.

2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.6/site-packages/neutron/plugins/ibm/sdnve_neutron_plugin.py", 
line 111, in hawrapper
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     ret_func = 
func(self, *args, **kwargs)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.6/site-packages/neutron/plugins/ibm/sdnve_neutron_plugin.py", 
line 279, in get_networks
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     context, 
filters, fields, sorts, limit, marker, page_reverse)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.6/site-packages/neutron/db/db_base_plugin_v2.py", line 1044, 
in get_networks
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     
page_reverse=page_reverse)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.6/site-packages/neutron/db/db_base_plugin_v2.py", line 208, 
in _get_collection
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     items = 
[dict_func(c, fields) for c in query]
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2227, in 
__iter__
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     return 
self._execute_and_instances(context)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2240, in 
_execute_and_instances
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     
close_with_result=True)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2231, in 
_connection_from_session
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     **kw)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 777, in 
connection
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     
close_with_result=close_with_result)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 783, in 
_connection_for_bind
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     return 
engine.contextual_connect(**kwargs)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 2489, in 
contextual_connect
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     
self.pool.connect(),
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 236, in connect
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     return 
_ConnectionFairy(self).checkout()
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 474, in checkout
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     self)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/sqlalchemy/event.py", line 377, in __call__
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     fn(*args, **kw)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.6/site-packages/neutron/openstack/common/db/sqlalchemy/session
.py", line 684, in _ping_listener
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     
cursor.execute(ping_sql)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/ibm_db_dbi.py", line 1333, in execute
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     
self._prepare_helper(operation)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource   File 
"/usr/lib64/python2.6/site-packages/ibm_db_dbi.py", line 1204, in 
_prepare_helper
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource     raise 
self.messages[len(self.messages) - 1]
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource Error: 
ibm_db_dbi::Error: SystemError('error return without exception set',)
2014-10-28 15:57:02.062 3339 TRACE neutron.api.v2.resource

I think I'd suggest looking around line 680 of 
neutron/openstack/common/db/sqlalchemy/session.py
Here is the code, some of which is already DB2-specific:

def _ping_listener(engine, dbapi_conn, connection_rec, connection_proxy):
    """Ensures that MySQL and DB2 connections are alive.

    Borrowed from:
    http://groups.google.com/group/sqlalchemy/msg/a4ce563d802c929f
    """
    cursor = dbapi_conn.cursor()
    try:
        ping_sql = 'select 1'
        if engine.name == 'ibm_db_sa':
            # DB2 requires a table expression
            ping_sql = 'select 1 from (values (1)) AS t1'
        cursor.execute(ping_sql)
    except Exception as ex:
        if engine.dialect.is_disconnect(ex, dbapi_conn, cursor):
            msg = _('Database server has gone away: %s') % ex
            LOG.warning(msg)
            raise sqla_exc.DisconnectionError(msg)
        else:
            raise

Original issue reported on code.google.com by gekun0...@gmail.com on 14 Jan 2015 at 3:07

GoogleCodeExporter commented 8 years ago
I am looking into this issue, will try to solve it as soon as possible.

Original comment by rahul.pr...@in.ibm.com on 16 Jan 2015 at 6:20

GoogleCodeExporter commented 8 years ago
I have created the fix for this issue and committed the changes with 
https://code.google.com/p/ibm-db/source/detail?r=f060cda47a854351d86f90954c595fb
c93c361c0  and 
https://code.google.com/p/ibm-db/source/detail?r=af7db4b718ac947665b394e043d2d26
5db572b96&repo=ibm-db-sa.

Please give a try and let me know how it works.

Original comment by rahul.pr...@in.ibm.com on 16 Jan 2015 at 10:15