cockroachdb / sqlalchemy-cockroachdb

SQLAlchemy adapter for CockroachDB
Apache License 2.0

run_transaction session leak #52

Open ansgarstrother opened 6 years ago

ansgarstrother commented 6 years ago

We just switched over from using db.sessions.commit() to using run_transaction as per the diff: https://github.com/cockroachdb/examples-python/commit/4fecdf17d1def80b349bd84e58ec042f997fcdcf

After making this change in our test environment, after a little while the API gets slower and slower and eventually becomes completely bound up.

The following error eventually prints out: sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 20 reached

Is there some type of cleanup that needs to occur? Or is this a sign of something else?
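For context on what that TimeoutError means: with pool_size 5 and max_overflow 20, at most 25 connections can be checked out at once; a further checkout blocks and then gives up. A minimal stdlib-only toy model of that failure mode (the names here are illustrative, not SQLAlchemy's actual API):

```python
import queue

# Toy model of SQLAlchemy's QueuePool: a bounded queue of connections.
# Checking connections out without ever returning them empties the pool,
# and the next checkout times out -- analogous to the TimeoutError above.
POOL_SIZE = 5

pool = queue.Queue(maxsize=POOL_SIZE)
for i in range(POOL_SIZE):
    pool.put("conn-%d" % i)

# Checked out and never returned -- the "leak".
leaked = [pool.get() for _ in range(POOL_SIZE)]

try:
    pool.get(timeout=0.1)  # pool is empty: blocks briefly, then gives up
    print("got a connection")
except queue.Empty:
    print("pool exhausted")
```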

rytaft commented 6 years ago

Hi @ansgarstrother sorry you're running into this issue. I think that example code you found may be out of date. Here are the docs for running CockroachDB with SQLAlchemy: https://www.cockroachlabs.com/docs/stable/build-a-python-app-with-cockroachdb-sqlalchemy.html

At the bottom of that page, there is a link to a repo with some examples of how to use CockroachDB with SQLAlchemy: https://github.com/cockroachdb/examples-orms/tree/master/python/sqlalchemy

Those examples still use db.session.commit().

Just so I better understand the issue, was there a reason you decided to make the change? Was the code not performing as you expected with db.session.commit()?

ansgarstrother commented 6 years ago

We were having issues where we needed to retry transactions. We posted on the forum here and were told to make the change outlined above.

https://forum.cockroachlabs.com/t/issue-acceptance-failure-in-sqlalchemy-test-due-to-retry-error/1601

rytaft commented 6 years ago

Hi @ansgarstrother, sorry, I gave you incorrect information above. The code you found in that blog post is correct, and run_transaction is the correct way to handle transaction retries in SQLAlchemy.

In this case, there must be something else going on. What sort of tests are you running? This sort of issue might happen if you were running a load test with an unbounded thread pool causing high contention in the system.

bdarnell commented 6 years ago

It sounds like there's probably a leak in run_transaction when used with SQLAlchemy sessions (we just had another report of the same thing in #55). Unfortunately I never use the session API myself, so I'm not sure about the correct idioms here. run_transaction creates a session without closing it. Should it? It doesn't seem to leak connections in this small test (whether or not I put the Session = sessionmaker(testing.db) line inside the loop):

def test_connection_leak(self):
    def txn_body(session):
        list(session.query(Account))

    for i in range(2 * testing.db.pool.size()):
        Session = sessionmaker(testing.db)
        run_transaction(Session, txn_body)

@ansgarstrother or @keiranmraine, can you post an example of how you're creating and managing your session objects and transactions?
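For anyone trying to confirm a leak like this directly, the engine's pool exposes a checkedout() counter. A sketch against an in-memory SQLite engine forced onto a QueuePool (SQLite stands in for CockroachDB here; it would otherwise pick a different default pool class):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

engine = create_engine("sqlite://", poolclass=QueuePool,
                       pool_size=5, max_overflow=0)
Session = sessionmaker(bind=engine)

sessions = [Session() for _ in range(3)]
for s in sessions:
    s.execute(text("SELECT 1"))  # first statement checks a connection out

held = engine.pool.checkedout()  # one connection per still-open session

for s in sessions:
    s.close()  # returns the connection to the pool

released = engine.pool.checkedout()
print(held, released)
```

If sessions are opened but never closed (or never commit/rollback), checkedout() keeps climbing until the pool limit is hit.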

keiranmraine commented 6 years ago

So I've done some investigation, and the problem isn't that the sessions need to be closed (closing causes problems with objects still being attached to a closed session), but that the session needs to roll back or commit when it is returned.

I've not tested within this library but the following may work for a session:

https://github.com/cockroachdb/cockroachdb-python/blob/45537bfae46cdaad8d10c77cdbc86563c0fb1f0a/cockroachdb/sqlalchemy/transaction.py#L29-L31

    elif isinstance(transactor, sqlalchemy.orm.sessionmaker):
        session = transactor(autocommit=True)
        try:
            result = _txn_retry_loop(session, callback)
            session.commit()
            return result
        except Exception:
            session.rollback()
            raise

Our non-cockroach approach is to create a context manager:

@contextmanager
@sqlalchemy_exception_handler
def wrap_run_transaction(db):
    """Provide a transactional scope around a series of operations."""
    session = db.create_scoped_session()
    try:
        yield session
        session.commit()
    except Exception as err:
        session.rollback()
        raise err
    # do not close, just make sure transactions are complete

and use as:

with wrap_run_transaction(db) as session:
    _set_file_state(session, file_uuid, 'Uploaded', size_bytes=file_size)

Inspiration taken from here: http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it

Not closing is intentional as otherwise lazy loading will fail on objects if used generically for read and write.
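The failure being described can be reproduced in a few lines. A sketch against in-memory SQLite, with hypothetical Account/Entry models (not this project's schema): once the session is closed, the instance is detached, and touching an unloaded lazy relationship raises DetachedInstanceError.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()

class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    entries = relationship("Entry")  # lazy="select" by default

class Entry(Base):
    __tablename__ = "entries"
    id = Column(Integer, primary_key=True)
    account_id = Column(Integer, ForeignKey("accounts.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(Account(id=1))
session.commit()

account = session.query(Account).first()
session.close()  # detaches the instance from the session

try:
    account.entries  # lazy load attempted on a detached instance
    outcome = "loaded"
except DetachedInstanceError:
    outcome = "detached"
print(outcome)
```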

bdarnell commented 6 years ago

_txn_retry_loop uses with conn.begin(): (where conn is actually the session). This is supposed to be equivalent to the try/except block that calls commit or rollback.

Not closing is intentional as otherwise lazy loading will fail on objects if used generically for read and write.

Interesting. So if there's a lazy-loaded relationship, the objects created in the transaction are holding onto the Session for later lazy-loading? That seems like it might be an issue (but again, I never use the ORM half of SQLAlchemy so I might be misinterpreting).

Sessions created by run_transaction have the autocommit flag set (because with this flag set, Sessions have an interface that is compatible with Connections). However, if such a session is accessed outside the transaction, it looks like it just creates a new connection without doing anything to arrange for it to be cleaned up. This seems like a bug in sqlalchemy, although this autocommit mode is deprecated.

If this is the issue, then we can fix the leak by switching back to standard (non-autocommit) sessions and wrapping them in an adapter object to give them the same interface as connections. However, any use of lazy loading runs the risk of getting a retryable error. I would recommend disabling lazy loading for any application using CockroachDB and always doing your loading inside a transaction with run_transaction.
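That recommendation can be followed either per-relationship (lazy="joined" on the mapping) or per-query. A sketch of the per-query form with joinedload, again using hypothetical Account/Entry models against in-memory SQLite: the collection is populated inside the transaction, so reading it after the session is gone is safe.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker

Base = declarative_base()

class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    entries = relationship("Entry")

class Entry(Base):
    __tablename__ = "entries"
    id = Column(Integer, primary_key=True)
    account_id = Column(Integer, ForeignKey("accounts.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(Account(id=1, entries=[Entry(id=1), Entry(id=2)]))
session.commit()

# Eager-load the collection while the session is still open...
account = (session.query(Account)
           .options(joinedload(Account.entries))
           .first())
session.close()

# ...so it can be read after the session has been closed.
print(len(account.entries))
```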

keiranmraine commented 6 years ago

I forgot to mention that the approach we are using is via Flask-SQLAlchemy's create_scoped_session.

The DB state (in Postgres) ended up looking like this; note the "idle in transaction" connections:

# clean DB state
genserv=> select usename, client_addr, state, state_change
from pg_stat_activity
where datname='genserv';
     usename     | client_addr | state  |         state_change          
-----------------+-------------+--------+-------------------------------
 genserv_service |             | active | 2018-05-11 05:13:16.055881+00
(1 row)

# after a login through the UI
     usename     | client_addr |        state        |         state_change          
-----------------+-------------+---------------------+-------------------------------
 genserv_service |             | active              | 2018-05-11 05:15:50.224041+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:13:43.864191+00
(2 rows)

# after a few logins
     usename     | client_addr |        state        |         state_change          
-----------------+-------------+---------------------+-------------------------------
 genserv_service |             | active              | 2018-05-11 05:18:59.936199+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:16:00.938154+00
 genserv_service | 10.11.0.1   | idle                | 2018-05-11 05:16:01.23964+00
 genserv_service | 10.11.0.1   | idle                | 2018-05-11 05:16:01.239971+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:16:01.247961+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:16:01.261989+00
(6 rows)