ansgarstrother opened this issue 6 years ago
Hi @ansgarstrother sorry you're running into this issue. I think that example code you found may be out of date. Here are the docs for running CockroachDB with SQLAlchemy: https://www.cockroachlabs.com/docs/stable/build-a-python-app-with-cockroachdb-sqlalchemy.html
At the bottom of that page, there is a link to a repo with some examples of how to use CockroachDB with SQLAlchemy: https://github.com/cockroachdb/examples-orms/tree/master/python/sqlalchemy
Those examples still use `db.session.commit()`.

Just so I better understand the issue, was there a reason you decided to make the change? Was the code not performing as you expected with `db.session.commit()`?
We were having issues where we needed to retry. We posted on the forum here and were told to make the change outlined above.
Hi @ansgarstrother, sorry, I gave you incorrect information above. The code you found in that blog post is correct, and `run_transaction` is the correct way to handle transaction retries in SQLAlchemy.
In this case, there must be something else going on. What sort of tests are you running? This sort of issue might happen if you were running a load test with an unbounded thread pool causing high contention in the system.
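A bounded worker pool, for instance, caps the number of in-flight transactions during a load test. A minimal stdlib sketch (`issue_transaction` here is a hypothetical placeholder for whatever ultimately calls `run_transaction`):

```python
from concurrent.futures import ThreadPoolExecutor

def issue_transaction(i):
    # hypothetical placeholder for a call that runs one transaction
    return i

# Cap concurrency so a load test can't open unbounded simultaneous
# transactions (and therefore connections) against the cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(issue_transaction, range(100)))

print(len(results))  # 100 -- all work completed, never more than 8 at once
```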
It sounds like there's probably a leak in `run_transaction` when used with SQLAlchemy sessions (we just had another report of the same thing in #55). Unfortunately I never use the session API myself, so I'm not sure about the correct idioms here. `run_transaction` creates a session without closing it. Should it? It doesn't seem to be leaking connections in this small test (whether or not I put the `Session = sessionmaker(testing.db)` line inside the loop):
```python
def test_connection_leak(self):
    def txn_body(session):
        list(session.query(Account))

    for i in range(2 * testing.db.pool.size()):
        Session = sessionmaker(testing.db)
        run_transaction(Session, txn_body)
```
@ansgarstrother or @keiranmraine, can you post an example of how you're creating and managing your session objects and transactions?
So I've done some investigation, and the problem isn't that the sessions need to be closed (closing causes problems with objects still attached to a closed session); rather, the session needs to roll back or commit before it is returned.
I've not tested within this library but the following may work for a session:
```python
elif isinstance(transactor, sqlalchemy.orm.sessionmaker):
    session = transactor(autocommit=True)
    try:
        result = _txn_retry_loop(session, callback)
        session.commit()
        return result
    except Exception:
        session.rollback()
        raise
```
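The control flow of that proposed branch can be sketched in plain Python, independent of SQLAlchemy (`FakeSession` is a hypothetical stand-in that only records calls, just to show that the transaction always finishes, with commit or rollback, before the session is handed back):

```python
class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy Session; records calls only."""
    def __init__(self):
        self.committed = False
        self.rolled_back = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True

def run_and_finish(session_factory, callback):
    # Same shape as the proposed sessionmaker branch: the transaction is
    # always finished (commit or rollback) before control returns.
    session = session_factory()
    try:
        result = callback(session)
        session.commit()
        return result
    except Exception:
        session.rollback()
        raise

sessions = []
def factory():
    sessions.append(FakeSession())
    return sessions[-1]

run_and_finish(factory, lambda s: "ok")       # success path: commits
try:
    run_and_finish(factory, lambda s: 1 / 0)  # failure path: rolls back
except ZeroDivisionError:
    pass

print(sessions[0].committed, sessions[1].rolled_back)  # True True
```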
Our non-cockroach approach is to create a context manager:
```python
@contextmanager
@sqlalchemy_exception_handler
def wrap_run_transaction(db):
    """Provide a transactional scope around a series of operations."""
    session = db.create_scoped_session()
    try:
        yield session
        session.commit()
    except Exception as err:
        session.rollback()
        raise err
    # do not close, just make sure transactions are complete
```
and use it as:

```python
with wrap_run_transaction(db) as session:
    _set_file_state(session, file_uuid, 'Uploaded', size_bytes=file_size)
```
Inspiration taken from here: http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
Not closing is intentional as otherwise lazy loading will fail on objects if used generically for read and write.
`_txn_retry_loop` uses `with conn.begin():` (where `conn` is actually the session). This is supposed to be equivalent to the try/except block that calls commit or rollback.
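That commit-on-exit/rollback-on-exception behavior is easy to demonstrate with the standard library's `sqlite3`, whose `with conn:` block has the same semantics (used here purely as an illustration, not as part of this library):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")

with conn:  # commits on normal exit
    conn.execute("INSERT INTO accounts VALUES (1)")

try:
    with conn:  # rolls back when the body raises
        conn.execute("INSERT INTO accounts VALUES (2)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = conn.execute("SELECT id FROM accounts").fetchall()
print(rows)  # [(1,)] -- only the committed row survived
```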
> Not closing is intentional as otherwise lazy loading will fail on objects if used generically for read and write.
Interesting. So if there's a lazy-loaded relationship, the objects created in the transaction are holding onto the Session for later lazy-loading? That seems like it might be an issue (but again, I never use the ORM half of SQLAlchemy so I might be misinterpreting).
Sessions created by `run_transaction` have the `autocommit` flag set (because with this flag set, Sessions have an interface that is compatible with Connections). However, if such a session is accessed outside the transaction, it looks like it just creates a new connection without doing anything to arrange for it to be cleaned up. This seems like a bug in SQLAlchemy, although this autocommit mode is deprecated.

If this is the issue, then we can fix the leak by switching back to standard (non-autocommit) sessions and wrapping them in an adapter object to give them the same interface as connections. However, any use of lazy loading runs the risk of getting a retryable error. I would recommend disabling lazy loading for any application using CockroachDB and always doing your loading inside a transaction with `run_transaction`.
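One way to disable lazy loading is to configure relationships with `lazy="joined"` so related rows are fetched inside the transaction. A sketch against an in-memory SQLite database, with hypothetical `Customer`/`Order` models (not from this codebase):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # lazy="joined" loads the orders in the same query as the customer,
    # so nothing is fetched lazily after the transaction is over
    orders = relationship("Order", lazy="joined")

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))

engine = create_engine("sqlite://")  # in-memory stand-in for the real DSN
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(Customer(name="a", orders=[Order(), Order()]))
session.commit()
session.close()

session = Session()
customer = session.query(Customer).first()
session.close()  # the object is detached from here on

print(len(customer.orders))  # already loaded eagerly; no lazy query needed
```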
I forgot to mention that the approach we are using is via Flask-SQLAlchemy's `create_scoped_session`.
The DB state (in Postgres) ended up looking like this; note the idle-in-transaction connections:

```
# clean DB state
genserv=> select usename, client_addr, state, state_change
          from pg_stat_activity
          where datname='genserv';
     usename     | client_addr | state  |         state_change
-----------------+-------------+--------+-------------------------------
 genserv_service |             | active | 2018-05-11 05:13:16.055881+00
(1 row)

# after a login through the UI
     usename     | client_addr |        state        |         state_change
-----------------+-------------+---------------------+-------------------------------
 genserv_service |             | active              | 2018-05-11 05:15:50.224041+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:13:43.864191+00
(2 rows)

# after a few logins
     usename     | client_addr |        state        |         state_change
-----------------+-------------+---------------------+-------------------------------
 genserv_service |             | active              | 2018-05-11 05:18:59.936199+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:16:00.938154+00
 genserv_service | 10.11.0.1   | idle                | 2018-05-11 05:16:01.23964+00
 genserv_service | 10.11.0.1   | idle                | 2018-05-11 05:16:01.239971+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:16:01.247961+00
 genserv_service | 10.11.0.1   | idle in transaction | 2018-05-11 05:16:01.261989+00
(6 rows)
```
We just switched over from using `db.session.commit()` to using `run_transaction`, as per the diff: https://github.com/cockroachdb/examples-python/commit/4fecdf17d1def80b349bd84e58ec042f997fcdcf

After making this change in our test environment, the API gradually gets slower and eventually becomes completely bound up. The following error eventually prints out: `sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 20 reached`

Is there some type of cleanup that needs to occur? Or is this a sign of something else?
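For what it's worth, the limits in that error message are configurable on the engine, and a short `pool_timeout` plus `engine.pool.status()` can surface a leak quickly instead of letting the API slowly grind to a halt. A sketch (SQLite DSN is a stand-in; raising the limits does not fix a leak, it only postpones it):

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "sqlite://",      # stand-in DSN; substitute the real database URL
    poolclass=QueuePool,
    pool_size=5,      # the "size 5" in the error message
    max_overflow=20,  # the "overflow 20" in the error message
    pool_timeout=10,  # seconds to wait before raising TimeoutError
)

# QueuePool reports its size, checked-out count, and overflow; watching
# this while the app runs shows whether connections are being returned.
print(engine.pool.status())
```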