jamadden opened 3 years ago
Cross linking: https://nextthought.atlassian.net/browse/NTI-10471
> It looks like we are in a function attached to the transaction.
Agreed.
> An exception has been raised (which I think means we’ve retried and/or cleared the volatile attributes off) and now our near end manager is firing and we can’t set the cookie.
I don't think that's right. Instead, the failure in the near-end manager is itself the exception being raised, but I can see why you might think otherwise from the traceback. The error happens in _commitResources(), when it calls tpc_finish() on each resource manager. The function is attached via a resource manager that calls it in tpc_finish. The fact that reraise(t, v, tb) shows up in the traceback above the call to _commitResources() is an artifact of how tracebacks work in Python (2). Normally, a traceback only shows the frames from the place where the exception is caught down to the place where it was raised. This exception is caught and then re-raised, which causes the calling stack of reraise to be put on top of the catching stack.
For example, consider foo.py:

```python
import sys
from six import reraise

def top():
    middle()

def middle():
    try:
        bottom() # line 9
    except:
        reraise(*sys.exc_info()) # line 11

def bottom():
    raise Exception("From the leaf")

top()
```

```
$ python foo.py
Traceback (most recent call last):
  File "/tmp/foo.py", line 16, in <module>
    top()
  File "/tmp/foo.py", line 5, in top
    middle()
  File "/tmp/foo.py", line 11, in middle
    reraise(*sys.exc_info())
  File "/tmp/foo.py", line 9, in middle
    bottom()
  File "/tmp/foo.py", line 14, in bottom
    raise Exception("From the leaf")
Exception: From the leaf
```
Notice how the two lines in middle appear reversed in the traceback (line 11 is listed before line 9). That's exactly what's going on in this case: notice how the lines in _commitResources and commit are out of order.
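The same reordering can be reproduced on Python 3 without six. This minimal sketch re-raises with an explicit raise value.with_traceback(tb) (roughly what six.reraise does on Python 3) and uses the standard library's traceback.extract_tb to list the frames; the function names mirror foo.py, and traceback_frames is just a hypothetical helper for inspecting the result.

```python
import sys
import traceback


def bottom():
    raise ValueError("From the leaf")


def middle():
    try:
        bottom()  # original call site (the earlier line)
    except ValueError:
        _, value, tb = sys.exc_info()
        # Explicitly re-raise with the saved traceback. The interpreter records
        # this line as a new traceback entry in front of the saved ones.
        raise value.with_traceback(tb)  # re-raise site (the later line)


def top():
    middle()


def traceback_frames(func):
    """Call func and return (function name, line number) for each traceback entry."""
    try:
        func()
    except ValueError as exc:
        return [(fs.name, fs.lineno) for fs in traceback.extract_tb(exc.__traceback__)]
    raise AssertionError("expected ValueError")


if __name__ == "__main__":
    for name, lineno in traceback_frames(top):
        print(name, lineno)
```

The two middle entries come out with the re-raise line first and the earlier bottom() call second, just like lines 11 and 9 above.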
I don't know if this helps identify why _v_session_id isn't set, but in the logs just before the traceback Jason provided, we typically see something like:
2021-05-06 03:29:31,380 DEBUG [nti.analytics.sessions][140287935480704:21][/dataserver2/analytics/sessions/@@analytics_session:aarontesttest] Session created (user=aarontesttest)
2021-05-06 03:29:31,480 DEBUG [nti.transactions.loop][140287934176256:21][/dataserver2/analytics/sessions/@@end_analytics_session:aarontesttest] Committed transaction description=u'/dataserver2/analytics/sessions/@@end_analytics_session', duration=0.144886016846, retries=0, sleep_time=0
136.228.116.66 - aarontesttest [2021-05-06 03:29:31.483977] "POST /dataserver2/analytics/sessions/@@end_analytics_session HTTP/1.1" 200 356 "https://pendoonboardingtest.nextthot.com/app/library/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" "-@-" "s45034a8c5eda46a5996b999df7b828c6" (5/250) [140287934176256:21] 0.215655s
Traceback (most recent call last):
File "/home/ntiuser/buildout/eggs/gevent-21.1.2-py2.7-linux-x86_64.egg/gevent/threadpool.py", line 167, in __run_task
thread_result.set(func(*args, **kwargs))
File "/home/ntiuser/buildout/eggs/RelStorage-3.4.0-py2.7-linux-x86_64.egg/relstorage/adapters/sqlite/drivers.py", line 111, in execute
return sqlite3.Cursor.execute(self, stmt, params)
OperationalError: database is locked
2021-05-06T03:29:31Z (<ThreadPoolWorker at 0x7f976165aeb0 thread_ident=0x7f976958c700 threadpool-hub=<Hub at 0x7f978d555f70 thread_ident=0x7f979ddcc580>>, <unbound method Cursor.execute>) failed with OperationalError
2021-05-06 03:29:31,602 ERROR [nti.asynchronous.job][140287935480704:21][/dataserver2/analytics/sessions/@@analytics_session:aarontesttest] Job (<nti.asynchronous.job.Job at 7f977d3dac10 {'_active_start': datetime.datetime(2021, 5, 6, 3, 29, 31, 381684), 'kwargs': {'timestamp': datetime.datetime(2021, 5, 6, 3, 29, 31, 381401), 'username': u'aarontesttest', u'site_name': u's45034a8c5eda46a5996b999df7b828c6', 'session_id': 33}, '_id': u'e4aba4b2-a7f6-48ba-a2b8-ebe9e44207b6', '_status_id': 3, '_callable_name': None, '_callable_root': <function _execute_job at 0x7f975fd190d0>, '_error': <nti.asynchronous.job.Error at 7f977d3da250 {'message': u'Traceback (most recent call last):\n File "/home/ntiuser/buildout/sources/nti.asynchronous/src/nti/asynchronous/job.py", line 153, in run\n result = self.callable(*effective_args, **effective_kwargs)\n - __traceback_info__: (<function _execute_job at 0x7f975fd190d0>, None, [<function _end_session at 0x7f975f876f50>], {\'timestamp\': datetime.datetime(2021, 5, 6, 3, 29, 31, 381401), \'username\': u\'aarontesttest\', u\'site_name\': u\'s45034a8c5eda46a5996b999df7b828c6\', \'session_id\': 33})\n File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/common.py", line 214, in _execute_job\n return _do_execute_job(*args, **kwargs)\n File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/common.py", line 167, in _do_execute_job\n result = func( *args, **kwargs )\n File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/sessions.py", line 53, in _end_session\n db_sessions.end_session( user, session_id, timestamp )\n File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/database/sessions.py", line 72, in end_session\n user = get_or_create_user(user)\n File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/database/users.py", line 83, in get_or_create_user\n found_user = get_user_record(user)\n File 
"/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/database/users.py", line 78, in get_user_record\n found_user = db.session.query(Users).filter(Users.user_ds_id == uid).first()\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3429, in first\n ret = list(self[0:1])\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3203, in __getitem__\n return list(res)\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3534, in __iter__\n self.session._autoflush()\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1633, in _autoflush\n util.raise_(e, with_traceback=sys.exc_info()[2])\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1622, in _autoflush\n self.flush()\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2540, in flush\n self._flush(objects)\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2682, in _flush\n transaction.rollback(_capture_exception=True)\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/util/langhelpers.py", line 70, in __exit__\n with_traceback=exc_tb,\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2642, in _flush\n flush_context.execute()\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 422, in execute\n rec.execute(self)\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 589, in execute\n uow,\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 245, 
in save_obj\n insert,\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements\n statement, params\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1011, in execute\n return meth(self, multiparams, params)\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection\n return connection._execute_clauseelement(self, multiparams, params)\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement\n distilled_params,\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1317, in _execute_context\n e, statement, parameters, cursor, context\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception\n sqlalchemy_exception, with_traceback=exc_info[2], from_=e\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1277, in _execute_context\n cursor, statement, parameters, context\n File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/default.py", line 608, in do_execute\n cursor.execute(statement, parameters)\n File "/home/ntiuser/buildout/eggs/RelStorage-3.4.0-py2.7-linux-x86_64.egg/relstorage/adapters/sqlite/drivers.py", line 606, in in_threadpool\n func, (self, stmt, params)\n File "/home/ntiuser/buildout/eggs/gevent-21.1.2-py2.7-linux-x86_64.egg/gevent/pool.py", line 161, in apply\n return self.spawn(func, *args, **kwds).get()\n File "src/gevent/event.py", line 329, in gevent._gevent_cevent.AsyncResult.get\n File "src/gevent/event.py", line 359, in gevent._gevent_cevent.AsyncResult.get\n File "src/gevent/event.py", line 347, in 
gevent._gevent_cevent.AsyncResult.get\n File "src/gevent/event.py", line 327, in gevent._gevent_cevent.AsyncResult._raise_exception\n File "/home/ntiuser/buildout/eggs/gevent-21.1.2-py2.7-linux-x86_64.egg/gevent/threadpool.py", line 167, in __run_task\n thread_result.set(func(*args, **kwargs))\n File "/home/ntiuser/buildout/eggs/RelStorage-3.4.0-py2.7-linux-x86_64.egg/relstorage/adapters/sqlite/drivers.py", line 111, in execute\n return sqlite3.Cursor.execute(self, stmt, params)\nOperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)\n(sqlite3.OperationalError) database is locked\n[SQL: INSERT INTO "Sessions" (user_id, ip_addr, user_agent_id, start_time, end_time) VALUES (?, ?, ?, ?, ?)]\n[parameters: (2, \'136.228.116.66\', 1, \'2021-05-06 03:29:31.000000\', None)]\n(Background on this error at: http://sqlalche.me/e/13/e3q8)\n'}>, 'args': (<function _end_session at 0x7f975f876f50>,)}>) execution failed
Traceback (most recent call last):
File "/home/ntiuser/buildout/sources/nti.asynchronous/src/nti/asynchronous/job.py", line 153, in run
result = self.callable(*effective_args, **effective_kwargs)
- __traceback_info__: (<function _execute_job at 0x7f975fd190d0>, None, [<function _end_session at 0x7f975f876f50>], {'timestamp': datetime.datetime(2021, 5, 6, 3, 29, 31, 381401), 'username': u'aarontesttest', u'site_name': u's45034a8c5eda46a5996b999df7b828c6', 'session_id': 33})
File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/common.py", line 214, in _execute_job
return _do_execute_job(*args, **kwargs)
File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/common.py", line 167, in _do_execute_job
result = func( *args, **kwargs )
File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/sessions.py", line 53, in _end_session
db_sessions.end_session( user, session_id, timestamp )
File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/database/sessions.py", line 72, in end_session
user = get_or_create_user(user)
File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/database/users.py", line 83, in get_or_create_user
found_user = get_user_record(user)
File "/home/ntiuser/buildout/sources/nti.analytics/src/nti/analytics/database/users.py", line 78, in get_user_record
found_user = db.session.query(Users).filter(Users.user_ds_id == uid).first()
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3429, in first
ret = list(self[0:1])
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3203, in __getitem__
return list(res)
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3534, in __iter__
self.session._autoflush()
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1633, in _autoflush
util.raise_(e, with_traceback=sys.exc_info()[2])
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1622, in _autoflush
self.flush()
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2540, in flush
self._flush(objects)
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2682, in _flush
transaction.rollback(_capture_exception=True)
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2642, in _flush
flush_context.execute()
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/ntiuser/buildout/eggs/SQLAlchemy-1.3.23-py2.7-linux-x86_64.egg/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/home/ntiuser/buildout/eggs/RelStorage-3.4.0-py2.7-linux-x86_64.egg/relstorage/adapters/sqlite/drivers.py", line 606, in in_threadpool
func, (self, stmt, params)
File "/home/ntiuser/buildout/eggs/gevent-21.1.2-py2.7-linux-x86_64.egg/gevent/pool.py", line 161, in apply
return self.spawn(func, *args, **kwds).get()
File "src/gevent/event.py", line 329, in gevent._gevent_cevent.AsyncResult.get
File "src/gevent/event.py", line 359, in gevent._gevent_cevent.AsyncResult.get
File "src/gevent/event.py", line 347, in gevent._gevent_cevent.AsyncResult.get
File "src/gevent/event.py", line 327, in gevent._gevent_cevent.AsyncResult._raise_exception
File "/home/ntiuser/buildout/eggs/gevent-21.1.2-py2.7-linux-x86_64.egg/gevent/threadpool.py", line 167, in __run_task
thread_result.set(func(*args, **kwargs))
File "/home/ntiuser/buildout/eggs/RelStorage-3.4.0-py2.7-linux-x86_64.egg/relstorage/adapters/sqlite/drivers.py", line 111, in execute
return sqlite3.Cursor.execute(self, stmt, params)
OperationalError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(sqlite3.OperationalError) database is locked
[SQL: INSERT INTO "Sessions" (user_id, ip_addr, user_agent_id, start_time, end_time) VALUES (?, ?, ?, ?, ?)]
[parameters: (2, '136.228.116.66', 1, '2021-05-06 03:29:31.000000', None)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
2021-05-06 03:29:31,761 CRITI [txn.GLOBAL][140287935480704:21][/dataserver2/analytics/sessions/@@analytics_session:aarontesttest] A storage error occurred during the second phase of the two-phase commit. Resources may be in an inconsistent state.
2021-05-06 03:29:31,770 DEBUG [nti.transactions.loop][140287935480704:21][/dataserver2/analytics/sessions/@@analytics_session:aarontesttest] Transaction aborted; retrying False/2; '<type 'exceptions.AttributeError'>'/<type 'exceptions.AttributeError'>
2021-05-06 03:29:31,965 DEBUG [nti.transactions.loop][140287934173808:21][<_WebSocketPinger for 0x2cb7399980303675/aarontest>] Committed transaction description=u'_do_ping', duration=0.000218152999878, retries=0, sleep_time=0
The intent here is that we capture the newly created session_id off of the SQLAlchemy Sessions object. We won't have a session_id value until we commit to the analytics db. Previously, we would flush and do other transactionally unsafe things to capture that attribute before committing the transaction. Now we use a SQLAlchemy event listener (on after_insert) to capture this value in a volatile attribute on the object (_v_session_id).
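A rough, stdlib-only sketch of the capture pattern, using sqlite3 rather than SQLAlchemy's event API: the database assigns the primary key while executing the INSERT, so the ID can only be copied onto the object after that statement runs. SessionRecord, insert_session, capture_session_id and AFTER_INSERT_LISTENERS are hypothetical stand-ins for the Sessions model and the after_insert listener; only the _v_session_id name comes from this thread.

```python
import sqlite3


class SessionRecord:
    """Hypothetical stand-in for the SQLAlchemy Sessions model."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.session_id = None  # assigned by the database during the INSERT


def capture_session_id(record):
    # Plays the role of the after_insert listener: copy the new ID onto a
    # volatile attribute so later code (e.g. setting the cookie) can read it.
    record._v_session_id = record.session_id


AFTER_INSERT_LISTENERS = [capture_session_id]


def insert_session(conn, record):
    cur = conn.execute('INSERT INTO "Sessions" (user_id) VALUES (?)', (record.user_id,))
    # The autoincrement key only exists once the INSERT has executed.
    record.session_id = cur.lastrowid
    for listener in AFTER_INSERT_LISTENERS:
        listener(record)
    return record


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "Sessions" '
                 '(session_id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER)')
    rec = insert_session(conn, SessionRecord(user_id=2))
    print(rec._v_session_id)
```

The important property is that the listener runs only if the INSERT succeeds; if the statement raises, nothing ever sets _v_session_id.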
This appears to happen only in container environments (a SQLite-only issue?).
Maybe there is an order-of-operations issue here: the near-end data manager runs before the SQLAlchemy transaction commits (and fires the event).
Maybe we're in a retried transaction and this data manager is still around from the previous one. I'm just thinking out loud about various possibilities here; I'm not sure this case is possible.
> Maybe there is an order-of-operations issue here: the near-end data manager runs before the SQLAlchemy transaction commits (and fires the event).

That could be possible; the ordering is hard to guarantee. If that's the case, then using an afterCommitHook would fix the issue, assuming the new_sessions object can still be used at that point. (Maybe the event listener should pass the session ID directly to the set_cookie function? That would also just take the form of using afterCommitHook.)
> Maybe we're in a retried transaction and this data manager is still around from the previous one. I'm just thinking out loud about various possibilities here; I'm not sure this case is possible.

No, it really shouldn't be. Data managers are tied to a Transaction object, and a retry starts with a fresh Transaction object. Making sure that's the case is part of why we use explicit transaction managers.
I think I see what this is now, thanks to the error Chris pointed out above.
In these container sites, we run analytics events through the ImmediateQueueRunner, which creates and runs (nti.asynchronous) jobs (commits, updates, etc.) immediately instead of queuing them in Redis for a separate process to run. We do this to conserve memory in these containers. This flow is useful for tests, but it's probably not what we want for live environments. This setup explains why we only see this issue in container environments.
The job captures and logs the error (ugh, how does that work transactionally with jobs and the async runner? Do we just commit anyway?). Since the SQLAlchemy INSERT fails ("database is locked"), our after_insert hook never fires, and the _v_session_id attribute isn't there when we look for it.
For this case, I think we just want to inline the analytics db work instead of creating jobs that swallow the errors. This will ensure any errors bubble up and roll back the transaction.
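The failure mode can be reproduced with only the standard library, assuming a second connection holds SQLite's write lock (standing in for whatever else is writing in the container). SessionRecord, end_session, run_as_immediate_job and end_session_then_read_id are hypothetical simplifications, not the nti.asynchronous or nti.analytics APIs. With the job-style runner, the "database is locked" error is only logged and the later read of _v_session_id fails with AttributeError, as in the log above; when the work runs inline, the original OperationalError reaches the caller, which can then abort the transaction.

```python
import logging
import os
import sqlite3
import tempfile

log = logging.getLogger(__name__)


class SessionRecord:
    """Hypothetical stand-in for the analytics Sessions row."""


def end_session(conn, record):
    # Hypothetical analytics work. Like the after_insert listener, the
    # volatile attribute is only set after the INSERT succeeds.
    cur = conn.execute("INSERT INTO sessions (user_id) VALUES (?)", (2,))
    record._v_session_id = cur.lastrowid


def run_as_immediate_job(func, *args):
    """Hypothetical job runner that logs failures instead of raising them."""
    try:
        func(*args)
    except Exception:
        log.exception("Job execution failed")


def end_session_then_read_id(inline):
    """Run end_session while another connection holds the SQLite write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "analytics.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER)")
        setup.commit()
        setup.close()

        # A concurrent writer takes the write (RESERVED) lock and keeps it.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("BEGIN IMMEDIATE")
        # timeout=0: fail immediately instead of waiting for the lock.
        worker = sqlite3.connect(path, timeout=0)
        record = SessionRecord()
        try:
            if inline:
                # OperationalError propagates, so the caller can abort.
                end_session(worker, record)
            else:
                # The error is only logged; the caller carries on.
                run_as_immediate_job(end_session, worker, record)
            # Later code (e.g. setting the cookie) reads the volatile attribute.
            return record._v_session_id
        finally:
            worker.close()
            holder.close()  # closing rolls back the holder's open transaction
```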
We sometimes get unhandled error reports like this:
As you can see, this code is being called from tpc_finish, the last step of the two-phase commit protocol. It's explicitly documented that tpc_finish must never fail. (Whether persistent corruption results depends on a bunch of details.)
I'm unfamiliar with this code, so I don't know what Sessions is or why it doesn't have its volatile attribute at this point.
https://github.com/NextThought/nti.app.analytics/blob/82e06e04118d6d0cc10b7f2ec6a0afde57fa44d2/src/nti/app/analytics/views.py#L361-L366
I can suggest a really quick workaround. The comment there suggests that instead of using transactions.do_near_end, which registers a resource manager that is subject to the constraints listed above, we could probably just use transaction.addAfterCommitHook(). This has the advantage of ignoring exceptions raised by the hook (and not corrupting the rest of the transaction). It has the disadvantage of, well, ignoring exceptions raised by the hook (they do go in the log, at least).
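To make the trade-off concrete without depending on the transaction package, here is a deliberately simplified model of the commit sequence. ToyResource and ToyTransaction are hypothetical, and the CRITICAL message mirrors the one in the log above. An exception from tpc_finish escapes after some resources have already finished, while an exception from an after-commit hook is only logged. In the real transaction package the corresponding call is addAfterCommitHook(hook, args=(), kws=None) on a Transaction object, and the hook likewise receives a boolean commit status as its first argument.

```python
import logging

log = logging.getLogger(__name__)


class ToyResource:
    """Hypothetical resource manager that records whether it finished."""

    def __init__(self, name, fail_in_finish=False):
        self.name = name
        self.fail_in_finish = fail_in_finish
        self.finished = False

    def tpc_vote(self):
        # Phase one: anything that can fail is supposed to fail here.
        pass

    def tpc_finish(self):
        # Phase two: this is supposed to never fail.
        if self.fail_in_finish:
            raise AttributeError("_v_session_id")
        self.finished = True


class ToyTransaction:
    """Simplified two-phase commit; not the real transaction package."""

    def __init__(self, resources):
        self.resources = list(resources)
        self.after_commit_hooks = []

    def add_after_commit_hook(self, hook, args=()):
        self.after_commit_hooks.append((hook, tuple(args)))

    def commit(self):
        for resource in self.resources:
            resource.tpc_vote()
        try:
            for resource in self.resources:
                resource.tpc_finish()
        except Exception:
            # Earlier resources have already made their changes durable;
            # later ones have not.
            log.critical("A storage error occurred during the second phase "
                         "of the two-phase commit. Resources may be in an "
                         "inconsistent state.")
            raise
        # Hooks get the commit status first; their errors are only logged.
        for hook, args in self.after_commit_hooks:
            try:
                hook(True, *args)
            except Exception:
                log.exception("Error in after commit hook %r", hook)


if __name__ == "__main__":
    def set_cookie(status, session_id):
        # Hypothetical stand-in for the view's cookie setter.
        if status:
            print("set cookie for session", session_id)

    txn = ToyTransaction([ToyResource("analytics db")])
    txn.add_after_commit_hook(set_cookie, args=(33,))
    txn.commit()
```

Passing the session ID through the hook's args, as set_cookie does here, also avoids relying on a volatile attribute surviving until the hook runs.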