dhirschfeld opened this issue 1 month ago
If I change the `run_query` function (from the example further down) to instead accept a `sa.Connection`, then the query works in a background thread:
```python
>>> def run_query(conn: sa.Connection):
...     return conn.execute(sa.text("select * from Users")).fetchall()

>>> with engine.connect() as conn:
...     res = run_query(conn)
>>> res
[(1, 'spongebob'), (2, 'sandy'), (3, 'patrick')]

>>> with engine.connect() as conn:
...     res = await anyio.to_thread.run_sync(run_query, conn)
>>> res
[(1, 'spongebob'), (2, 'sandy'), (3, 'patrick')]
```
It would be great if it were possible to pass an engine to a separate thread to use, so you could use the same code irrespective of whether you were connected to a Postgres database in production or a `duckdb` in-memory database in CI.
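For reference, a minimal sketch of the backend-agnostic call-site I have in mind, built on the `Connection`-passing pattern above (`fetch_users` is just an illustrative name, not something from the original example):

```python
import anyio
import sqlalchemy as sa


def run_query(conn: sa.Connection):
    return conn.execute(sa.text("select * from Users")).fetchall()


async def fetch_users(engine: sa.Engine):
    # Connect on the event-loop thread and hand only the Connection to the
    # worker thread; the same call-site then works for an in-memory duckdb
    # engine in CI and a Postgres engine in production.
    with engine.connect() as conn:
        return await anyio.to_thread.run_sync(run_query, conn)
```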
Calling back into the main thread from the worker thread seems to work, but then it only works from the worker-thread context, so it's not ideal:
```python
def run_query(engine: sa.Engine):
    with anyio.from_thread.run_sync(engine.connect) as conn:
        return conn.execute(sa.text("select * from Users")).fetchall()
```
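To spell out that caveat (this is my understanding of anyio's behaviour, not output from the original example):

```python
# Works: run_query executes in an AnyIO worker thread, so
# anyio.from_thread.run_sync can call engine.connect() back on the event loop.
rows = await anyio.to_thread.run_sync(run_query, engine)

# Doesn't work: called directly, outside a worker thread, there is no portal
# back to the event loop and anyio.from_thread.run_sync raises RuntimeError,
# hence "only works from the worker-thread context".
# rows = run_query(engine)
```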
My observation is that passing an engine connected to an in-memory `duckdb` database to a different thread doesn't work. I'm wondering if that's expected, or if it would be considered a bug / missing feature?
Example:
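(The full setup code isn't reproduced in this excerpt; the snippets below assume a minimal reconstruction along these lines: an in-memory `duckdb` engine, a `Users` table, and a `run_query` that takes the `Engine`.)

```python
import anyio
import sqlalchemy as sa

# In-memory duckdb database via duckdb_engine
engine = sa.create_engine("duckdb:///:memory:")

with engine.begin() as conn:
    conn.execute(sa.text("create table Users (id integer, name varchar)"))
    conn.execute(sa.text(
        "insert into Users values (1, 'spongebob'), (2, 'sandy'), (3, 'patrick')"
    ))


def run_query(engine: sa.Engine):
    with engine.connect() as conn:
        return conn.execute(sa.text("select * from Users")).fetchall()
```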
Running the `run_query` function works as expected:
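(Presumably something along these lines; the rows match the output shown earlier:)

```python
>>> run_query(engine)
[(1, 'spongebob'), (2, 'sandy'), (3, 'patrick')]
```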
...but if I run it in a background thread I get a `Catalog Error: Table with name Users does not exist!` exception 😔. My assumption is that the engine loses its connection to the in-memory database in the main thread and creates a new in-memory database where that table doesn't exist?
```python
>>> await anyio.to_thread.run_sync(run_query, engine)
```
```python-traceback
---------------------------------------------------------------------------
CatalogException                          Traceback (most recent call last)
File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/sqlalchemy/engine/base.py:1967, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1966     if not evt_handled:
-> 1967         self.dialect.do_execute(
   1968             cursor, str_statement, effective_parameters, context
   1969         )
   1971 if self._has_events or self.engine._has_events:

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/sqlalchemy/engine/default.py:941, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    940 def do_execute(self, cursor, statement, parameters, context=None):
--> 941     cursor.execute(statement, parameters)

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/duckdb_engine/__init__.py:140, in CursorWrapper.execute(self, statement, parameters, context)
    139 else:
--> 140     self.__c.execute(statement, parameters)
    141 except RuntimeError as e:

CatalogException: Catalog Error: Table with name Users does not exist!
Did you mean "sqlite_master"?
LINE 1: select * from Users
                      ^

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
Cell In[10], line 1
----> 1 await anyio.to_thread.run_sync(run_query, engine)

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/anyio/to_thread.py:56, in run_sync(func, abandon_on_cancel, cancellable, limiter, *args)
     48     abandon_on_cancel = cancellable
     49     warn(
     50         "The `cancellable=` keyword argument to `anyio.to_thread.run_sync` is "
     51         "deprecated since AnyIO 4.1.0; use `abandon_on_cancel=` instead",
     52         DeprecationWarning,
     53         stacklevel=2,
     54     )
---> 56 return await get_async_backend().run_sync_in_worker_thread(
     57     func, args, abandon_on_cancel=abandon_on_cancel, limiter=limiter
     58 )

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/anyio/_backends/_trio.py:1060, in TrioBackend.run_sync_in_worker_thread(cls, func, args, abandon_on_cancel, limiter)
   1057     return func(*args)
   1059 token = TrioBackend.current_token()
-> 1060 return await run_sync(
   1061     wrapper,
   1062     abandon_on_cancel=abandon_on_cancel,
   1063     limiter=cast(trio.CapacityLimiter, limiter),
   1064 )

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/trio/_threads.py:437, in to_thread_run_sync(sync_fn, thread_name, abandon_on_cancel, limiter, *args)
    433 msg_from_thread: outcome.Outcome[RetT] | Run[object] | RunSync[object] = (
    434     await trio.lowlevel.wait_task_rescheduled(abort)
    435 )
    436 if isinstance(msg_from_thread, outcome.Outcome):
--> 437     return msg_from_thread.unwrap()
    438 elif isinstance(msg_from_thread, Run):
    439     await msg_from_thread.run()

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/outcome/_impl.py:213, in Error.unwrap(***failed resolving arguments***)
    211 captured_error = self.error
    212 try:
--> 213     raise captured_error
    214 finally:
    215     # We want to avoid creating a reference cycle here. Python does
    216     # collect cycles just fine, so it wouldn't be the end of the world
    (...)
    225     # methods frame, we avoid the 'captured_error' object's
    226     # __traceback__ from indirectly referencing 'captured_error'.
    227     del captured_error, self

File /opt/python/envs/dev310/.pixi/envs/default/lib/python3.10/site-packages/trio/_threads.py:363, in to_thread_run_sync.
```
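That assumption is at least consistent with how `duckdb` itself behaves: each new connection to `:memory:` gets its own, empty database. A standalone illustration, independent of SQLAlchemy:

```python
import duckdb

# Each ":memory:" connection is a separate database; a table created through
# one connection is not visible through another.
c1 = duckdb.connect(":memory:")
c1.execute("create table Users (id integer, name varchar)")

c2 = duckdb.connect(":memory:")
c2.execute("select * from Users")  # raises duckdb.CatalogException: Table with name Users does not exist!
```

If the pooled SQLAlchemy connection used by the worker thread ends up being a fresh `duckdb` connection, that would explain the missing `Users` table.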