encode / databases

Async database support for Python. 🗄
https://www.encode.io/databases/
BSD 3-Clause "New" or "Revised" License

fix: incorrect concurrent usage of connection and transaction #546

Closed zevisert closed 1 year ago

zevisert commented 1 year ago

[!NOTE] EDIT: This PR contains some back and forth on how databases should handle concurrent connections across tasks. If you're looking into databases' concurrency model, it's worth reading through all of our comments. The specific issue with the @Database.transaction decorator was fixed, but it required digging into and changing how connection state is tracked across asyncio.Tasks using contextvars.ContextVar


This commit should fix an issue with shared state with transactions.

The main issue is described very well by @circlingthesun here: https://github.com/encode/databases/issues/123#issuecomment-1028486810 - we ran into this most reliably when load testing with locust, but also hit it a few times in normal usage.

The big problem is that the @db.transaction decorator creates a single Databases.Transaction instance that, when called, creates a context manager for a transaction. In concurrent frameworks, such as ASGI servers like starlette / fastapi, it's possible to have two tasks running the same route concurrently. Since the Transaction instance is shared between tasks, its instance attributes are effectively shared global state, and concurrent code can't trust that no race condition has occurred without some helper mechanism like a ContextVar.
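To make the race concrete, here's a minimal, hypothetical sketch (no databases code involved; `SharedTransaction` and the connection ids are purely illustrative) of how one shared instance attribute gets clobbered by two interleaved tasks:

```python
import asyncio

# Illustrative only: a single shared object stores per-call state on `self`,
# so overlapping tasks clobber each other's values.
class SharedTransaction:
    def __init__(self):
        self._connection = None

    async def run(self, conn_id):
        self._connection = conn_id   # the second task overwrites the first task's value
        await asyncio.sleep(0)       # yield, letting the other task run in between
        return self._connection     # both tasks now see whatever was written last

async def main():
    txn = SharedTransaction()       # one instance shared by both tasks
    return await asyncio.gather(txn.run("conn-A"), txn.run("conn-B"))

print(asyncio.run(main()))  # ['conn-B', 'conn-B'] - task A lost its connection
```

Task A never gets "conn-A" back: by the time it resumes, task B has overwritten the shared attribute, which is exactly what happens to `self._connection` on the shared `Transaction`.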

Here's a mermaid diagram trying to show this:

sequenceDiagram
    actor User1
    actor User2
    participant API
    participant TXN
    API->>TXN: (startup) @db.transaction creates new shared Transaction(...) for route
    User1->>API: (1): Request to API route creates new asyncio.Task(...)
    API->>TXN: (1): Transaction.__call__ assigns self._connection
    User2->>API: (2): Request to same API route creates new asyncio.Task(...)
    API->>TXN: (2): PROBLEM -- Transaction.__call__ obtains new Databases.Connection for this asyncio.Task,<br/> like above, but overwrites self._connection
    TXN->>API: (1): PROBLEM -- Transaction.__aexit__ uses self._connection of task (2)<br/>since it was not retrieved from a ContextVar
    API->>User1: (1): Responding here closes the transaction's _connection
    TXN->>API: (2): FAULT -- Transaction.__aexit__ expects to also close self._connection, <br/>but since it also tries to retrieve the connection from self._connection instead of the ContextVar,<br/>it retrieves (1)'s already closed connection
    API->>User2: (2): FAULT -- Propagated exception results in 500 response code

All I've done with this PR is move the shared global state of self._connection and self._transaction into local or contextvars instead of instance variables.
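As a rough illustration of why a `ContextVar` fixes this (the names here are illustrative, not the library's actual variables): each task created on the event loop gets its own copy of the context, so a `set()` in one task is invisible to its sibling:

```python
import asyncio
from contextvars import ContextVar

# Module-level ContextVar standing in for "the current connection"
_connection: ContextVar[str] = ContextVar("connection")

async def handle(conn_id: str) -> str:
    _connection.set(conn_id)   # only affects this task's copy of the context
    await asyncio.sleep(0)     # yield so the two tasks interleave
    return _connection.get()   # still sees its own value, not the sibling's

async def main():
    return await asyncio.gather(handle("conn-A"), handle("conn-B"))

print(asyncio.run(main()))  # ['conn-A', 'conn-B'] - no clobbering
```

Compare this with storing the value on a shared instance attribute, where the interleaving would leave both tasks seeing the last writer's value.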

Related issues are: #123 #134 Possibly related: #340 Related meta issue: #456

zanieb commented 1 year ago

Welcome and thanks for contributing! For some context, I'm a member of encode but not a maintainer of this project. Reading this out of curiosity and just have a couple notes about proper usage of the context var. Someone else with more context for the project will have to do a review of the changes too.

zevisert commented 1 year ago

Thanks for the review. I'm eager to move this forward. We use databases in one of our products, and when load testing this certainly fixes an issue with concurrency. Even your two comments were pretty insightful, so if you have another encode member who is interested in taking a look that would be great.

I'll also try the module-level context var solution you helped me discover in our codebase to see if that still fixes the issue. As it stands, Databases was (and continues to be) leaking memory from its usage of non-module-level ContextVars, but if I can fix that here too then that's great!

zevisert commented 1 year ago

Also related #155

zanieb commented 1 year ago

Do you have a minimal executable example I could use to explore this?

zevisert commented 1 year ago

I think #155 should be close enough, but I'll try and get a dedicated MCVE up here today

zevisert commented 1 year ago

@madkinsz Here's a minimal example as a test - turns out I could get it pretty small. This fails on master, but passes perfectly fine here:

```python
# file: tests/test_concurrent_tasks.py
import pytest
import asyncio
from tests.test_databases import async_adapter, DATABASE_URLS
from databases import Database

@pytest.mark.parametrize("database_url", DATABASE_URLS)
@async_adapter
async def test_concurrent_tasks(database_url: str):
    """
    Test concurrent tasks.
    """

    async with Database(database_url) as database:
        # This could be an eg, fastapi / starlette endpoint
        @database.transaction()
        async def read_items() -> dict:
            return dict(await database.fetch_one(query="SELECT 1 AS value"))

        # Run 10 concurrent **tasks**
        tasks = (asyncio.create_task(read_items()) for _ in range(10))
        responses = await asyncio.gather(*tasks, return_exceptions=True)

        # Check the responses
        assert all(isinstance(response, dict) for response in responses)
        assert all(response["value"] == 1 for response in responses)
```
I'm using docker compose and pytest to run this:

```yaml
# file: test.docker-compose.yaml
version: "3"
services:
  postgres:
    image: postgres:14
    environment:
      POSTGRES_USER: username
      POSTGRES_PASSWORD: password
      POSTGRES_DB: testsuite
    ports:
      - 5432:5432
  mysql:
    image: mysql:5.7
    environment:
      MYSQL_USER: username
      MYSQL_PASSWORD: password
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: testsuite
    ports:
      - 3306:3306
```

```console
$ docker compose --file test.docker-compose.yaml up --detach
[+] Running 2/2
 ⠿ Container databases-mysql-1     Started
 ⠿ Container databases-postgres-1  Started
$ git log -1 --oneline
b6eba5f (HEAD -> master, upstream/master, upstream/HEAD, github/master) Bump wheel from 0.37.1 to 0.38.1 (#524)
$ DATABASE_URLS=(
> 'sqlite:///testsuite'
> 'sqlite+aiosqlite:///testsuite'
> 'mysql://username:password@localhost:3306/testsuite'
> 'mysql+aiomysql://username:password@localhost:3306/testsuite'
> 'mysql+asyncmy://username:password@localhost:3306/testsuite'
> 'postgresql://username:password@localhost:5432/testsuite'
> 'postgresql+aiopg://username:password@127.0.0.1:5432/testsuite'
> 'postgresql+asyncpg://username:password@localhost:5432/testsuite'
> )
$ export TEST_DATABASE_URLS=$(IFS=, ; echo "${DATABASE_URLS[*]}")
$ pytest -k test_concurrent_tasks --verbose --exitfirst  # --exitfirst for brevity, they all fail or hang
==================================== test session starts =====================================
platform linux -- Python 3.10.8, pytest-7.1.2, pluggy-1.0.0 -- /opt/tooling/pyenv/versions/encode-databases/bin/python3.10
cachedir: .pytest_cache
rootdir: /code/oss/github/encode/databases
plugins: cov-3.0.0, anyio-3.6.2
collected 360 items / 352 deselected / 8 selected

tests/test_concurrent_fastapi.py::test_concurrent_tasks[sqlite:///testsuite] FAILED    [ 12%]

========================================== FAILURES ==========================================
_________________________ test_concurrent_tasks[sqlite:///testsuite] _________________________

database_url = 'sqlite:///testsuite'

    @pytest.mark.parametrize("database_url", DATABASE_URLS)
    @async_adapter
    async def test_concurrent_tasks(database_url: str):
        """
        Test concurrent tasks.
        """
        async with Database(database_url) as database:
            # This could be an eg, fastapi / starlette endpoint
            @database.transaction()
            async def read_items() -> dict:
                return dict(await database.fetch_one(query="SELECT 1 AS value"))

            # Run 10 concurrent **tasks**
            tasks = (asyncio.create_task(read_items()) for _ in range(10))
            responses = await asyncio.gather(*tasks, return_exceptions=True)

            # Check the responses
>           assert all(isinstance(response, dict) for response in responses)
E           assert False
E            +  where False = all(<generator object <genexpr> at 0x7f1b26e86420>)

tests/test_concurrent_fastapi.py:25: AssertionError
================================== short test summary info ===================================
FAILED tests/test_concurrent_fastapi.py::test_concurrent_tasks[sqlite:///testsuite] - asser...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================== 1 failed, 352 deselected, 1 warning in 0.28s ========================
^CException ignored in:
Traceback (most recent call last):
  File "/opt/tooling/pyenv/versions/3.10.8/lib/python3.10/threading.py", line 1567, in _shutdown
    lock.acquire()
KeyboardInterrupt:
$ git switch -
Switched to branch 'fix-transaction-contextvar'
Your branch is up to date with 'github/fix-transaction-contextvar'.
$ git log -1 --oneline
3c0790d (HEAD -> fix-transaction-contextvar, github/fix-transaction-contextvar) fix: contextvar.get takes no keyword arguments
$ pytest -k test_concurrent_tasks --verbose --exitfirst
==================================== test session starts =====================================
platform linux -- Python 3.10.8, pytest-7.1.2, pluggy-1.0.0 -- /opt/tooling/pyenv/versions/encode-databases/bin/python3.10
cachedir: .pytest_cache
rootdir: /code/oss/github/encode/databases
plugins: cov-3.0.0, anyio-3.6.2
collected 360 items / 352 deselected / 8 selected

tests/test_concurrent_fastapi.py::test_concurrent_tasks[sqlite:///testsuite] PASSED    [ 12%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[sqlite+aiosqlite:///testsuite] PASSED [ 25%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[mysql://username:password@localhost:3306/testsuite] PASSED [ 37%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[mysql+aiomysql://username:password@localhost:3306/testsuite] PASSED [ 50%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[mysql+asyncmy://username:password@localhost:3306/testsuite] PASSED [ 62%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[postgresql://username:password@localhost:5432/testsuite] PASSED [ 75%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[postgresql+aiopg://username:password@127.0.0.1:5432/testsuite] PASSED [ 87%]
tests/test_concurrent_fastapi.py::test_concurrent_tasks[postgresql+asyncpg://username:password@localhost:5432/testsuite] PASSED [100%]
======================= 8 passed, 352 deselected, 21 warnings in 0.68s =======================
```
zevisert commented 1 year ago

Rebased to sign commits, all new and old tests passing, switched to module level contextvars to allow for the garbage collector to clean up unreferenced objects.

Edit: And now all linting checks pass too

zanieb commented 1 year ago

Hey! I took some time to poke around at this today. It seems a little wild to use a ContextVar to store a dictionary that includes the asyncio.Task in it (the whole point of them is that they handle task locality for you). I'm not sure what we gain by using a context variable in this case; we can just use an instance dictionary and remove a lot of complexity.

For example, I've implemented this at https://github.com/encode/databases/compare/master...madkinsz:example%2Finstance-safe and it passes the MRE you provided (I only checked SQLite because I didn't want to deal with docker-compose)

Curious for your thoughts

zevisert commented 1 year ago

we can just use an instance dictionary and remove a lot of complexity

I like that take too. Looking back I was a little too committed to getting the ContextVars to work properly. With the constraint of only using context variables in the module scope, and the possibility of multiple database instances with connections to different db's, I needed to introduce the current task back into the context variable to keep things separated correctly. I think that what you're showing with it being stored on the instance and using asyncio.current_task instead of a ContextVar altogether should work just fine!

I'll give it a go with the other databases right now, and in the project where we initially ran into this problem.

zevisert commented 1 year ago

Yeah, this seems great. I've adopted/merged your changes into this PR to accompany the new tests I'm contributing here. I kept a few of the assertions I had added when resolving conflicts - I saw a few of them in the existing code. Let me know what encode's stance is on assert, as well as del some_dict[some_key] vs some_dict.pop(some_key), as it seems like we have different styles there too.

Thanks for your help on this - I'm excited for Sentry to finally stop notifying us about the errors from this issue!

zanieb commented 1 year ago

Glad you think it makes sense too!

I think my last concern is the event-loop compatibility bit (e.g. trio).

@tomchristie Do you think you'd be willing to give this a look soon or point me to a better reviewer?

zevisert commented 1 year ago

I don't think Databases supports other event loop implementations like trio, curio, etc. I see an old PR #357 that was working on adding anyio support, but it's in draft still.

zevisert commented 1 year ago

Trying not to be too demanding of your time @tomchristie - I know how brutal open source can be for that... but if you have a minute it would be great if you or someone else from encode could take a look at this. My team would really like to ship this! Thanks!

ToGoBananas commented 1 year ago

For me, it fixes old concurrent transaction issues. But during test teardown, I am getting errors like the one in the attached screenshot.

Added these changes to test it more https://github.com/ToGoBananas/databases/commit/5150e407d2dc53b7d4302101a84543142b4ebd6b

ToGoBananas commented 1 year ago

Also, this PR breaks code with nested transactions if the second transaction in the stack is executed via gather

zevisert commented 1 year ago

@ToGoBananas can you provide a reproducible example? Existing tests are all passing for me locally, and here in CI as well. I'm happy to explore, but that screenshot doesn't show what you were trying to do. If you can provide something to cause this error I'll try to correct it and add new tests for it.

ToGoBananas commented 1 year ago
```python
import asyncio

from db import get_database  # just returns global instance of databases.Database (singleton)

async def run_update_from_another_task():
    results = await get_database().fetch_one("update project set name = 'test2' where id = 10 RETURNING id")
    assert results  # we can't update newly created row

async def run():
    await get_database().connect()
    await get_database().execute("delete from project where id = 10")
    async with get_database().transaction():
        await get_database().execute("insert into project(id, name) values(10, 'test')")
        results = await get_database().fetch_one("update project set name = 'test2' where id = 10 RETURNING id")
        assert results  # we can update newly created row
        await asyncio.gather(run_update_from_another_task())

asyncio.run(run())
```
zevisert commented 1 year ago

Gotcha, thanks for the example - that helps. Here's what I ended up running to test this more, and I have what I think is a solution for you:

@ToGoBananas MVE:

```python
# file: bananas.py
import asyncio
import databases
import os
from contextlib import asynccontextmanager
from typing import AsyncIterator

db = databases.Database(os.getenv("DATABASE_URL"))

@asynccontextmanager
async def setup_testbed() -> AsyncIterator[None]:
    try:
        await db.connect()
        await db.execute("CREATE TABLE IF NOT EXISTS bananas (id int, name text)")
        print(db.url)
        yield
    finally:
        await db.execute("DROP TABLE bananas")
        await db.disconnect()

async def main():
    # Wrap with DB setup and teardown
    async with setup_testbed():
        # Make sure the item doesn't exist prior to this test
        await db.execute("DELETE FROM bananas WHERE id = 1")

        # Start a transaction in this task
        async with db.transaction():
            # Create a new item
            await db.execute("INSERT INTO bananas(id, name) values(1, 'test')")

            # Show that we can update the item in this task
            await db.execute("UPDATE bananas SET name = 'main' WHERE id = 1")
            results = await db.fetch_one("SELECT * FROM bananas WHERE id = 1")
            assert results
            print("main:", dict(results._mapping.items()))

            # Try to update the item in a new task
            await asyncio.gather(run_update_from_another_task())

            # See if the child task has a side effect that is perceived here
            results = await db.fetch_one("SELECT * FROM bananas WHERE id = 1")
            print("main:", dict(results._mapping.items()))

async def run_update_from_another_task():
    await db.execute("UPDATE bananas SET name = 'gather' WHERE id = 1")
    results = await db.fetch_one("SELECT * FROM bananas WHERE id = 1")
    assert results, f"No results found: {results}"
    print("gather:", dict(results._mapping.items()))

asyncio.run(main())
```

A few changes, obviously, from what you posted:

- Your use of a singleton is no different than just creating a global `databases.Database` in this example
- You are using `UPDATE ... RETURNING`, which is a postgres-only feature, but can be implemented across the other databases supported here with a sequential `UPDATE` then `SELECT`.
- I added a setup and teardown context manager to `CREATE` the table you are showing, then clean it up when exiting.

Using the same `docker-compose` file I posted above, I have the following urls to test with, which I added to my shell then ran this:

```shell
> DATABASE_URL="postgresql://username:password@localhost:5432/testsuite" python bananas.py
postgresql://username:password@localhost:5432/testsuite
main: {'id': 1, 'name': 'main'}
Traceback (most recent call last):
  File "/code/oss/github/encode/databases/bananas.py", line 56, in <module>
    asyncio.run(main())
  File "/opt/tooling/pyenv/versions/3.10.8/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/tooling/pyenv/versions/3.10.8/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/code/oss/github/encode/databases/bananas.py", line 42, in main
    await asyncio.gather(run_update_from_another_task())
  File "/code/oss/github/encode/databases/bananas.py", line 52, in run_update_from_another_task
    assert results, f"No results found: {results}"
AssertionError: No results found: None
```

Indeed, the example you posted does fail. See the discussion below, but I think this is intended behaviour. If you want to have another task influence an active transaction, you should pass the connection you'd like to use to the other task.

Here are the changes I think you should make:

```diff
--- bananas.py	2023-05-19 13:09:12.193446155 -0700
+++ bananas.py	2023-05-19 13:09:00.252633338 -0700
@@ -1,5 +1,6 @@
 import asyncio
 import databases
+import databases.core
 import os
 from contextlib import asynccontextmanager
 from typing import AsyncIterator
@@ -39,16 +40,16 @@
             print("main:", dict(results._mapping.items()))
 
             # Try to update the item in a new task
-            await asyncio.gather(run_update_from_another_task())
+            await asyncio.gather(run_update_from_another_task(db.connection()))
 
             # See if the child task has a side effect that is perceived here
             results = await db.fetch_one("SELECT * FROM bananas WHERE id = 1")
             print("main:", dict(results._mapping.items()))
 
-async def run_update_from_another_task():
-    await db.execute("UPDATE bananas SET name = 'gather' WHERE id = 1")
-    results = await db.fetch_one("SELECT * FROM bananas WHERE id = 1")
+async def run_update_from_another_task(connection: databases.core.Connection):
+    await connection.execute("UPDATE bananas SET name = 'gather' WHERE id = 1")
+    results = await connection.fetch_one("SELECT * FROM bananas WHERE id = 1")
     assert results, f"No results found: {results}"
     print("gather:", dict(results._mapping.items()))
```

Now, by passing the connection that is actively using the transaction you'd like to update values within, you get what I think you were expecting:

```shell
> for url in \
>     'sqlite:///testsuite' \
>     'sqlite+aiosqlite:///testsuite' \
>     'mysql://username:password@localhost:3306/testsuite' \
>     'mysql+aiomysql://username:password@localhost:3306/testsuite' \
>     'mysql+asyncmy://username:password@localhost:3306/testsuite' \
>     'postgresql://username:password@localhost:5432/testsuite' \
>     'postgresql+aiopg://username:password@127.0.0.1:5432/testsuite' \
>     'postgresql+asyncpg://username:password@localhost:5432/testsuite' \
> ; do DATABASE_URL=$url python bananas.py; done
sqlite:///testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
sqlite+aiosqlite:///testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
mysql://username:password@localhost:3306/testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
mysql+aiomysql://username:password@localhost:3306/testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
mysql+asyncmy://username:password@localhost:3306/testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
postgresql://username:password@localhost:5432/testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
postgresql+aiopg://username:password@127.0.0.1:5432/testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
postgresql+asyncpg://username:password@localhost:5432/testsuite
main: {'id': 1, 'name': 'main'}
gather: {'id': 1, 'name': 'gather'}
main: {'id': 1, 'name': 'gather'}
```

Based on the documentation for this project, I'd argue that what you're showing here was not intended behavior, and comes as a side effect of the broken concurrency model I'm trying to fix. Here's the line I'm referencing:

Transaction blocks are managed as task-local state. Nested transactions are fully supported, and are implemented using database savepoints.

To me, this means the effects of transactions should be restricted to the task they are running in. Nested transactions, as far as I'm reading it here, just means that you can nest transaction blocks. The documentation does not state that nested transactions are expected to work across tasks. @tomchristie or @Kludex might be able to clarify. In code, I read that to mean that this is supported:

import os

import databases

async def example():
    db = databases.Database(os.getenv("DATABASE_URL"))
    async with db.transaction():
        async with db.transaction():
            ...
zanieb commented 1 year ago

Hm it does seem like tasks should inherit the state of the parent tasks if feasible. If run_update_from_another_task was created outside of the transaction but ran during it, I would expect it to fail. If it's created from within the transaction, it probably ought to work.

zevisert commented 1 year ago

it does seem like tasks should inherit the state of the parent tasks if feasible.

That brings us back to context variables then.

If run_update_from_another_task was created outside of the transaction but ran during it, I would expect it to fail. If it's created from within the transaction, it probably ought to work.

Your model of how context works with tasks is slightly off, I think. Context is not transferred based on where a coroutine function is defined; it's based on where it is executed. For example, the function signature asyncio.create_task(coro, *, name=None, context=None) shows that create_task is responsible for defining the context that the child task runs within. The default value of None means the task inherits a copy of the current context; it's not based on where you defined whatever coroutine is being passed as the coro argument.
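A small sketch of that behaviour (the variable names are illustrative; this runs on any Python with asyncio, while the explicit `context=` parameter shown in the signature above only exists on newer Pythons, 3.11+):

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="unset")

async def child() -> str:
    # Sees the value that was current when create_task() copied the context...
    seen = var.get()
    var.set("child")   # ...and this set() stays in the child's copy only
    return seen

async def main():
    var.set("parent")
    # create_task (context=None) copies the *current* context at creation time
    seen_by_child = await asyncio.create_task(child())
    return seen_by_child, var.get()

print(asyncio.run(main()))  # ('parent', 'parent')
```

The child inherits "parent" because of where it was *created and run*, and its own `set("child")` never leaks back to the parent's context.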


Edit: Most of this is out of date. We are now back to ContextVar to support inheritance to child tasks.

I don't mind the switch back to ContextVars, but the consequence is that databases may need to be more clear with what is supported. The documentation for context variables says:

Important: ContextVars should be created at the top module level and never in closures. Context objects hold strong references to context variables which prevents context variables from being properly garbage collected.

I understand "closure" from that warning to also include instances. I can't find much discussion related to that warning on the PR that added that it to the docs or on the implementation itself, but I don't think it's memory safe to add ContextVars as instance attributes.

This is all to say: since ContextVars should be declared at module level but we want them to track instance-level properties (connection and transaction), the immediate consequence I'm foreseeing is that there can only ever be one databases.Database instance per Python program. That's because the global context variables used to track instance properties have no native mechanism to keep track of which of possibly many databases.Database instances a connection or transaction belongs to. That circles back to an earlier version of this PR where I had context variables containing dictionaries of connections or transactions. It's possible to manage all of this, but it was seen as too complex/ugly initially - and even then I had an asyncio.Task as part of the key into that mapping, so I'd have to check (I'm not certain) whether that would actually work with @ToGoBananas' example.

Personally, I think that the majority of databases users are not spawning new tasks within transactions, so I'd like to solve this with documentation. The implementation we have right now is simple and maintainable, and still allows for child tasks to influence a parent transaction if you pass them the connection that's used in the parent's task (like I showed above). FWIW, the good old days of threading did not have this idea of inherited context: threading.local variables did not pass copies of themselves to child threads, and it was up to the programmer to pass context explicitly.
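For comparison, here's a minimal demonstration that `threading.local` state is not inherited by child threads, which is why explicit passing was the norm there:

```python
import threading

local = threading.local()

def child(results: list) -> None:
    # The child thread gets a fresh threading.local namespace;
    # nothing set by the parent thread is visible here.
    results.append(getattr(local, "conn", None))

local.conn = "parent-conn"   # set in the main thread only
results: list = []
t = threading.Thread(target=child, args=(results,))
t.start()
t.join()
print(results)  # [None] - the child never saw "parent-conn"
```

ContextVars flipped that default for tasks: child tasks get a copy of the parent's context unless you opt out.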

zanieb commented 1 year ago

Context is not transferred based on where a coroutine function is defined, it's based on where it is executed.

We're definitely on the same page about this; I just wasn't clear in my comment.

I think this pull request would be a breaking change given that example. Gathering asynchronous tasks is pretty common.

Perhaps it's possible to use context variables more cleanly. From my previous comment:

It seems a little wild to use a ContextVar to store a dictionary that includes the asyncio.Task in it (the whole point of them is that they handle task locality for you).

If we exclude the tasks from the context, it might make more sense. I can try taking a look at that implementation again too.

zevisert commented 1 year ago

Cool, glad to hear it. I'll dig in a bit more as well.

I agree that this would be a breaking change as it stands right now. I didn't see that as a big deal since databases has yet to become stable (v0.7 right now), so semantic versioning says that any minor version change may also be breaking. Considering how subtle and problematic this concurrency bug that has been affecting my project is, I kind of anticipated a breaking change when I came here to start fixing it. I think that some things that may have been enabled by the previous concurrency model might be removed once we land on an implementation that corrects the concurrency issues I've been referencing above.

zanieb commented 1 year ago

@zevisert https://github.com/encode/databases/commit/dccb47dc57a18fe00d801d0bcdfbf9150bc0c9ae works for both examples; I converted @ToGoBananas's example into an actual test.

zevisert commented 1 year ago

Summary of the latest changes:

zevisert commented 1 year ago

Earlier I suggested that this PR be considered a breaking change - but with this latest approach, I think that isn't the case anymore. @madkinsz may be more mature in thinking about this than I am though, considering their work on poetry-relax. I read some of the supporting articles there, and generally agree that SemVer can be a bit unwieldy. I think that this PR is now aiming to make no breaking changes, but I can't say for sure that these corrections to concurrency won't break someone's code.

This PR has less impact now than where it had previously gotten to, given that inheritance to child tasks is supported and working correctly now.

zevisert commented 1 year ago

Can you update the documentation to note that these things are task-local but inherited by child tasks?

Please take a look - feel free to edit / suggest further. I'm not sure if these updated examples are too verbose or not

Done, but see the comment below, as I now think another change is warranted before merge.

zevisert commented 1 year ago

Heading to draft for a moment - the MVE in #134 seems really closely related to this issue, but is still failing in the same way as some of the other issues I've been able to resolve with this.

zevisert commented 1 year ago

Okay thanks to @reclosedev's issue #134, I've circled back around to thinking that users probably don't want implicit connection inheritance across tasks, BUT do want transaction inheritance if they explicitly reuse the same connection among descendant tasks.

Their example boils down to this:

```python
import os
import asyncio
import databases

database = databases.Database(os.environ["DB_URL"])

async def worker():
    async with database.transaction():
        await database.execute("SELECT 1 + 1")

async def main():
    await database.connect()
    await database.execute("SELECT 1 + 1")
    await asyncio.gather(*[worker() for _ in range(5)])

asyncio.run(main())
```

That is:

  1. Instantiate a Database and .connect to set up a connection pool with the database backend (.connect is, and has always been, lazy; actual connections are established by database.connection())
  2. Do something that interacts with the database, causing a connection to be created for the parent task.
  3. Spawn a bunch of workers to do independent work, notably not from within a transaction.

In the current version of this PR (and on master) this example is still broken because we are using ContextVars such that the current connection is inherited by all descendant tasks. The descendant tasks cannot do parallel work within transactions, because they all share the same connection - and a single backend connection can only support nested transactions, not concurrent transactions.


This all makes me think that the best solution is for each database instance to have an attribute that tracks connections in use per-task, via something like I had been playing with in 8370299b7b8bd4ed791fbdcf2805321ea12aac9f:

```python
class Database:
    _active_connections: WeakKeyDictionary[asyncio.Task, Connection]
```

This change would make connections task-local with no inheritance. Each new task acquires a new connection from the backend's connection pool unless a connection is explicitly provided to the task. The implicit connection inheritance to descendant tasks is what seems to be the cause of most user frustration with concurrent code. #230 explicitly calls this out.
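An illustrative sketch of that idea (this is not the library's implementation; `current_connection` and the connection strings are stand-ins for acquiring from the real pool): keying a `WeakKeyDictionary` by `asyncio.current_task()` gives each task its own connection with no inheritance, and entries disappear once a task is garbage collected:

```python
import asyncio
from weakref import WeakKeyDictionary

# One entry per running task; weak keys mean finished tasks don't pile up.
_active: "WeakKeyDictionary[asyncio.Task, str]" = WeakKeyDictionary()

def current_connection() -> str:
    task = asyncio.current_task()
    if task not in _active:
        # In the real library this would acquire from the backend's pool
        _active[task] = f"conn-for-{task.get_name()}"
    return _active[task]

async def worker() -> str:
    return current_connection()

async def main():
    # Three sibling tasks each end up with their own connection
    return await asyncio.gather(*(worker() for _ in range(3)))

print(asyncio.run(main()))
```

Each of the three gathered tasks gets a distinct connection string; a child task would only share a connection if one were passed to it explicitly.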

TransactionBackends, on the other hand, should still be tracked with ContextVariables so that transactions nest given the same connection is used - that's the whole point of Connection._transaction_stack anyway. I roughly suggested this pattern earlier when responding to @ToGoBananas example, as at the time we had no context variables at all so child tasks did not inherit connections, and he was looking for support for the case where a child task needed to influence the state of a transaction in the parent. My suggestion was to explicitly pass the connection with the open transaction to the descendant task. Passing the connection brings the state of any active transactions on that connection along with it.

zevisert commented 1 year ago

Oh and another thing I found just now with the current implementation - context variables undergo a shallow copy when the context is copied for new tasks. That means we had been undoing any isolation the context variables provided by mutating the WeakKeyDictionary if it had been created before any descendant tasks started. Simple fix though: just treat those dictionaries as immutable.
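A minimal demonstration of the pitfall (using a plain dict; the variable names are illustrative): the copied context holds a reference to the *same* mutable object, so in-place mutation in a child task is visible to the parent:

```python
import asyncio
from contextvars import ContextVar

shared: ContextVar[dict] = ContextVar("shared")

async def child():
    # The child task copied the context, but the copy is shallow:
    # it references the *same* dict, so this mutation leaks to the parent.
    shared.get()["leak"] = True

async def main():
    shared.set({})
    await asyncio.create_task(child())
    return shared.get()

print(asyncio.run(main()))  # {'leak': True} - isolation lost by mutating in place
```

Treating the mapping as immutable - building a new dict and calling `shared.set(...)` with it instead of mutating `shared.get()` - keeps the per-task isolation intact.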

zevisert commented 1 year ago

Anything more I can do for anyone here to get this reviewed and released?

zanieb commented 1 year ago

Thanks for your patience!

zevisert commented 1 year ago

Thanks for the merge! Could we get @Kludex or @aminalaee to cut a new release so that we can actually have this bug corrected in our codebases? Is there a changelog or blog article I can help write for this? Is there a release cadence or policy that my team can plan from?

zanieb commented 1 year ago

@zevisert as far as I can tell, I can make a release. I do not think anyone else is available to maintain this project.

I'd definitely appreciate if you opened a pull request for a changelog entry at https://github.com/encode/databases/blob/master/CHANGELOG.md

There is no planned release cadence for this project. I'd like to release this change and some other fixes but there is little active development here.