Open bigfootjon opened 2 months ago
Maybe I should slightly tweak this PR then: introduce an async “clean_up_db_connections” decorator and then document that
@bigfootjon I'm not 100% sure what we should say here.
In theory you have one connection per thread, and in the pure async case you have just one thread.
So, why not hold onto the connection? (Likely you'd need a way to recover from connection errors, and such but as a first pass...)
Trouble is, we're still relying on the sync core underneath the async ORM API, so we're still going to be wanting to clean up connections. Without building some test projects I can't say exactly when and what is going to be needed there.
I think that's what makes it interesting 😜
Makes sense. I’ll noodle on this for a bit and see what I can figure out (I only really have time to work on Django in my spare time, and I’ll be a bit limited on spare time for the next few weeks)
@bigfootjon Slowly, slowly is the Django way. No stress! 😉
I've done a bit of tinkering and a bit of thinking and I think I've come up with a solution I'm happy with (though I'm eager to hear competing opinions).
Conceptually, the purpose of cleaning up connections is a necessary piece of the machinery of Django due to how connection health works within Django:
At the beginning of each request, Django closes the connection if it has reached its maximum age. If your database terminates idle connections after some time, you should set CONN_MAX_AGE to a lower value, so that Django doesn’t attempt to use a connection that has been terminated by the database server. (This problem may only affect very low traffic sites.)
Based on my understanding, there are no "timers" within Django to periodically check connection health. Connection health is ensured at the start (and end) of a connection.
So for channels, I think we want to clear connections at the beginning of a request (connect
), whenever a receive
event happens before we dispatch to the consumer, and when the consumer finishes (disconnect
). This provides the experience that users expect: if they use the ORM (and expect a working or fresh database connection) in a consumer it will either be at connect
time or in response to new input from the client (receive
) or the connection will be closing and they won't care.
The other time they would need a database connection would be at send
time. Some event will be triggered on the server and will wake up some logic (channel layer or otherwise) that will result in a send
. In this case, we should not be trying to auto-close connections because the user might be making a series of queries and send
in sequence in response to some server-side event.
As a convoluted example, there might be changes to a server-side "file system" stored in Django Models where each Folder stores a ForeignKey ref to its parent folder. In response to some server-side event, the client would need updates on all folders in a path. As an implementation detail, the server might implement this as (pseudo-python):
folder = ...
while folder is not None:
folder = await Folder.objects.aget(pk=folder.parent_id)
await self.asend(...)
In such a configuration closing the connection after every aget
/asend
pair would kill performance and not be what the user expected. They look at this block of code and think "this is all happening within the same 'request'".
To summarize, connect
and receive
occupy the "request started" side of the conceptual request, but send
does NOT align with the "request finished" event. Only disconnect
aligns with "request finished". Therefore, I think we should close connections before connect
and receive
and after disconnect
in a new Middleware (DjangoDBMiddleware
) or implicitly within the framework (and remove the current database_sync_to_async
stuff).
I think there's a problem with the solution outlined above: connections usually have a bounded lifetime.
If a connection runs for forever the database connection might hit its CONN_MAX_AGE. In these events we should close the connection and make a new one. This would be handled automatically in the above life-cycle events except send
.
Imagine a scenario where a websocket is set up for client-side notifications of mutations to objects stored in the ORM. The developer has configured a consumer that does nothing on connect or receive, but every once in a while it wakes up (via an MQTT notification or something) and sends down the objects that it was notified about. Let's say there's some database query that must be done before sending down the object (like "is this content visible to the current user" checks) but these MQTT notifications are rare (1 hour or more). In this case, if the CONN_MAX_AGE on the database connection is less than an hour then we've violated the expectations of users by keeping their connections open longer than they specified.
The implicit user contract in the above docs is that a connection will not be used after CONN_MAX_AGE. (ok ok I suppose that the max age could be reached during a traditional django request, but since traditional http requests are so fast I'm ignoring that).
I don't really know how to solve this. I can see two solutions:
What if we close connections after the first send
in a "while" (maybe by comparing the time the last send
and the current send
are happening)? This could be close enough to what users expect but could have some problems.
The biggest problem is that if the connection is invalid that wouldn't help users as the first query they make would fail and an exception would be raised, which goes against user expectations ("I thought Django managed this for me"). The next biggest problem is that it has bad performance characteristics. If an user has a consumer that sends 2 send
messages immediately after one another and then pauses for a "while", this solution would slow down the second send
but not the first as django would tear down the connection after the first send and then build a new one before the second send. That would again go against user expectations (and lead to a lot of head scratching).
I don't love this solution tbh.
I think in the send
case we can't provide a good general-purpose solution implicitly for all users. Instead, we should provide the next best solution: documentation.
We can define a function (in channels or django core) that closes connections from async code and then document the contract that users need to uphold for themselves. The thesis of this documentation can be: call close_connection
as early as possible in the code paths that eventually call send
, particularly in long-lived connections.
I'm vastly in favor of (B) and the implicit cleanup in connect
/receive
/disconnect
, and would be interested in putting in the work here to make it so.
Hey @bigfootjon -- thanks for putting your brain to it. Makes sense.
I too lean toward B. If I've got a very long lived connection, I'm happy that it's my job to run a periodic task to clean up the connection object. Makes sense to me that way.
Alright I've made a pass at integrating my ideas into Channels and documenting them!
Hey @bigfootjon. I managed to get a look at this. Seems sound. I want to play with it a bit more but generally positive.
Thinking currently about the migration path here. Say I'm currently using the existing wrapper, I guess we'll want to say "migrate, but the extra calls to close old connections won't hurt"? 🤔
I guess we'll want to say "migrate, but the extra calls to close old connections won't hurt"? 🤔
Yeah I think that’s reasonable. My reading of the close_old_connections
code suggests it’s pretty cheap to call if it does nothing.
In any case, using the async versions of the ORM methods will be slightly faster in certain cases due to the ORM being able to sometimes hit caches before calling in to the sync DB internals, so there’s multiple incentives to use the newer method.
Hey @bigfootjon -- I'm thinking this is a point release, so just working my way round to that. Not forgotten 😉
Hey @bigfootjon — I'm looking at #1091 in relation to this... — problem there is that code wrapped in database_sync_to_async
closes DB connections in test runs.
(Adam has a mocking work around, making it a no-op)
Fancy reading over that? Is it likely something exacerbated by what we're proposing here? It seems so... 🤔 I've had it in my notifications for a couple of weeks, but realise I should mention it to you, rather than sitting there not actually getting to look at it. 🤹
Yeah I can take a look! (Sometime over the next few days)
No stress! 😅
I think we may need to tighten up the story there as part of this progress. Have a look, let me know what you think.
Is it likely something exacerbated by what we're proposing here? It seems so... 🤔
Yeah I think I agree. I think #1091 will need to be resolved before we can move forward with this. My thinking is that since this PR introduces calls to close_old_connections
in more code paths we'll be converting the current rare edge case into a much more common occurence.
I think the best path forward is https://code.djangoproject.com/ticket/30448 (if I understand the ticket correctly), but the channels_db.close_old_connections = no_op
seems like a fine thing to use to short-circuit calling that function at all.
So I see the rough work ahead as (ordered):
channels_db.close_old_connections = no_op
(with a test case adapted from @adamchainz's minimal repro)
a. https://code.djangoproject.com/ticket/30448 (nice to have)And we can close:
Then we could cut a release called something like "4.2: Better Async Django Support" once Django 5.1 is released (which would be needed before releasing #2093 in case anything changes in that API at the last minute before release)
How does this plan sound?
Hey @bigfootjon — yes, that sounds about right!
(I'm head down lining up for DjangoCon Europe, but should have a bit more headspace after that. Do press on! 🎁)
I've put up https://github.com/django/channels/pull/2101 to tackle (1). I don't really have the bandwidth to pick up the upstream django ticket right now, but I'd like to see it done eventually.
Hey @bigfootjon — right, I've come to the conclusion I'm just holding you up here with my lack of bandwidth to push this forwards. 🤹
As such I'm wondering if you'd accept joining as a Channels maintainer, so you can resolve issues, and we can push towards a release? I would be honoured to have you onboard.
Hey @carltongibson, sorry for the slow reply (my American Independence Day celebrations ran a bit long). I'm definitely willing to become a maintainer, but some more info would be helpful to help me commit. I've emailed you at your GitHub account's public email address so we can discuss more directly and not dirty up this GitHub Issue with our discussion!
The async ORM interface was added in 4.1! Since channels supports >=4.2 we can document that new API.
Consider this a first pass, I just updated the spots I noticed that looked like they needed tweaking, happy to take feedback on wordsmithing or different ideas on how to document this.
Closes: #1999