@michaelhays If you want to verify this would fix it, you can try calling `close_old_connections` yourself at the end of your consumer's method (but likely not in disconnect, as https://github.com/django/daphne/issues/152 will prevent that). I'll look at solving both issues later in the week.
`close_old_connections` didn't work for me as I used it; I tried it in a couple of places.
End of the consumer method:
```python
# consumers.py
from channels.generic.websocket import JsonWebsocketConsumer
from django.db import close_old_connections

from .models import Chat


class ChatbotConsumer(JsonWebsocketConsumer):

    def connect(self):
        self.accept()

    def receive_json(self, content):
        # create Chat in scope if one doesn't exist
        if 'chat' not in self.scope:
            chat = Chat.objects.create_chat(self.scope['user'])
            self.scope['chat'] = chat

        # process chatbot response, store messages, and send response to client
        response_data = some_func(content['message'])
        self.scope['chat'].add_message(content['message'])
        self.scope['chat'].add_message(response_data)
        self.send_json(response_data)
        close_old_connections()
```
End of the model method:
```python
# models.py
from django.contrib.postgres.fields import JSONField  # Postgres backend, per this thread
from django.db import close_old_connections, models


class Chat(models.Model):
    body = JSONField()
    ...

    def add_message(self, message):
        self.body.append(message)
        self.save()
        close_old_connections()
```
Setting `CONN_MAX_AGE` to 0 has helped a bit, in that certain actions like reloading the page remove 2 or 3 of the "COMMIT" idle connections (but only sometimes?), and occasionally the idle connections are reaped. But I've yet to notice any rhyme or reason to it.
Probably need the startup method connected as well. I'll look into it more tomorrow when I have time to work on Channels!
OK, I have replicated this, and the temporary workaround is calling `django.db.close_old_connections` right after you do an ORM call, inside the same sync context (calling it anywhere else hides the connections due to thread safety).
Working on a proper fix now.
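To make "same sync context" concrete, a minimal sketch reusing the `Chat` model from the snippet above (the helper name here is made up):

```python
# Sketch of the temporary workaround: close_old_connections() must run in the
# same thread, right after the ORM call, or it won't see these connections.
from django.db import close_old_connections

from .models import Chat  # the Chat model from the snippet above


def record_message(chat_id, message):
    chat = Chat.objects.get(pk=chat_id)   # ORM call opens/uses a connection
    chat.add_message(message)
    close_old_connections()               # reap it here, in the same sync context
```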
OK, this should be fixed in 6afd423: any code inside of a `SyncConsumer` will work fine, and anything you're calling from an async context should use the new `channels.db.database_sync_to_async` (can either be called with a function or be used as a function/method decorator). You'll also need the latest release of `asgiref` (2.1.4) to make method decorators work correctly.
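For anyone following along, a short sketch of the decorator usage described above (the consumer and model names are placeholders, not part of the fix itself):

```python
# Sketch: database_sync_to_async used as a method decorator on an async
# consumer (needs asgiref >= 2.1.4); RoomConsumer/Room are placeholder names.
from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncJsonWebsocketConsumer

from .models import Room  # placeholder model


class RoomConsumer(AsyncJsonWebsocketConsumer):

    @database_sync_to_async
    def get_room(self, room_id):
        # Runs in a worker thread; the wrapper cleans up old connections
        # around the call.
        return Room.objects.get(pk=room_id)

    async def receive_json(self, content):
        room = await self.get_room(content["room_id"])
        await self.send_json({"room": room.pk})
```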
If someone could confirm this fixes it for them, I can close this.
Confirmed to fix the issue for me!
Great! I'll put this out in a release in the next day or two, depending on what else I land.
Hey, @andrewgodwin! I'm still having the same problem, and it's not only about the consumer. I was following your tutorial (multichat). I have the `database_sync_to_async` decorator wrapping `get_room_or_error` in `chat.utils`. The main thing is that every page refresh (not only the chat page) creates a new database connection! Django + PostgreSQL + Daphne + Heroku.
If you can reduce it to a simple set of code that reliably reproduces the problem, using the latest versions of all of the repositories, then please open a new issue with that code snippet, and steps to replicate locally (I can't debug things that only happen in Heroku)
It's the exact same issue as @michaelhays described at django/daphne#163. In my case there is no such issue locally, but there is on Heroku. And I understand that you can't debug things that only happen on Heroku... but there's no support channel where I can raise that issue. You have more influence to report it to them, and more people updating from Channels 1.0 to 2.0 will run into this problem...
I just recreated everything from your clean multichat example and deployed it to Heroku at https://django-channels-2-test.herokuapp.com/ (user: demo, pass: demo), and I get the same issue. Refresh the page 20 times and you'll get:

```
OperationalError at /
FATAL: too many connections for role "jdknihvnjyczlm"
```
Can you confirm what exact versions of each Channels package you're running, using a `pip freeze` from the Heroku environment? I just tried this again locally with multichat and verified that it only ever opens a single database connection.
```
aioredis==1.0.0
asgiref==2.1.6
async-timeout==2.0.0
attrs==17.4.0
autobahn==17.10.1
Automat==0.6.0
channels==2.0.2
channels-redis==2.1.0
constantly==15.1.0
daphne==2.0.4
dj-database-url==0.4.2
Django==2.0.2
hiredis==0.2.0
hyperlink==18.0.0
idna==2.6
incremental==17.5.0
msgpack==0.5.6
psycopg2==2.7.4
pytz==2018.3
six==1.11.0
Twisted==17.9.0
txaio==2.8.2
whitenoise==3.3.1
zope.interface==4.4.3
```
Looks like the packages on Heroku are exactly the same as in my local env... And yes, the problem only appears on Heroku; it works fine locally...
Unless I can replicate it locally I'm afraid I can't fix it, sorry - I'm not sure what's different about the Heroku environment that would cause this.
Hmmmm
Same issue, Django + PostgreSQL + Daphne + Heroku.
Originally had this issue with the consumers, but was fixed with the prior release.
On any DB lookup within the Django app, Postgres connections are left idle and never close. This occurs on any DB lookup, not just the very narrow area where I use Channels.
Relevant versions:

```
asgiref==2.2.0
channels==2.0.2
channels-redis==2.1.0
daphne==2.1.0
dj-database-url==0.5.0
Django==2.0
psycopg2==2.7.4
```
One additional item: I verified that these idle connections don't close after the `conn_max_age` of 500 seconds has elapsed.
```python
db_from_env = dj_database_url.config(conn_max_age=500)
DATABASES['default'].update(db_from_env)
```
UPDATE: Perhaps this article provides a clue: https://devcenter.heroku.com/articles/forked-pg-connections
What happens if you drop `conn_max_age` from the `dj_database_url` call?
I was experiencing the exact same problem @burcaw described. I can confirm, @kennethreitz: dropping `conn_max_age` from the `dj_database_url.config` call has resolved the issue for me. No more idle db connections are left open.
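For anyone skimming, a sketch of that settings change (assuming the standard `dj_database_url` setup shown earlier in the thread):

```python
# settings.py -- sketch of the change being discussed.
import dj_database_url

# Before: persistent connections kept open for up to 500 seconds.
# DATABASES['default'].update(dj_database_url.config(conn_max_age=500))

# After: no conn_max_age, so CONN_MAX_AGE falls back to 0 and the connection
# is closed at the end of each request/handler instead of sitting idle.
DATABASES['default'].update(dj_database_url.config())
```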
✨🍰✨
```
aioredis==1.0.0
asgi-redis==1.3.0
asgiref==2.3.0
async-timeout==2.0.1
attrs==17.4.0
autobahn==18.4.1
Automat==0.6.0
channels==2.1.1
channels-redis==2.1.1
daphne==2.1.1
Django==1.11
psycopg2==2.7.4
six==1.11.0
Twisted==17.9.0
txaio==2.9.0
txredisapi==1.4.4
ujson==1.35
uWSGI==2.0.15
zope.interface==4.4.3
```
Hello, I have encountered the same issue as @burcaw, with Nginx + uWSGI (WSGI layer) + Daphne (ASGI layer only). I have already updated all packages to the latest versions, but even when I dropped `CONN_MAX_AGE`, it creates and keeps idle connections to the database without terminating them, so if a few clients connect to websockets, Postgres reaches the max connection limit.
Is there any workaround for this issue?
Thanks in advance.
If you are still having the issue it means you are using code that talks to a database outside of `database_sync_to_async`. Make sure all your database access is wrapped in that (consumer methods on sync consumers already are).
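For reference, a sketch of what "wrapped in that" looks like from async code (the consumer and model names are placeholders):

```python
# Sketch: wrap ad-hoc ORM access from an async consumer in
# database_sync_to_async instead of hitting the ORM directly.
from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncWebsocketConsumer

from .models import Message  # placeholder model


class ChatConsumer(AsyncWebsocketConsumer):

    async def receive(self, text_data=None, bytes_data=None):
        # The wrapper runs the ORM call in the managed thread pool and cleans
        # up old connections around it; calling the ORM directly from here
        # would leave an unmanaged connection behind.
        await database_sync_to_async(Message.objects.create)(text=text_data)
```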
Thanks for your reply. I have double-checked my code and didn't find any database access in methods without the `database_sync_to_async` decorator.
So the problem was in my server configuration and the `ASGI_THREADS` env var. The default worker count is `(os.cpu_count() or 1) * 5`, and on the test server I was running `os.cpu_count()` was 16, so the max worker count for each daphne server was 80, which is why I had so many connections to the database. With `ASGI_THREADS` set and `"CONN_MAX_AGE": 60` it works fine now: each worker reuses its connection to the database and everything is cleaned up well. With my current settings, `ASGI_THREADS=20` and 4 daphne instances on different ports, the number of database connections doesn't exceed 45 while load testing them using artillery.io.
P.S. Maybe it would be useful to add some info about the `ASGI_THREADS` var to the documentation.
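To spell out that arithmetic, a back-of-the-envelope sketch using the numbers above:

```python
import os

# Default size of the sync worker thread pool when ASGI_THREADS is not set
# (mirrors ThreadPoolExecutor's default): (cpu_count or 1) * 5.
default_threads = (os.cpu_count() or 1) * 5      # 16 cores -> 80 threads

# Each worker thread can hold its own database connection, so the worst case
# per deployment is roughly instances * threads:
daphne_instances = 4      # from the setup described above
asgi_threads = 20         # value of the ASGI_THREADS env var
worst_case_db_connections = daphne_instances * asgi_threads   # 80
```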
Also, one more question: while load testing with an external tool, I sometimes get:
```
ERROR Exception in callback AsyncioSelectorReactor.callLater.<locals>.run() at /venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py:287
handle: <TimerHandle when=1713807.003341125 AsyncioSelectorReactor.callLater.<locals>.run() at /venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py:287>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 126, in _run
    self._callback(*self._args)
  File "/venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py", line 290, in run
    f(*args, **kwargs)
  File "/venv/lib/python3.5/site-packages/twisted/internet/tcp.py", line 289, in connectionLost
    protocol.connectionLost(reason)
  File "/venv/lib/python3.5/site-packages/autobahn/twisted/websocket.py", line 128, in connectionLost
    self._connectionLost(reason)
  File "/venv/lib/python3.5/site-packages/autobahn/websocket/protocol.py", line 2467, in _connectionLost
    WebSocketProtocol._connectionLost(self, reason)
  File "/venv/lib/python3.5/site-packages/autobahn/websocket/protocol.py", line 1096, in _connectionLost
    self._onClose(self.wasClean, WebSocketProtocol.CLOSE_STATUS_CODE_ABNORMAL_CLOSE, "connection was closed uncleanly (%s)" % self.wasNotCleanReason)
  File "/venv/lib/python3.5/site-packages/autobahn/twisted/websocket.py", line 171, in _onClose
    self.onClose(wasClean, code, reason)
  File "/venv/lib/python3.5/site-packages/daphne/ws_protocol.py", line 146, in onClose
    self.application_queue.put_nowait({
AttributeError: 'WebSocketProtocol' object has no attribute 'application_queue'
```
Is it a Channels issue, or just a problem with the testing tool not closing the websocket connections properly?
Ah, yes. I will add notes about the number of possible threads to the documentation.
As for that error, it looks like it may be a bug in Daphne - check you are running the most recent version, and if you are, please open an issue on the Daphne issue tracker with that traceback and any more information you have.
Ok, no problem. I am using the latest release version of daphne and the other channels-related packages. Currently it looks like this error appears when there are more connections than the daphne server cluster can handle; the current setup is 10 daphne instances, each set to `ASGI_THREADS=8`. I'll create an issue on the daphne issue tracker with that traceback. Currently, my setup can handle 10 new connections per second using AsyncConsumers; if I push more than that, the error starts to appear.
Hi guys, catching up on the issue. Does this mean there is no way of running an API on the same server as Channels? Or would that mean that I would have to wrap all my API calls in `@database_sync_to_async`? I'm running on Heroku and my consumer (AsyncWebsocketConsumer) is using the decorator.
EDIT: Looks solved. Removing CONN_MAX_AGE apparently did the trick. https://stackoverflow.com/questions/48644208/django-postgresql-heroku-operational-error-fatal-too-many-connections-for-r
I don't have a `conn_max_age` set on my `dj_database_url()` call and yet I still see lots of idle, un-reaped Postgres connections.
Any ideas on where to look?
Relevant versions:

```
channels-redis==2.3.3
channels==2.1.7
django==2.1.4
dj-database-url==0.5.0
```
Dropping `conn_max_age` still didn't fix the issue for me. I am having this issue for REST HTTP(S) requests under daphne, not the websocket connections. I don't have this issue under WSGI. Would I have to set up a new server for the websocket connections and stay with WSGI for the REST framework?

```
channels==2.1.7
channels-redis==2.3.3
Django==2.0.6
dj-database-url==0.5.0
```
Nvm, my mistake: I was setting MAX_CONN_AGE through the Heroku settings helper (`django_heroku.settings(locals())`), which has been mentioned in https://github.com/heroku/django-heroku/issues/11.
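In case it helps others, a sketch of one way to undo that default after the `django_heroku.settings()` call (assuming `DATABASES` is already defined earlier in settings.py, as in a standard project):

```python
# settings.py -- sketch only; django_heroku.settings() configures DATABASES
# with a non-zero conn_max_age by default (see the linked issue).
import django_heroku

django_heroku.settings(locals())

# Force non-persistent connections so they are closed at the end of each
# request/handler instead of lingering as idle Postgres connections.
DATABASES['default']['CONN_MAX_AGE'] = 0
```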
For me, the only thing that worked was setting `ASGI_THREADS` (as an env var) to a very low number (6). Daphne never closes these connections, by the way. I'm not sure if this is the expected behaviour, @andrewgodwin?
Daphne and Channels both don't have enough visibility into the threadpool system to close any connections in there, sadly. This is something that has to be remedied in Django itself, which I am in the process of doing.
I removed `conn_max_age`, tried `close_old_connections`, and also set `ASGI_THREADS` to an even lower number (3), and the issue is still here for me, for both REST requests and websockets.
```
channels-redis==2.3.3
channels==2.1.7
dj-database-url==0.5.0
Django==2.1.7
```
Any update on this issue? @andrewgodwin
@LgArcida It's a big project. The first steps are made here: https://github.com/django/deps/blob/master/draft/0009-async.rst
For those who keep finding this issue, here's what I did:
I added `close_old_connections` calls here and there (I had some custom authentication code where I was authenticating using DRF tokens; see the sketch below). This helped, but the problem still popped up immediately.
I started using pgbouncer (which I was planning on doing anyway) - https://devcenter.heroku.com/articles/python-concurrency-and-database-connections#limit-connections-with-pgbouncer
I then was able to send 1,000 messages through my channels code on Heroku and it all worked fine (previously I would have issues after 20-30 messages).
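For the first item above, a minimal sketch of that kind of `close_old_connections` placement (the helper name and the DRF token lookup are placeholders, not the exact code from that project):

```python
# Sketch: reap stale connections around a DRF-token lookup performed from
# custom authentication code (placeholder helper).
from django.db import close_old_connections
from rest_framework.authtoken.models import Token


def user_for_token(token_key):
    close_old_connections()   # drop connections tied to dead threads first
    try:
        return Token.objects.select_related('user').get(key=token_key).user
    except Token.DoesNotExist:
        return None
    finally:
        close_old_connections()   # and again after the ORM call
```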
I have to say I'm still a little confused by this issue, even though setting `ASGI_THREADS` to a low number has "solved" the issue for me. I'm using a synchronous consumer.
Where am I supposed to use `close_old_connections()`? Here's my workflow with channels:

Where does `close_old_connections()` go here?
I was having this problem with channels and heroku postgres, but it seems to have resolved itself when I changed my runtime from 3.6 to 3.8.1. I don't need to use `close_old_connections()`.
@Paulcappaert I don't suppose we have any idea what changed there, do we?
@carltongibson No idea, all I know is that I made that single change and the problem went away.
Hmmm. Yes... Thanks for the follow-up.
(asyncio is changing rapidly, still, but 🤷)
Which change did you perform so the problem went away?

I updated the runtime to Python 3.8.1.

OK, thanks. I also updated my runtime to Python 3.8.2 and that did it.
> I was having this problem with channels and heroku postgres, but it seems to have resolved itself when I changed my runtime from 3.6 to 3.8.1. I don't need to use `close_old_connections()`.
You know what? I was struggling for the last two days with this stupid error. I am glad you fixed it. Thanks a lot, I am really happy.
I'm happy to see @Paulcappaert's suggestion to switch runtimes helped some here; unfortunately, for some of us that's impossible, as we may be using other libraries that don't support 3.8 yet. I for one rely on Celery for background processing and cannot really update.
For what it's worth, merely getting off of 3.6 may be enough - asyncio was a little underbaked in 3.6, and is much better in 3.7, and so most issues just need 3.7, not 3.8.
OK, just a heads up that, from my experience, that's not true. I was (and still am) using Channels 1, and when I upgraded to Python 3.7.2 to enjoy a Celery feature, I also had to upgrade to Channels 2 (all sorts of dependency clashes). That's when I encountered this issue again, and reverted my whole setup so Channels 2 didn't kill my server. I am hoping to give this upgrade another chance soon enough, after multiple failed attempts.
I'm hosted on Heroku; YMMV.
Channels 2 versus Channels 1 is a very different conversation given that they use completely different asynchronous styles under the hood.
Generally, this issue is caused by Django not identifying the connections with threads or coroutines that are already dead. For Django itself this was fixed in the latest `asgiref` release, but Channels does not use the `Local` implementation in there, so it doesn't get the same fixes. I also feel like Heroku does something to the Python environment, based on a whole slew of Heroku-specific bugs that I used to get, but I have no time to verify that, unfortunately.
Let me clarify. Maybe I shouldn't have mentioned Channels 1 at all. What I was talking about is Channels 2 on Python 3.7.2; that's why I had to revert back to Channels 1, which is working perfectly fine.
I'm using Python 3.7.6 on Heroku. My main server runs fine with gunicorn. I observed this issue when I ran daphne on my dev server.
On my dev server, I incorporated Channels 2 + websockets and daphne into my app and saw this issue whenever I refreshed my page. I changed my app to use AJAX instead and still see this issue. I only have one view function that returns a JsonResponse for the AJAX call. That AJAX view function is decorated with `login_required` and does a database update. So I did the following:
Using AJAX + daphne:
Using websocket (Channels 2) + daphne:
I looked at the Heroku logs and found that in all instances, the FATAL error occurred when Django was doing session validation. I don't observe this behavior on my main server when I refresh the `login_required` view 20+ times. My main server uses gunicorn.
File "/app/.heroku/python/lib/python3.7/site-packages/django/contrib/auth/__init__.py", line 58, in _get_user_session_key
return get_user_model()._meta.pk.to_python(request.session[SESSION_KEY])
File "/app/.heroku/python/lib/python3.7/site-packages/django/contrib/sessions/backends/base.py", line 64, in __getitem__
return self._session[key]
I believe that Django session management is causing this issue when I use daphne. When either a Django view or a WebSocket consumer requires validation, a connection is opened to the database but it never closes. This behavior is not apparent on the local machine because we don't have the limit on database connections there.
Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)
Initially reported in https://github.com/django/daphne/issues/163
I suspect this is due to the lack of a call to `close_old_connections`, as seen in Channels 1 here: https://github.com/django/channels/blob/1.x/channels/signals.py#L10