Closed: agusmakmun closed this issue 2 months ago.
This is a PostgreSQL configuration issue (`max_connections`) and is not related to this project.
But the problem is that this case does not happen in Django 3.2; it only appeared after migrating to Django 4. Is there any difference between the versions? I checked `CONN_MAX_AGE`, and both use the same 60s value.
If that is the problem, why don't you report it as such?
Run the `show max_connections;` query to find out.

I'm using Django Channels for the websocket connection, and upgrading to these packages:
```
django==4.2.11  # pyup: < 5.0  # https://www.djangoproject.com/
# Django Async
channels==4.1.0  # https://github.com/django/channels/
channels-redis==4.2.0  # https://github.com/django/channels_redis/
# asgi_redis (async to sync)
asgiref==3.8.1  # https://github.com/django/asgiref/
```
As for `max_connections`, it is 100 by default in Postgres 12.
Did you use async before Django 4?
I'm only using async for the websocket parts, plus Celery tasks to clean up the sandbox machines.
`max_connections` isn't related to your project. Do you use your database for more projects? Increasing `max_connections` to 400 would be a simple solution.
Nope, I'm using a single application, but it's deployed using k8s. There are 12 workers, as I manually configured in `compose/production/django/start`:

```
exec /usr/local/bin/gunicorn config.asgi --bind 0.0.0.0:5000 --chdir=/app -k uvicorn.workers.UvicornWorker --workers=${UVICORN_TOTAL_OF_WORKERS:=12}
```
I'm not sure about it, but we're using a single celeryworker pod.
```
$ kubectl --kubeconfig=.vscode/development/kubeconfig get pods
NAME                           READY   STATUS    RESTARTS   AGE
celerybeat-7dccfd8965-mbjt9    2/2     Running   0          4h31m
celeryworker-fc7d59c8c-9bgcv   2/2     Running   0          4h31m
django-5d54d9ddbf-dmpk6        2/2     Running   0          4h31m
flower-78c4b8f74f-8h8hx        2/2     Running   0          4h31m
next-849ff6c88b-95tgn          1/1     Running   0          4h31m
websocket-768b6cd964-gj9j4     2/2     Running   0          4h31m
```
Yup, we're using uvicorn workers, as shown above.
Let's say many users are using the websocket at the same time: will increasing `max_connections` to 400 solve the problem?
How many workers does celery use? 12?
Do the math: you are using two Django pods, resulting in at least 24 connections. I assume the same for Celery, which makes at least 48 in total. You are running beat twice, which will result in double scheduling. Why are you running flower twice? Do next & websocket read the database?

However, my guess would be that the catch is in `worker_connections`. Each worker holds many HTTP connections, and each HTTP connection has its own database connection. So with your 24 workers and the default `worker_connections` of 1000, you need up to 24000 database connections for Django alone. Here is some more info: https://stackoverflow.com/questions/63471960/gunicorn-uvicorn-worker-py-how-to-honor-limit-concurrency-setting
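That worst-case figure can be sketched as a quick calculation. A minimal sketch; the function name is made up for illustration, and the numbers (2 pods × 12 workers, `worker_connections=1000`) come from the deployment discussed above:

```python
def worst_case_db_connections(pods: int, workers_per_pod: int, worker_connections: int) -> int:
    """Upper bound on simultaneous database connections for an async deployment.

    Under async workers, each in-flight HTTP connection may hold its own
    database connection, so the ceiling is pods * workers * worker_connections.
    """
    return pods * workers_per_pod * worker_connections


# 2 Django pods x 12 uvicorn workers x gunicorn's default worker_connections of 1000
print(worst_case_db_connections(2, 12, 1000))  # → 24000
```

This is why raising `max_connections` to 400 alone cannot guarantee the error goes away under heavy websocket load.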
> How many workers does celery use? 12?
How do I check how many Celery workers there are? This is what I got so far:
```
$ kubectl --kubeconfig=kubeconfig exec --stdin --tty celeryworker-6dc48c545d-p879n -- /bin/bash
Defaulted container "celeryworker" out of: celeryworker, vault-agent, vault-agent-init (init)
django@celeryworker-6dc48c545d-p879n:/app$ source /entrypoint
PostgreSQL is available
django@celeryworker-6dc48c545d-p879n:/app$ celery -A config.celery_app status
->  celery@celeryworker-6dc48c545d-p879n: OK

1 node online.
```
And for more detail:

```
django@celeryworker-6dc48c545d-p879n:/app$ celery -A config.celery_app inspect stats
->  celery@celeryworker-6dc48c545d-p879n: OK
{
    "broker": {
        "alternates": [],
        "connect_timeout": 4,
        "failover_strategy": "round-robin",
        "heartbeat": 120.0,
        "hostname": "redis-master.redis.svc.cluster.local.",
        "insist": false,
        "login_method": null,
        "port": 6379,
        "ssl": false,
        "transport": "redis",
        "transport_options": {},
        "uri_prefix": null,
        "userid": null,
        "virtual_host": "0"
    },
    "clock": "107892",
    "pid": 8,
    "pool": {
        "implementation": "celery.concurrency.prefork:TaskPool",
        "max-concurrency": 8,
        "max-tasks-per-child": "N/A",
        "processes": [16, 17, 18, 19, 20, 21, 22, 23],
        "put-guarded-by-semaphore": false,
        "timeouts": [300, 600],
        "writes": {
            "all": "0.67, 0.17, 0.17",
            "avg": "0.33",
            "inqueues": {"active": 0, "total": 8},
            "raw": "4, 1, 1",
            "strategy": "fair",
            "total": 6
        }
    },
    "prefetch_count": 32,
    "rusage": {
        "idrss": 0,
        "inblock": 40,
        "isrss": 0,
        "ixrss": 0,
        "majflt": 3,
        "maxrss": 166536,
        "minflt": 187490,
        "msgrcv": 0,
        "msgsnd": 0,
        "nivcsw": 833,
        "nsignals": 0,
        "nswap": 0,
        "nvcsw": 1597,
        "oublock": 56,
        "stime": 1.8547509999999998,
        "utime": 10.366434
    },
    "total": {
        "my_project.base.tasks.send_announcement_broadcast": 1,
        "my_project.webssh.tasks.cleanup_expired_durations": 1,
        "my_project.webssh.tasks.cleanup_machines": 3,
        "my_project.webssh.tasks.sync_cloud_drive_account": 1
    },
    "uptime": 533
}

1 node online.
```
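If you want to pull the pool size out of that output programmatically, `celery inspect stats` can also emit JSON, which is easy to parse. The sample below is a hypothetical, trimmed-down version of the dump above (only the `pool` section is kept):

```python
import json

# Hypothetical trimmed sample mirroring the structure of the
# `celery -A config.celery_app inspect stats` output above.
stats_json = """
{
    "celery@celeryworker-6dc48c545d-p879n": {
        "pool": {
            "max-concurrency": 8,
            "processes": [16, 17, 18, 19, 20, 21, 22, 23]
        }
    }
}
"""

stats = json.loads(stats_json)
for node, info in stats.items():
    pool = info["pool"]
    # For the prefork pool, the number of worker processes equals max-concurrency.
    print(node, pool["max-concurrency"], len(pool["processes"]))
```

Each of those 8 prefork processes can hold its own Django database connection, which is what feeds into the connection math above.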
"max-concurrency": 8
anything that I can do apart from increasing the max_connections
?
You have at least these 4 options, as far as I can tell:

- limit the concurrency of the `UvicornWorker`, as described in https://stackoverflow.com/questions/63471960/gunicorn-uvicorn-worker-py-how-to-honor-limit-concurrency-setting, by implementing a custom `UvicornWorker`
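A minimal sketch of that option, assuming `uvicorn` and `gunicorn` are installed. `limit_concurrency` is a real uvicorn configuration setting, but the module path, class name, and the value 50 are placeholders you would tune for your deployment:

```python
# config/workers.py (hypothetical module path)
from uvicorn.workers import UvicornWorker


class LimitedUvicornWorker(UvicornWorker):
    # Extra keyword arguments passed through to uvicorn's Config.
    # limit_concurrency caps in-flight connections per worker, so the
    # database-connection ceiling becomes workers * limit_concurrency
    # instead of workers * worker_connections.
    CONFIG_KWARGS = {**UvicornWorker.CONFIG_KWARGS, "limit_concurrency": 50}
```

You would then start gunicorn with `-k config.workers.LimitedUvicornWorker` instead of `-k uvicorn.workers.UvicornWorker`.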
Thank you so much @foarsitter, we will try your suggestions :+1:
If you resolve your issue by implementing a custom UvicornWorker we would like to receive a PR :)
Sure, I'd be happy to do that. As a quick solution, we will first try increasing the Postgres `max_connections` to 1000.
Finally, my issue was solved. It happened because I was using both `@shared_task()` and `@celery_app.task()`, which led to duplicate database connections. After rolling everything back to use `@celery_app.task()` only, the problem was resolved.
Note: I had changed it to `@shared_task()` while trying to follow `users/tasks.py` for non-executable cronjob tasks.
`myapp/tasks.py`

### Before

```python
from celery import shared_task

from config import celery_app


@shared_task()
def task_1():
    ...


@celery_app.task()
def task_2_that_called_for_cronjob():
    ...
```
### After
```python
from config import celery_app


@celery_app.task()
def task_1():
    ...


@celery_app.task()
def task_2_that_called_for_cronjob():
    ...
```
The issue still appears again.
`shared_task` is just another way of registering tasks for situations where there is no app available yet, so that can't be the solution.
I found some related issues caused by the combination of upgrading to Django 4 and using `sync_to_async`, which I'm currently using to handle the async processing for the websocket:

> This may be due to Django 4.0 having per-request contexts for the thread sensitivity of sync_to_async() — See #32889. If so, that's kind-of a good thing, in that too many open resources is what you'd expect in async code, and up to now, we've not been hitting that, as we've essentially been running serially.

(Source: https://code.djangoproject.com/ticket/33497#comment:4)
So you put CONN_MAX_AGE to 0 and solved the issue?
Nope. I had changed it to 0, and the error still occurred.

The issue was finally resolved after implementing django-db-connection-pool and setting `CONN_MAX_AGE=0`:
`settings.py`

```python
DATABASES = {"default": env.db("DATABASE_URL")}
DATABASES["default"]["ATOMIC_REQUESTS"] = True

# https://github.com/altairbow/django-db-connection-pool?tab=readme-ov-file#postgresql
DATABASES["default"]["ENGINE"] = "dj_db_conn_pool.backends.postgresql"
DATABASES["default"]["POOL_OPTIONS"] = {
    "POOL_SIZE": 10,
    "MAX_OVERFLOW": 10,
    "RECYCLE": 24 * 60 * 60,
}

# Set to 0 to disable long(er) living connections
# https://docs.djangoproject.com/en/dev/ref/settings/#std-setting-CONN_MAX_AGE
# https://github.com/cookiecutter/cookiecutter-django/issues/5071
# "connection failed: FATAL: remaining connection slots are reserved for non-replication superuser connections"  # noqa: ERA001
DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0)
```
`requirements.txt`

```
psycopg[pool]==3.1.18  # https://github.com/psycopg/psycopg
django-db-connection-pool[psycopg3]==1.2.5  # https://github.com/altairbow/django-db-connection-pool
```
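With these pool options, each process can open at most `POOL_SIZE + MAX_OVERFLOW` connections, so the cluster-wide ceiling is that value times the number of processes. A rough, illustrative calculation; the figure of 24 gunicorn workers is assumed from the worker discussion earlier in the thread:

```python
POOL_SIZE = 10
MAX_OVERFLOW = 10


def pool_ceiling(processes: int, pool_size: int, max_overflow: int) -> int:
    # SQLAlchemy-style pools (which django-db-connection-pool builds on)
    # open at most pool_size + max_overflow connections per process.
    return processes * (pool_size + max_overflow)


# Assumed: 24 gunicorn workers from the deployment discussed earlier.
print(pool_ceiling(24, POOL_SIZE, MAX_OVERFLOW))  # → 480
```

So pooling bounds connections far below the async worst case, but 480 can still exceed PostgreSQL's default `max_connections` of 100.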
The next error that occurs is `ProgrammingError: the last operation didn't produce a result`, even after changing the ORM query from `.exists()` to `.count() > 0`.

Related issue: https://github.com/psycopg/psycopg/discussions/417
@agusmakmun, is `CONN_MAX_AGE` required when a pool is involved? I suppose that shouldn't have to be the case.
This case was resolved after downgrading Python from `docker.io/python:3.12.3-slim-bookworm` to `python:3.10.10-slim-bullseye`.
I also consulted ChatGPT with this query:

> For a database server with 2 cores and 2GB of memory, how many POOL_SIZE, MAX_OVERFLOW and RECYCLE. I need numbers only.
```python
DATABASES["default"]["ENGINE"] = "dj_db_conn_pool.backends.postgresql"
DATABASES["default"]["POOL_OPTIONS"] = {
    "POOL_SIZE": 3,
    "MAX_OVERFLOW": 3,
    "RECYCLE": 1800,  # 30 minutes
}
DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0)
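For comparison, a small sketch of the per-process connection ceilings of the two pool configurations tried in this thread (10/10 earlier, 3/3 here):

```python
def per_process_ceiling(pool_size: int, max_overflow: int) -> int:
    # A pooled process holds at most pool_size + max_overflow connections.
    return pool_size + max_overflow


print(per_process_ceiling(10, 10))  # first attempt → 20
print(per_process_ceiling(3, 3))    # ChatGPT-suggested pool → 6
```

Either way, the totals across all gunicorn and Celery processes can still approach the server's `max_connections`, which is consistent with the later finding that raising `max_connections` was ultimately needed.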
Since deploying to production, no issues so far. I'm not sure which fix is correct: decreasing `POOL_SIZE` to a proper value, or downgrading Python.
You can certainly remove `CONN_MAX_AGE`, since it defaults to 0.
> The issue finally resolved after implementing the django-db-connection-pool + set the `CONN_MAX_AGE=0` […]
Thank you so much! I was suffering from this issue.
Guys, FYI, I am somehow still facing this issue.
And you did increase `max_connections` to 1000?
Guys, FYI, our issue was finally resolved after increasing the PostgreSQL `max_connections` from the default 100 to 300, keeping `DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0)`, and removing the additional database pool setup in favor of the default one.

So my last issue was simply that `max_connections` was not set properly.
### What happened?
I'm getting this PostgreSQL error while upgrading the Django Cookiecutter template in my project. I'm using many Celery tasks and websocket connections, and deploying with k8s.
Similar issues are mentioned on Stack Overflow. One of the threads mentions this package, but I'm not sure whether it works: https://github.com/altairbow/django-db-connection-pool
### Details

- `python3 -V`: 3.12
- `docker --version`: 3
- `docker compose --version`: 3