PostgreSQL - connection failed: FATAL: remaining connection slots are reserved for non-replication superuser connections

agusmakmun commented 3 months ago

What happened?

I'm having this postgresql error when doing upgrade the django cookie cutter for my project.

connection failed: FATAL: remaining connection slots are reserved for non-replication superuser connections

I'm using many celery tasks, websocket connections, and deploy using the k8s in my project.

The similar issues are mentioned in Stack Overflow:

One of the thread mentions this package, but not sure if it's working or not: https://github.com/altairbow/django-db-connection-pool

Details

Python version, run python3 -V: 3.12
Docker version (if using Docker), run docker --version: 3
docker compose version (if using Docker), run docker compose --version: 3
Cookie cutter Django: 2024.04.22

(env-my-labs) ➜  env-my-labs cookiecutter https://github.com/cookiecutter/cookiecutter-django
  [1/27] project_name (My Awesome Project): My Project Labs
  [2/27] project_slug (practical_devsecops_cloud_labs):
  [3/27] description (Behold My Awesome Project!): My Project Labs
  [4/27] author_name (Daniel Roy Greenfeld): My Project Developer
  [5/27] domain_name (example.com): my-domain.com
  [6/27] email (practical-devsecops-developer@my-domain.com): info@my-domain.com
  [7/27] version (0.1.0): 3.7.0
  [8/27] Select open_source_license
    1 - MIT
    2 - BSD
    3 - GPLv3
    4 - Apache Software License 2.0
    5 - Not open source
    Choose from [1/2/3/4/5] (1): 5
  [9/27] Select username_type
    1 - username
    2 - email
    Choose from [1/2] (1): 1
  [10/27] timezone (UTC):
  [11/27] windows (n): n
  [12/27] Select editor
    1 - None
    2 - PyCharm
    3 - VS Code
    Choose from [1/2/3] (1): 3
  [13/27] use_docker (n): y
  [14/27] Select postgresql_version
    1 - 16
    2 - 15
    3 - 14
    4 - 13
    5 - 12
    Choose from [1/2/3/4/5] (1): 5
  [15/27] Select cloud_provider
    1 - AWS
    2 - GCP
    3 - Azure
    4 - None
    Choose from [1/2/3/4] (1): 1
  [16/27] Select mail_service
    1 - Mailgun
    2 - Amazon SES
    3 - Mailjet
    4 - Mandrill
    5 - Postmark
    6 - Sendgrid
    7 - SendinBlue
    8 - SparkPost
    9 - Other SMTP
    Choose from [1/2/3/4/5/6/7/8/9] (1): 5
  [17/27] use_async (n): y
  [18/27] use_drf (n): y
  [19/27] Select frontend_pipeline
    1 - None
    2 - Django Compressor
    3 - Gulp
    4 - Webpack
    Choose from [1/2/3/4] (1): 1
  [20/27] use_celery (n): y
  [21/27] use_mailpit (n): y
  [22/27] use_sentry (n): y
  [23/27] use_whitenoise (n): n
  [24/27] use_heroku (n): n
  [25/27] Select ci_tool
    1 - None
    2 - Travis
    3 - Gitlab
    4 - Github
    5 - Drone
    Choose from [1/2/3/4/5] (1): 3
  [26/27] keep_local_envs_in_vcs (y): y
  [27/27] debug (n): n
 [SUCCESS]: Project initialized, keep up the good work!

foarsitter commented 3 months ago

This is a postgresql configuration issue and not related to this project.

You are running out of connections because they aren't closed after usage
Use https://pgtune.leopard.in.ua/ to tune your config, make sure you have a high value for max_connections
Set https://docs.djangoproject.com/en/5.0/ref/settings/#std-setting-CONN_MAX_AGE to 0 to disable long(er) living connections

agusmakmun commented 3 months ago

But the problem is, this case is not happening in Django 3.2, and only appear after migrated to Django 4. Any different between the versions? I check at the CONN_MAX_AGE using the same 60s value.

foarsitter commented 3 months ago

If that is the problem, why don't you report it as such?

What else did you change during the migration?
What lib are you using for websockets, since this template doesn't have them?
what is your max connections? Run the show max_connections ; query to find out.
Did you use async before Django 4?

agusmakmun commented 3 months ago

I'm using the django channels for websocket connection, and upgrading to these pacakges.

django==4.2.11  # pyup: < 5.0  # https://www.djangoproject.com/

# Django Async
channels==4.1.0  # https://github.com/django/channels/
channels-redis==4.2.0  # https://github.com/django/channels_redis/

# asgi_redis (async to sync)
asgiref==3.8.1  # https://github.com/django/asgiref/

For max_connections is 100 as default from Postgres 12.

Did you use async before Django 4?

I'm only using async in the websocket things, and celery tasks to cleanup the sandbox machines.

foarsitter commented 3 months ago

max_connections isn't related to your project, so do you use your database for more projects?
how many gunicorn workers do you use?
how many celeryworkers workers do you use?
is you use the async/unvicon option you are using the uvicon worker, I'm correct?

Increasing max_connections to 400 would be a simple solution

agusmakmun commented 3 months ago

nope, I'm using a single application but setup using k8s.

12 workers as I manually config in compose/production/django/start

exec /usr/local/bin/gunicorn config.asgi --bind 0.0.0.0:5000 --chdir=/app -k uvicorn.workers.UvicornWorker --workers=${UVICORN_TOTAL_OF_WORKERS:=12}

I'm not sure of it, but we're using single pod of celeryworker.

kubectl --kubeconfig=.vscode/development/kubeconfig get pods
NAME                           READY   STATUS    RESTARTS   AGE
celerybeat-7dccfd8965-mbjt9    2/2     Running   0          4h31m
celeryworker-fc7d59c8c-9bgcv   2/2     Running   0          4h31m
django-5d54d9ddbf-dmpk6        2/2     Running   0          4h31m
flower-78c4b8f74f-8h8hx        2/2     Running   0          4h31m
next-849ff6c88b-95tgn          1/1     Running   0          4h31m
websocket-768b6cd964-gj9j4     2/2     Running   0          4h31m

Yup, we're using uvicorn workers as in above.

agusmakmun commented 3 months ago

Let say if many users are using the websocket at the same time. is increasing the max_connections to 400 will solve the problem?

foarsitter commented 3 months ago

How many workers does celery use? 12?

Do the math: you are using two django pods, resulting in at least 24 connections. I assume the same for celery, makes at least 48 in total. You are running beat twice, that will result in double scheduling. Why are you running flower twice? Do next & websockets read the database?

Howere, my guess would be that the catch is in worker-connections I suppose. Each worker has many http connections and each http connection has his own database connection. So with your 24 workers and the default worker-connections of 1000 you need 24000 database connections at max for django alone. Here is some more info: https://stackoverflow.com/questions/63471960/gunicorn-uvicorn-worker-py-how-to-honor-limit-concurrency-setting

agusmakmun commented 3 months ago

How many workers does celery use? 12?

How to check the how many celery workers?

This is what I get so far;

kubectl --kubeconfig=kubeconfig exec --stdin --tty celeryworker-6dc48c545d-p879n -- /bin/bash

Defaulted container "celeryworker" out of: celeryworker, vault-agent, vault-agent-init (init)
django@celeryworker-6dc48c545d-p879n:/app$ source /entrypoint
PostgreSQL is available
django@celeryworker-6dc48c545d-p879n:/app$
django@celeryworker-6dc48c545d-p879n:/app$
django@celeryworker-6dc48c545d-p879n:/app$ celery -A config.celery_app status
->  celery@celeryworker-6dc48c545d-p879n: OK

1 node online.

and for more:

django@celeryworker-6dc48c545d-p879n:/app$ celery -A config.celery_app inspect stats
->  celery@celeryworker-6dc48c545d-p879n: OK
    {
        "broker": {
            "alternates": [],
            "connect_timeout": 4,
            "failover_strategy": "round-robin",
            "heartbeat": 120.0,
            "hostname": "redis-master.redis.svc.cluster.local.",
            "insist": false,
            "login_method": null,
            "port": 6379,
            "ssl": false,
            "transport": "redis",
            "transport_options": {},
            "uri_prefix": null,
            "userid": null,
            "virtual_host": "0"
        },
        "clock": "107892",
        "pid": 8,
        "pool": {
            "implementation": "celery.concurrency.prefork:TaskPool",
            "max-concurrency": 8,
            "max-tasks-per-child": "N/A",
            "processes": [
                16,
                17,
                18,
                19,
                20,
                21,
                22,
                23
            ],
            "put-guarded-by-semaphore": false,
            "timeouts": [
                300,
                600
            ],
            "writes": {
                "all": "0.67, 0.17, 0.17",
                "avg": "0.33",
                "inqueues": {
                    "active": 0,
                    "total": 8
                },
                "raw": "4, 1, 1",
                "strategy": "fair",
                "total": 6
            }
        },
        "prefetch_count": 32,
        "rusage": {
            "idrss": 0,
            "inblock": 40,
            "isrss": 0,
            "ixrss": 0,
            "majflt": 3,
            "maxrss": 166536,
            "minflt": 187490,
            "msgrcv": 0,
            "msgsnd": 0,
            "nivcsw": 833,
            "nsignals": 0,
            "nswap": 0,
            "nvcsw": 1597,
            "oublock": 56,
            "stime": 1.8547509999999998,
            "utime": 10.366434
        },
        "total": {
            "my_project.base.tasks.send_announcement_broadcast": 1,
            "my_project.webssh.tasks.cleanup_expired_durations": 1,
            "my_project.webssh.tasks.cleanup_machines": 3,
            "my_project.webssh.tasks.sync_cloud_drive_account": 1
        },
        "uptime": 533
    }

1 node online.

foarsitter commented 3 months ago

"max-concurrency": 8

agusmakmun commented 3 months ago

anything that I can do apart from increasing the max_connections?

foarsitter commented 3 months ago

You have at least these 4 options as far I can tell:

Limit the amount of concurrent connections for the UvicornWorker like described in https://stackoverflow.com/questions/63471960/gunicorn-uvicorn-worker-py-how-to-honor-limit-concurrency-setting by implementing a CustomUvicornWorker
Set https://docs.djangoproject.com/en/5.0/ref/settings/#std-setting-CONN_MAX_AGE to 0
Increase max_connections
use pgbouncer or pgpool for connection pooling

agusmakmun commented 3 months ago

Thank you so much @foarsitter, we will try your suggestions :+1:

foarsitter commented 3 months ago

If you resolve your issue by implementing a custom UvicornWorker we would like to receive a PR :)

agusmakmun commented 3 months ago

Sure, will happy to do that. For quick solution we will try to increase the postgres max_connections to 1000 first.

agusmakmun commented 3 months ago

Finally my issue was solved. This happens because of using both shared_task() and @celery_app.task(), and it leads duplicate database connection. So, after rollback all to use @celery_app.task() only, it's resolve the problem.

Note: I changed it to @shared_task() because trying to follow the users/tasks.py for non-executable tasks for cronjob.

https://github.com/cookiecutter/cookiecutter-django/blob/beba4c177d6c0c40e51ec71fa4447966c10a4297/%7B%7Bcookiecutter.project_slug%7D%7D/%7B%7Bcookiecutter.project_slug%7D%7D/users/tasks.py#L1-L9

Before

myapp/tasks.py


from celery import shared_task
from config import celery_app

@shared_task() def task_1(): ...

@celery_app.task() def task_2_that_called_for_cronjob(): ...


### After

```python
from config import celery_app

@celery_app.task()
def task_1():
   ...

@celery_app.task()
def task_2_that_called_for_cronjob():
   ...

agusmakmun commented 3 months ago

The issue still appears again.

foarsitter commented 3 months ago

shared_task is just another way of registering tasks for situations when there is no app available yet. So that can't be the solution.

agusmakmun commented 3 months ago

I found some related issue are because of impact of upgrade Django 4 + using sync_to_async, as I'm right now using it to handle the async process for the websocket.

This may be due to Django 4.0 having per-request contexts for the thread sensitivity of sync_to_async() — See #32889. If so, that's kind-of a good thing, in that too many open resources is what you'd expect in async code, and up to now, we've not been hitting that, as we've essentially been running serially. - https://code.djangoproject.com/ticket/33497#comment:4

foarsitter commented 3 months ago

So you put CONN_MAX_AGE to 0 and solved the issue?

agusmakmun commented 3 months ago

Nope, I had changed it to 0. And the error still occurs.

agusmakmun commented 3 months ago

The issue finally resolved after implementing the django-db-connection-pool + set the CONN_MAX_AGE=0

settings.py

DATABASES = {"default": env.db("DATABASE_URL")}
DATABASES["default"]["ATOMIC_REQUESTS"] = True
# https://github.com/altairbow/django-db-connection-pool?tab=readme-ov-file#postgresql
DATABASES["default"]["ENGINE"] = "dj_db_conn_pool.backends.postgresql"
DATABASES["default"]["POOL_OPTIONS"] = {
    "POOL_SIZE": 10,
    "MAX_OVERFLOW": 10,
    "RECYCLE": 24 * 60 * 60,
}

# Set to 0 to disable long(er) living connections
# https://docs.djangoproject.com/en/dev/ref/settings/#std-setting-CONN_MAX_AGE
# https://github.com/cookiecutter/cookiecutter-django/issues/5071
# "connection failed: FATAL: remaining connection slots are reserved for non-replication superuser connections"  # noqa: ERA001
DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0)

requirements.txt

psycopg[pool]==3.1.18  # https://github.com/psycopg/psycopg
django-db-connection-pool[psycopg3]==1.2.5  # https://github.com/altairbow/django-db-connection-pool

Screenshot 2024-05-17 at 20 05 22

agusmakmun commented 3 months ago

The upcoming errors that occurs is ProgrammingError the last operation didn't produce a result. \ Even after changing the ORM of .exists() to .count() > 0.

foarsitter commented 3 months ago

@agusmakmun is CONN_MAX_AGE required when a pool is involved? That shouldn't have to be the case I suppose.

agusmakmun commented 3 months ago

For this case, was resolved after downgrade the python fromdocker.io/python:3.12.3-slim-bookworm topython:3.10.10-slim-bullseye.

I also consulate with ChatGPT with this query:

For a database server with 2 cores and 2GB of memory, how many POOL_SIZE, MAX_OVERFLOW and RECYCLE. I need numbers only.

DATABASES["default"]["ENGINE"] = "dj_db_conn_pool.backends.postgresql"
DATABASES["default"]["POOL_OPTIONS"] = {
    "POOL_SIZE": 3,
    "MAX_OVERFLOW": 3,
    "RECYCLE": 1800,  # 30 minutes
}

DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0)

Until deployed to production, no issues so far. Not sure which one is correct, either decrease the pool_size with propper size, or downgrade the python.

foarsitter commented 3 months ago

You can certain remove CONN_MAX_AGE since it defaults to 0.

Saurav-Paul commented 3 months ago

The issue finally resolved after implementing the django-db-connection-pool + set the CONN_MAX_AGE=0

settings.py

DATABASES = {"default": env.db("DATABASE_URL")}
DATABASES["default"]["ATOMIC_REQUESTS"] = True
# https://github.com/altairbow/django-db-connection-pool?tab=readme-ov-file#postgresql
DATABASES["default"]["ENGINE"] = "dj_db_conn_pool.backends.postgresql"
DATABASES["default"]["POOL_OPTIONS"] = {
    "POOL_SIZE": 10,
    "MAX_OVERFLOW": 10,
    "RECYCLE": 24 * 60 * 60,
}

# Set to 0 to disable long(er) living connections
# https://docs.djangoproject.com/en/dev/ref/settings/#std-setting-CONN_MAX_AGE
# https://github.com/cookiecutter/cookiecutter-django/issues/5071
# "connection failed: FATAL: remaining connection slots are reserved for non-replication superuser connections"  # noqa: ERA001
DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0)

requirements.txt

psycopg[pool]==3.1.18  # https://github.com/psycopg/psycopg
django-db-connection-pool[psycopg3]==1.2.5  # https://github.com/altairbow/django-db-connection-pool

Screenshot 2024-05-17 at 20 05 22

Thank you so much.. I was suffering from this issue.

agusmakmun commented 3 months ago

Guys, FYI, until now I still facing this issue some how.

foarsitter commented 3 months ago

And you did increase max_connections to 1000?

agusmakmun commented 2 months ago

Guys, FYI, our issue finally resolved after increasing the PostgreSQL database from default 100 to max_connections=300. and change the DATABASES["default"]["CONN_MAX_AGE"] = env.int("CONN_MAX_AGE", default=0) also remove the additional database pool setup to use default one.

My last issue is because of max_connections not set properly.

cookiecutter / cookiecutter-django