PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
22.6k stars 999 forks source link

Empty error message, `db-pool` size surpassed #3214

Open steve-chavez opened 4 months ago

steve-chavez commented 4 months ago

Environment

Problem

A running instance with a db-pool = 60 reports the following problems.

1. The authenticator role surpases db-pool connections.

Here it can be seen it reaches 71 connections.

select usename as rolname, count(*) as total_connections, sum(case when state = 'active' then 1 else 0 end) as active_connections, sum(case when state = 'idle' then 1 else 0 end) as idle_connections from pg_stat_activity group by usename;

     rolname       | total_connections | active_connections | idle_connections
---------------------+-------------------+--------------------+------------------
authenticator       |                71 |                  0 |               71

At some other time it reports 130 connections and surpasses pg max_connections:

01/Feb/2024:17:43:04 +0000: {"code":"PGRST000","details":"FATAL:  remaining connection slots are reserved for non-replication superuser connections\n","hint":null,"message":"Database connection error. Retrying the connection."}

2. PostgREST replies with an empty error message.

$ curl localhost:3000/x  -i

400 Bad Request
{ code: '', details: null, hint: null, message: '' }

A subsequent request succeeds since PostgREST recovers though.

I wasn't able to find the root cause of this. This instance in particular just went through a migration from ipv4 to ipv6 only.

However it's clear that the hasql pool is not recycling the pool connections. An empty error message shouldn't happen too.

Proposal

We need to increase the pool log traces. Log whenever the hasql pool reaper works (ref). This way at least we can find out if the pool is misbehaving. Currently it's not possible to know if the reaper fails.

Workarounds

A connection limit can be set on the authenticator role so the db-pool max is guaranteed.

alter role authenticator connection limit <db-pool + 1 for listen channel>

An idle timeout can be set in case the pool doesn't recycle idle connections.

alter role authenticator set idle_session_timeout to 1800000;

Related

steve-chavez commented 1 month ago

Update on https://github.com/nikita-volkov/hasql-pool/issues/43, we now have https://github.com/nikita-volkov/hasql-pool/pull/45.

So with those new observations, we should be able to add two more metrics:

And then we can close this issue.