brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License
12.19k stars 1.22k forks

pg-pool with pgBouncer #3018

Open SamGoody opened 1 year ago

SamGoody commented 1 year ago

I need to handle many simultaneous connections.

Ubuntu Jammy. ~400 connections per second on average, for several hours at a time. Not distributed: both Node and the single Postgres instance are on the same server. Using PG with TimescaleDB for the actual data storage.

Researching the internet points me to PgBouncer. Can someone please clarify the relationship of pg-pool to PgBouncer? Which, if any, of the following is correct:

  1. Use both pg-pool and pgBouncer, they are solving different needs. [If so, please explain a bit]
  2. Use both even though they are solving the same issue, because then we get double efficiency.
  3. Use only pg-pool, because it is more efficient, or because it has some other important benefit.
  4. Use only pgBouncer, because it is more efficient, or because it has some other important benefit.

Thank you very much!

pdeaudney commented 1 year ago

Historically, PostgreSQL didn't handle large numbers of concurrent connections well, especially the connection setup and teardown back in the CGI-script days, and especially pre-PostgreSQL 14.

When you say ~400 connections per second, which connections are you talking about? It's too vague: are you saying ~400 new connections per second are opened to PostgreSQL? Or ~400 concurrent connections to your application, which may need about the same number of connections to the DB?

I'm running many node processes in docker/k8s that use pg-pool with a pool size of around 20, I think (the app config default is 10 per process). We have 85 pods/processes running, each with its own pg-pool.

This is operating at around 850 requests a second according to the monitoring systems right now. The DB monitoring shows around 220-300 active connections (not all allocated to just this service I'm using as an example, but it would be ~80-90% of them). We're only using pg-pool for this app.

As mentioned, pre-PostgreSQL 14, large numbers of connections were not handled well (where "large" is relative to your infrastructure/hardware). Details on PG14 handling more connections better

So what version of PostgreSQL are you using? This may matter a bit under load if you're on a virtual machine/smaller sized hardware allocation. Use 15 if you can.

pg-pool and pgBouncer are solving slightly different problems. My advice would be to start with pg-pool and validate your setup. pgBouncer is another thing to run & operate. Avoid extra work if you can, until you know you need it.

pgBouncer is good for brokering connections to the database when you're in a resource-limited environment: each active connection to PostgreSQL carries overhead for tracking which row data is visible across transaction boundaries. Another benefit is that it can smooth out operations like PostgreSQL patching in an HA environment: pause traffic to the DB server behind pgBouncer (your applications, connected to the bouncer, just see it as query latency), then resume traffic after the switchover to a patched DB server.

pg-pool is great for reducing or eliminating connection setup overhead/latency in your request cycle, since the pool keeps a number of connections open, ready to be handed out for use and then returned to the pool.
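That hand-out/return cycle can be sketched with a toy pool (an illustration of the idea only, not pg-pool's actual implementation):

```javascript
// Toy illustration of connection pooling (NOT pg-pool's implementation):
// pre-create "connections" once, hand them out, and take them back for reuse,
// so the setup cost is paid only while the pool is first filled.
class ToyPool {
  constructor(size, createConn) {
    this.idle = Array.from({ length: size }, createConn);
  }
  acquire() {
    // Reuse an idle connection when one exists; otherwise create a new one.
    return this.idle.length > 0 ? this.idle.pop() : { id: 'extra' };
  }
  release(conn) {
    this.idle.push(conn);
  }
}

let setupCount = 0; // counts expensive connection setups
const pool = new ToyPool(2, () => ({ id: ++setupCount }));

const a = pool.acquire();
pool.release(a);
const b = pool.acquire(); // same underlying "connection" comes back

// setupCount stays at 2: both requests were served from the pool.
```

With pg-pool the shape is the same, except acquire/release are `pool.connect()` and `client.release()` (or handled for you by `pool.query()`).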

My advice: just use pg-pool for now. Test your setup with some load tests before you launch to production, though. You may need to tune PostgreSQL a little.

In terms of pg-pool settings:

  * idleTimeoutMillis — close connections that are sitting idle in the pool rather than keeping them around (idle connections stuck in transactions are bad for performance). 5000 is what we set.
  * maxLifetimeSeconds — cycle the connections in the pool over time. 60000 is what we set.
  * statement_timeout — ensure you don't have really long-running, poorly performing SQL queries. 60000 (the value is in ms in PgSQL), so 60s. We could set this lower, as our reverse proxy timeout is 15s by default.
  * connectionTimeoutMillis — don't keep attempting to connect to a down server for a very long time. 60000.
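Put together, those settings map onto a pool config roughly like this (the numbers are the ones quoted above; `max` is an assumption based on the pool size mentioned earlier, and the connection details are placeholders):

```javascript
// Sketch of a pg-pool configuration using the settings discussed above.
// statement_timeout is a server-side PostgreSQL setting (in milliseconds)
// that node-postgres applies to each connection it opens.
const poolConfig = {
  max: 20,                        // pool size per process (assumption: ~20)
  idleTimeoutMillis: 5000,        // close clients idle in the pool after 5s
  maxLifetimeSeconds: 60000,      // recycle each connection after this lifetime
  connectionTimeoutMillis: 60000, // give up acquiring a connection after 60s
  statement_timeout: 60000,       // cancel queries running longer than 60s
};

// With the `pg` package installed, this would be used as:
//   const { Pool } = require('pg');
//   const pool = new Pool(poolConfig);
```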

abenhamdine commented 1 year ago

Another important thing to know: currently you cannot use prepared statements with pgBouncer.
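For context, in node-postgres a prepared statement is created by giving a query config a `name` (the name below is a hypothetical example). The parsed statement lives on one specific server connection, which is why a transaction-pooling proxy that swaps server connections between queries has trouble with it:

```javascript
// Named prepared statement in node-postgres — the feature in question.
// The `name` ties the parsed query plan to a single server connection,
// so it breaks if a proxy reassigns connections between queries.
const fetchUser = {
  name: 'fetch-user', // hypothetical statement name
  text: 'SELECT * FROM users WHERE id = $1',
  values: [1],
};

// Against a live pool this would run as:
//   const result = await pool.query(fetchUser);
```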

bdbrownell commented 11 months ago

If you have a single instance of Node with a single instance of Postgres, pg-pool is perfect for your use case. That's the only place pg-pool works without issue.

PgBouncer is for many node instances that are independent of each other: they have no way to know what the other processes are doing with their own pg-pools. PgBouncer acts as the connection pool; it receives the queries from each process and handles the pooling logic to communicate with the database.

They solve the same problem for different architectures, and there's no real reason to use them together.

Many node processes / Distributed systems / Microservices => PG Bouncer

Monolithic, Single instance, Single Process => pg-pool