mnalsup opened this issue 7 months ago
Hey @mnalsup, did you ever make any progress on this? I'm having the same issue and have tried so many different setups...
The process just immediately exits when the connection is terminated; the error handler is never even fired.
Yeah, it should. Do you have a way to reproduce this in an isolated way (e.g. not relying on AWS failover)? Happy to get this fixed if I can get a reliable reproduction of it locally.
@brianc Sorry for the delay in responding. I can also reproduce this by just stopping a Postgres container locally; any locally hosted PG instance should work if you terminate the DB service while the script runs. Here is my setup:
docker-compose.yaml

```yaml
version: '3.8'
services:
  postgres:
    container_name: pg_test_postgres
    image: postgres:12-alpine
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: test
```
error_pool.js

```js
const { Pool } = require('pg')

const start = async () => {
  const pool = new Pool({
    user: 'test',
    password: 'test',
    host: 'localhost',
    port: 5432,
  })

  // the pool will emit an error on behalf of any idle clients
  // it contains if a backend error or network partition happens
  pool.on('error', (err, client) => {
    console.error('POOL: Unexpected error on idle client', err)
    process.exit(-1)
  })

  const client = await pool.connect()

  // Added snippet for client errors
  // THIS WAS REMOVED TO REPRODUCE THE ERROR
  // client.on('error', (err) => {
  //   console.error('Unexpected error on idle client', err)
  //   process.exit(-1)
  // })

  for (let i = 0; i < 500; i++) {
    const res = await client.query('SELECT 1')
    console.log(res.rows)
    await new Promise((resolve) => setTimeout(resolve, 500))
  }

  client.release()
}

start()
```
run:

```sh
docker-compose up -d
node ./error_pool.js
```
Then stop the container while the script is running (e.g. `docker stop pg_test_postgres`, using the container name from the compose file). You should get `throw er; // Unhandled 'error' event`. I assume you could do something similar with any local PG install.
LMK if you need anything else to reproduce
Okay great, I see what's going on now: when you comment out the error listener on the client itself that's checked out from the pool, the client emits an idle error. Right... so... that's actually intended (though admittedly not wonderful) behavior. When you check a client out from the pool, the client is yours for all purposes and your responsibility until you return it to the pool. So ideally you'd add an error listener to a client if you were checking it out and it was going to be idle for long periods of time while you "owned" it, and then remove the error listener when you return it to the pool (otherwise you'd just keep adding more and more error listeners whenever you ran the same code again, right?). A sketch of that pattern is below.
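A minimal sketch of that checkout/return pattern, assuming a long-lived checkout (the `onClientError` handler name is illustrative, not part of pg):

```js
const { Pool } = require('pg')

const pool = new Pool({ user: 'test', password: 'test', host: 'localhost', port: 5432 })

const run = async () => {
  const client = await pool.connect()

  // named handler so it can be removed later; the name is illustrative
  const onClientError = (err) => {
    console.error('Checked-out client errored', err)
  }
  client.on('error', onClientError)

  try {
    // ... long-running work where the client sits idle between queries ...
    await client.query('SELECT 1')
  } finally {
    // remove the listener before returning the client to the pool, so
    // repeated checkouts don't keep stacking listeners
    client.removeListener('error', onClientError)
    client.release()
  }
}

run()
```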
The reason this is happening in your code is that although you're doing a loop of 500 queries, you're waiting half a second between each query. Since a localhost `SELECT 1` query to Postgres only takes 1-2 milliseconds to execute, your checked-out client most likely experiences the network partition during the ~500 milliseconds it's not waiting for a query response, so it emits the error.
> Is this a documentation issue or should the pool be "forwarding" this error event up from the client?
I guess a documentation clarity issue? Admittedly, error handling is kinda gross because of the async nature of everything and node's very unforgiving habit of crashing your entire process if anything emits an `error` event anywhere and it isn't listened for.
Later this week I was thinking of working on an opt-in new error mode in which all errors are handled by clients: they will "hold on to" any error they receive and then immediately reject the next query with any error they received while they were idle. That would make adding error listeners entirely unneeded. The downside is you could have disconnected due to a network partition from postgres 30 minutes ago and you won't know until you attempt to query again... but that's actually probably fine.

The only thing I think I might need to do is add some kind of flag to the error indicating it wasn't an error from the query you just attempted, because in 99% of cases, if you try to query a database and get an error that happened "at some point in the past while you were idly connected", you really, really should terminate your existing client and open a new one, because your connection is in a bad state.

Anyways... I'll document both the current shortcomings and the new mode with some examples on the documentation website. Probably work on this tomorrow or Wednesday, depending on how my week shakes out.
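For illustration, the behavior described above can be roughly approximated in user land today; this wrapper and its names (`deferIdleErrors`, `receivedWhileIdle`) are hypothetical and not part of pg:

```js
const { Pool } = require('pg')

// Hypothetical user-land approximation of the proposed mode: hold on to any
// 'error' the client emits while idle, then reject the next query with it.
function deferIdleErrors(client) {
  let pendingError = null
  client.on('error', (err) => {
    // having a listener here also prevents node from crashing the process
    // on an otherwise-unhandled 'error' event
    pendingError = err
  })
  return {
    async query(...args) {
      if (pendingError) {
        const err = pendingError
        pendingError = null
        // flag that this error predates the query that was just attempted
        err.receivedWhileIdle = true
        throw err
      }
      return client.query(...args)
    },
  }
}

// usage sketch:
// const pool = new Pool({ /* connection settings */ })
// const wrapped = deferIdleErrors(await pool.connect())
// await wrapped.query('SELECT 1') // rejects with any error received while idle
```

A caller that saw `receivedWhileIdle` set would then discard that client and check out a fresh one, since the underlying connection is in a bad state.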
Hope that rambling makes sense?
It does! Thanks for the clarification. That error mode definitely makes sense to me and would alleviate some of the lifecycle challenges. Thanks for maintaining this much-depended-on library!
In the docs here: https://node-postgres.com/features/pooling#checkout-use-and-return

The sample code snippet there did not seem to catch an error from an AWS RDS failover; using it, the process crashed with an unhandled `'error'` event. However, when I added an `error` event handler to the client returned from the `connect()` call, as below, I was able to catch the error.
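The exact snippet isn't preserved here, but judging from the commented-out block in error_pool.js above, it presumably looked something like this:

```js
const client = await pool.connect()

// handle errors emitted by the checked-out client itself
client.on('error', (err) => {
  console.error('Unexpected error on idle client', err)
  process.exit(-1)
})
```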
Is this a documentation issue or should the pool be "forwarding" this error event up from the client?