brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License

"Client has encountered a connection error and is not queryable" without further info #1790

Open emlai opened 5 years ago

emlai commented 5 years ago

Since upgrading from pg 7.4.1 to pg 7.7.1, we are getting "Client has encountered a connection error and is not queryable" error in production after a week of working just fine, but no additional information is being logged that could help us determine what the underlying issue is.

I believe this may have been caused by the changes in https://github.com/brianc/node-postgres/commit/3828aa86081491282c118884c245b4a19619f02e.

Expected behavior: also log the error reason.

charmander commented 5 years ago

The reason will have been emitted as an 'error' event on the client (and on the pool as well, if you’re using a pool).

erwanjb commented 5 years ago

Hello, I have a similar issue: I have a Pool that is not queryable, and no error is logged. I can't catch this error regardless of what I do. I'm on pg 7.7.1 and PostgreSQL 9.6; I suspect these versions are incompatible, so I will try installing PostgreSQL 11.

jvbianchi commented 5 years ago

I'm having this problem too. In my case I am using version 7.7.1 with TypeORM.

emlai commented 5 years ago

@charmander Ok thanks! Seems I just missed it in our application logs since it was logged several hours before the "Client has encountered a connection error and is not queryable" message in each case.

Maybe it would make sense to include a copy of the reason along with the "Client has encountered a connection error and is not queryable" message, just for easier debugging?

For the record, we're using TypeORM and a pool.

Anyway, from our side, this issue can be closed. Thanks for the quick help!

rvion commented 5 years ago

I also see this error with the plain pg lib (without TypeORM or any other ORM). As @charmander said, an 'error' event is emitted:

pool.on('error',...)

but it is emitted only once, while "Client has encountered a connection error and is not queryable" is printed every time I use the defective pool.

In my case, the emitted error was:

Connection terminated unexpectedly Error: Connection terminated unexpectedly

In my case, this happened because of a network reset. In some other situations, I had the impression that the pool was resilient to connection loss, but I guess that's not always the case.

seriousManual commented 4 years ago

@rvion I'm currently seeing a similar issue: the error "Connection terminated unexpectedly" came up, and afterwards the connection was not queryable anymore ("Client has encountered a connection error and is not queryable").

May I ask how you approached this problem?

Aditya94A commented 4 years ago

How about adding a helper method for auto-reconnecting whenever this happens?

gt-c commented 2 years ago

How about adding a helper method for auto-reconnecting whenever this happens?

Any update on this?

anonrig commented 2 years ago

How about adding a helper method for auto-reconnecting whenever this happens?

Any update on this?

Any update on this?

elimisteve commented 1 year ago

I have this same issue. Should we create a new Client (or Pool) object on error or what?

elimisteve commented 1 year ago

Apparently that's exactly what we should do. From https://github.com/brianc/node-postgres/issues/1611#issuecomment-558566873 (see post for more info) --

db = new pg.Client();
db.on('error', e => {
  console.error('Database error', e);
  db = null;
});
elimisteve commented 1 year ago

Starting with the above code snippet, here's what I've come up with:

// db.js
import { Client } from 'pg';

let dbClient;

export const getDbClient = () => {
  // TODO: How do I know dbClient is any good?  <---
  return dbClient;
}

const reconnectToDb = async () => {
  dbClient = null;
  dbClient = await createDbClient();
}

const createDbClient = async () => {
  const client = new Client({
    connectionString: "..."
  });

  client.on('error', err => {
    console.error('Postgres error:', err.stack);
    reconnectToDb();
  });

  await client.connect();
  console.log('Successfully connected to Postgres');

  return client;
};

(async () => {
  await reconnectToDb();
})()
// Doing Postgres queries
const result = await getDbClient().query("...")

It works great in most ways -- my server doesn't crash, and after an error querying Postgres the client automatically reconnects -- but ideally I would know in advance that my next query isn't going to work, and create a new DB connection then (see the TODO toward the top of this post).

EDIT: Ideally I could do await dbClient.ping() or something and then create a new DB connection if the ping fails.

chevdor commented 1 year ago

It would be convenient to know which note is causing trouble. I think I am running into this issue due to two "big" attachments of 108MB and 130MB.

I see the failures in the logs of the DB:

2023-01-08 22:36:37.311 UTC [10223] LOG:  checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.012 s, sync=0.006 s, total=0.028 s; sync files=30, longest=0.004 s, average=0.001 s; distance=130 kB, estimate=130 kB
2023-01-08 22:36:37.321 UTC [1] LOG:  database system is ready to accept connections

At that point we are OK; then the sync starts.

2023-01-08 22:36:54.482 UTC [1] LOG:  server process (PID 10228) was terminated by signal 9: Killed
2023-01-08 22:36:54.482 UTC [1] DETAIL:  Failed process was running: update "items" set "content" = $1 where "id" = $2 returning "id"
2023-01-08 22:36:54.482 UTC [1] LOG:  terminating any other active server processes
2023-01-08 22:36:54.483 UTC [1] LOG:  all server processes terminated; reinitializing
2023-01-08 22:36:54.561 UTC [10229] LOG:  database system was interrupted; last known up at 2023-01-08 22:36:37 UTC
2023-01-08 22:36:54.561 UTC [10232] FATAL:  the database system is in recovery mode
2023-01-08 22:36:56.421 UTC [10229] LOG:  database system was not properly shut down; automatic recovery in progress
2023-01-08 22:36:56.431 UTC [10229] LOG:  redo starts at 1/18B3EBF8
2023-01-08 22:36:56.450 UTC [10229] LOG:  invalid record length at 1/18B5F940: wanted 24, got 0
2023-01-08 22:36:56.450 UTC [10229] LOG:  redo done at 1/18B5F8F8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2023-01-08 22:36:56.455 UTC [10230] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-01-08 22:36:56.482 UTC [10230] LOG:  checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.014 s, sync=0.006 s, total=0.028 s; sync files=30, longest=0.004 s, average=0.001 s; distance=131 kB, estimate=131 kB
2023-01-08 22:36:56.491 UTC [1] LOG:  database system is ready to accept connections

I see no error in the server logs, but the resource ID can be seen there as the PUT fails:

10.244.0.1 - - [08/Jan/2023:22:40:51 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:03 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:19 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
10.244.0.1 - - [08/Jan/2023:22:41:36 +0000] "PUT /api/items/root:/.resource/06da27add9d647d2000b2dc6b295cf1a:/content HTTP/1.1" 500 124 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
chevdor commented 1 year ago

I forgot to mention that I initially ran into this issue with attachments over 100MB while my limit was set to 100MB (so the failures were entirely expected). After seeing this, I bumped the limit to 200MB. My 108MB file went through, but it chokes on the 130MB one.

chevdor commented 1 year ago

Having spotted the resource ID in the logs, I could delete the faulty attachment using "tools/note attachments", which solved the issue. I could reproduce it by adding the attachment back to the note: a 130MB audio recording in MP3 format. That seems to fail 100% of the time.

tswaters commented 1 month ago

This happens if there's a background error with the connection... i.e., one that isn't tied to a client.query call.

If something like an idle-in-transaction timeout happens, an error may be raised outside of a client.query call. This gets emitted as an 'error' event on the client, and on the pool. (Unhandled 'error' events will crash the process.)

Refer to docs: https://node-postgres.com/apis/client#error -- specifically, " due to network partitions, back-end crashes, fail-overs, etc the client can (and over a long enough time period will) eventually be disconnected while it is idle"

If you then call client.query('...') on a client that appears present and idle, but whose underlying connection has died, you'll get this error.

--

Also, I felt I should comment on @chevdor's message, specifically highlighting this:

2023-01-08 22:36:54.482 UTC [1] LOG:  server process (PID 10228) was terminated by signal 9: Killed
2023-01-08 22:36:54.482 UTC [1] DETAIL:  Failed process was running: update "items" set "content" = $1 where "id" = $2 returning "id"
2023-01-08 22:36:54.482 UTC [1] LOG:  terminating any other active server processes
2023-01-08 22:36:54.483 UTC [1] LOG:  all server processes terminated; reinitializing
2023-01-08 22:36:54.561 UTC [10229] LOG:  database system was interrupted; last known up at 2023-01-08 22:36:37 UTC
2023-01-08 22:36:54.561 UTC [10232] FATAL:  the database system is in recovery mode

Someone kill -9'd your database ... Postgres really doesn't like that, which is why it went into recovery mode. If you have a Client and you're not using a pool, that client would have had an abrupt socket disconnect after the kill -9; you only see the error when you call client.query afterwards.

The best way to handle that is to use Pool ... it keeps a set of clients; most of them stay disconnected until needed, then they're reused. If a client sits idle past the timeout (default 10 seconds), it disconnects and goes away. The next pool.connect will return an existing idle, connected client, or create a new connection if none exists.

Some ORMs might add error handlers for the pool... but if not, refer again to the docs, https://node-postgres.com/apis/pool#error :

You probably want to add an event listener to the pool to catch background errors! Just like other event emitters, if a pool emits an error event and no listeners are added node will emit an uncaught error and potentially crash your node process.

And the Node.js process will absolutely crash if an unhandled 'error' event bubbles up without being caught.