gajus / slonik

A Node.js PostgreSQL client with runtime and build time type safety, and composable SQL.
Other
4.61k stars 140 forks source link

Pool is missing error handling #471

Closed valeneiko closed 1 year ago

valeneiko commented 1 year ago

If existing pool connection is terminated by the server, PgPool will emit error event. Slonik does not register a listener for that event, so it results in a crash instead. It can happen in case of DB failover, or if someone runs pg_terminate_backend(...) query.

PgPool is EventEmitter which can emit error events, which by default crash node process: docs.

Expected Behavior

Error is logged and ignored.

Current Behavior

If PgPool emits error event, node crashes : docs.

Possible Solution

Steps to Reproduce

  1. Create slonik pool, establish a few connections and wait
    const pool = await createPool(...);
    await Promise.all(new Array(10).fill(null).map(() => pool.connect(() => Promise.resolve())));
    await setTimeout(10000);
  2. Run the SQL query to terminate all connections (or crash the DB)
    SELECT *, pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE pid != pg_backend_pid()
  3. Observe process started in step1 crashed
gajus commented 1 year ago

I am not able to replicate this issue @valeneiko

https://github.com/gajus/slonik/pull/507

Based on the description, I would expect this test to fail somehow, but it is passing as expected.

https://github.com/gajus/slonik/pull/507/files#diff-87a6820c951329f6ddf0055e0afd4f289f3be2a0bd147d7dccab62657e0d699bR175

I don't want to make changes without showing failing tests for it.

valeneiko commented 1 year ago

I have just tried this script (a copy of your test) and got the exception + crash:

import { setTimeout as delay } from 'timers/promises';
import { createPool, sql } from 'slonik';

async function main() {
  process.on('uncaughtExceptionMonitor', (err) => {
    console.log(
      '\n\n==================================== UNCAUGHT EXCEPTION ====================================\n\n',
      err,
      '\n\n============================================================================================\n\n',
    );
  });

  const pool = await createPool(
    'postgres://postgres:postgres@postgres:5432/postgres',
  );

  let targetPid;

  setTimeout(() => {
    if (!targetPid) {
      throw new Error('Expected target PID to be set');
    }

    console.log('terminatinating active backend %s', targetPid);

    void pool.query(sql.unsafe`
      SELECT pg_terminate_backend(${targetPid})
    `);
  }, 200);

  let targetQuery;

  await pool.connect(async (connection) => {
    targetPid = await connection.oneFirst(sql.unsafe`
      SELECT pg_backend_pid()
    `);

    await delay(1_000);

    targetQuery = connection.query(sql.unsafe`
      SELECT 1
    `);
  });
}

void main();

Here is the output:

terminatinating active backend 84

==================================== UNCAUGHT EXCEPTION ====================================

 Error: Client has encountered a connection error and is not queryable
    at /workspace/node_modules/.pnpm/pg@8.11.3/node_modules/pg/lib/client.js:526:17
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /workspace/node_modules/.pnpm/slonik@35.0.1_zod@3.22.2/node_modules/slonik/dist/connectionMethods/query.js:8:24
    at async executeQuery (/workspace/node_modules/.pnpm/slonik@35.0.1_zod@3.22.2/node_modules/slonik/dist/routines/executeQuery.js:126:26)
    at async query (/workspace/node_modules/.pnpm/slonik@35.0.1_zod@3.22.2/node_modules/slonik/dist/connectionMethods/query.js:6:12) {
  notices: []
} 

============================================================================================

Maybe the test framework you are using silences uncaught exceptions?

gajus commented 1 year ago

Maybe the test framework you are using silences uncaught exceptions?

In your example, you are not handling targetQuery promise rejection.

    targetQuery = connection.query(sql.unsafe`
      SELECT 1
-    `);
+.   `).catch((error) => { console.log('error', error) });

That will prevent the undesired behavior.

valeneiko commented 1 year ago

Yup, my bad. Missed that.

Here is the script that actually crashes:

import { createPool, sql } from 'slonik';

async function main() {
  process.on('uncaughtExceptionMonitor', (err) => {
    console.log(
      '\n\n==================================== UNCAUGHT EXCEPTION ====================================\n\n',
      err,
      '\n\n============================================================================================\n\n',
    );
  });

  const pool = await createPool(
    'postgres://postgres:postgres@postgres:5432/postgres',
  );
  await Promise.all(
    new Array(2).fill(null).map(() => pool.connect(() => Promise.resolve())),
  );

  await pool
    .query(
      sql.unsafe`
    SELECT *, pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE pid != pg_backend_pid()`,
    )
    .catch(() => {});
}

void main();

Output:

==================================== UNCAUGHT EXCEPTION ====================================

 error: terminating connection due to administrator command
    at Parser.parseErrorMessage (/workspace/node_modules/.pnpm/pg-protocol@1.6.0/node_modules/pg-protocol/dist/parser.js:287:98)
    at Parser.handlePacket (/workspace/node_modules/.pnpm/pg-protocol@1.6.0/node_modules/pg-protocol/dist/parser.js:126:29)
    at Parser.parse (/workspace/node_modules/.pnpm/pg-protocol@1.6.0/node_modules/pg-protocol/dist/parser.js:39:38)
    at Socket.<anonymous> (/workspace/node_modules/.pnpm/pg-protocol@1.6.0/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:514:28)
    at addChunk (node:internal/streams/readable:324:12)
    at readableAddChunk (node:internal/streams/readable:297:9)
    at Readable.push (node:internal/streams/readable:234:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
...

Looks like it is idle connections that cause it.

gajus commented 1 year ago

Thank you!

gajus commented 1 year ago

Note that this didn't address setupClient. I will try to add a test for it too.

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version 35.0.3 :tada:

The release is available on:

Your semantic-release bot :package::rocket: