brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License

Errored QueryStream causes Client to throw on message receive #1674

Closed: rkaw92 closed this issue 5 months ago

rkaw92 commented 6 years ago

Hi, I have encountered a really bizarre issue when using knex with pg and query streams. A parameterized QueryStream that fails in the parsing stage (e.g. because of a column type / value type mismatch) breaks subsequent queries on the same connection.

Below are two ways to reproduce this issue, one using knex and another simplified to bare pg (which simulates how knex interacts with pg).

Test conditions:

'use strict';

function runKnex() {
    const knex = require('knex');
    const db = knex({
        client: 'pg',
        connection: {
            host: process.env.PGHOST,
            user: process.env.PGUSER,
            password: process.env.PGPASSWORD,
            database: process.env.PGDATABASE
        },
        pool: {
            min: 1,
            max: 1
        }
    });
    const resultStream = db('events').select('*').where({ id: 'not a UUID' }).stream();
    resultStream.on('data', function() {});
    resultStream.on('error', function(error) {
        console.error('stream error:', error);
    });

    return db.raw('SELECT NOW() AS now').then(function(rows) {
        console.log('now: %s', rows[0].now);
    });
}

function runPg() {
    const Client = require('pg').Client;
    const QueryStream = require('pg-query-stream');
    const client = new Client();
    return client.connect().then(function() {
        const querySpec = new QueryStream('SELECT * FROM events WHERE id = $1', [ 'not a UUID' ]);
        const rawStream = client.query(querySpec);
        rawStream.on('error', function(error) {
            console.log('stream error:', error);
            rawStream.close();
        });
        rawStream.on('data', function() {});

        client.query('SELECT NOW() AS now').then(function(result) {
            console.log('now: %s', result.rows[0].now);
        });
    }).catch(function(error) {
        console.error(error);
        process.exit(5);
    });
}

runPg();
// or alternatively:
// runKnex();

What should happen:

What actually happens:

It does not crash if:

Overall, I think some state machine inside pg.Client breaks, and activeQuery is set to null because of it, but I have not been able to pin-point exactly where or why it happens. It seems to be connected to closing the ReadableStream returned from client.query().

This is especially hard to debug when using pg with a pool (such as knex.js's), because the error may go unnoticed until the same client object is re-acquired from the pool and used; this is why I opted for single-connection knex / pg examples above.
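
To illustrate the pooled case, here is a minimal sketch (not part of the original report; the query and column names are invented) of why a single-connection pg.Pool surfaces the breakage immediately: the next pool.query() is guaranteed to check out the very client whose stream just errored.

const { Pool } = require('pg');
const QueryStream = require('pg-query-stream');

// One connection only, so the client with the broken query state is always reused.
const pool = new Pool({ max: 1 });

(async function() {
    const client = await pool.connect();

    // Parameterized stream that fails in the parse stage (text passed where a uuid is expected).
    const stream = client.query(new QueryStream('SELECT $1::uuid AS id', ['not a UUID']));
    stream.on('error', function(error) { console.error('stream error:', error.message); });
    stream.on('data', function() {});

    // Releasing without waiting for the errored stream to close hands a client
    // with a confused query queue back to the pool.
    client.release();

    // This re-acquires the same client; depending on timing it throws or hangs.
    const result = await pool.query('SELECT NOW() AS now');
    console.log('now: %s', result.rows[0].now);
})().catch(function(error) {
    console.error('query error:', error);
});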

sehrope commented 6 years ago

I'm able to reproduce this and I think it's related to the internal client query queue. I haven't totally figured out the cause, but I suspect the FEBE messages for the stream.close() are being sent after the messages for the next query. If you change the repro code to wait for the completion of stream.close(...) by giving it a callback and waiting for its execution, the error does not occur:

// npm install pg pg-query-stream async-main
const pg = require('pg');
const { Client } = pg;
const QueryStream = require('pg-query-stream');
const main = require('async-main').default;

function log(message, ...args) {
    console.log(`%s ${message}`, new Date().toISOString(), ...args);
}

function streamToPromise(stream) {
    return new Promise((resolve, reject) => {
        stream.on('end', () => resolve());
        stream.on('error', reject);
    });
}

function streamClose(stream) {
    return new Promise((resolve) => {
        stream.close(resolve);
    });
}

main(async () => {
    log('Started');
    const client = new Client(process.env.DATABASE_URL);
    try {
        log('Connecting');
        await client.connect();
        log('Connected');

        let stream;
        try {
            const query = new QueryStream('SELECT $1::uuid AS x', ['bad data']);
            stream = client.query(query);
            stream.on('data', (data) => {
                log('Received data: %j', data);
            });
            await streamToPromise(stream);
        } catch (e) {
            log('Error streaming query: %s', e);
        } finally {
            if (stream) {
                log('Closing stream');
                // Replace with `stream.close()` to get the error
                await streamClose(stream);
                log('After close stream');
            }         
        }
        log('Before execute new query');
        const result = await client.query('SELECT 1 AS x');
        log('Result: %j', result.rows);
    } finally {
        await client.end();
    }
});

Output when waiting for stream.close():

2018-06-13T11:48:29.269Z Started
2018-06-13T11:48:29.272Z Connecting
2018-06-13T11:48:29.279Z Connected
2018-06-13T11:48:29.281Z Error streaming query: error: invalid input syntax for uuid: "bad data"
2018-06-13T11:48:29.281Z Closing stream
2018-06-13T11:48:29.324Z After close stream
2018-06-13T11:48:29.324Z Before execute new query
2018-06-13T11:48:29.326Z Result: [{"x":1}]

Replacing the await streamClose(...) with a plain stream.close() produces the same error you reported.

There's probably a fix that could be added to the driver to handle this situation better, but knex should also be waiting for the close() to complete before continuing.
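
As an illustration of that consumer-side behavior, here is a small sketch (mine, not knex's actual code) that waits for the stream's 'close' event so the connection is only reused once the stream has fully shut down:

// Resolves once the stream has fully closed; 'close' should fire after the
// close/destroy sequence (and its ReadyForQuery) has completed.
function waitForClose(stream) {
    return new Promise((resolve) => stream.once('close', resolve));
}

async function finishStreamThenQuery(stream, client) {
    const closed = waitForClose(stream);
    stream.destroy(); // or stream.close(callback) on older pg-query-stream versions
    await closed;     // only after this is the client safe to reuse
    return client.query('SELECT 1 AS x');
}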

BasixKOR commented 3 years ago

Is there any progress on this issue? I think I just hit this error, and it seems like the issue has been abandoned.

alxndrsn commented 1 year ago

Here's a repro for pg v8.8.0 & node v16.13.2 & v18.6.0, postgres 12.12:

const pg = require('pg');
const QueryStream = require('pg-query-stream');

const pool = new pg.Pool({
  poolSize: 1,
  connectionTimeoutMillis: 400,
  statement_timeout: 400,
});

(async () => {
  const query = new QueryStream('SELECT TRUE');
  const client = await pool.connect();
  const stream = await client.query(query);

  try {
    const res = await pool.query('SELECT TRUE');
    throw new Error(`Unexpected res (should have failed): ${JSON.stringify(res, null, 2)}`);
  } catch(err) {
    if(err.message !== 'timeout exceeded when trying to connect') throw new Error(`Unexpected failure: ${err}`);
  }

  await stream.destroy(); // without this, next attempt at pool.query() will hang
  await client.release();

  const res = await pool.query('SELECT TRUE AS val');
  if(res.rows.length !== 1 || res.rows[0].val !== true) {
    throw new Error(`Unexpected res: ${JSON.stringify(res.rows)}`);
  }
})();

Output (node 16.13.2)

~/node_modules/pg/lib/client.js:340
    this.activeQuery.handleRowDescription(msg)
                     ^

TypeError: Cannot read properties of null (reading 'handleRowDescription')
    at Client._handleRowDescription (~/node_modules/pg/lib/client.js:340:22)
    at Connection.emit (node:events:390:28)
    at ~/node_modules/pg/lib/connection.js:114:12
    at Parser.parse (~/node_modules/pg-protocol/dist/parser.js:40:17)
    at Socket.<anonymous> (~/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:390:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:199:23)

postgres version

# SELECT VERSION();
                                                                version                                                                 
----------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.12 (Ubuntu 12.12-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

Output (node 18.6.0)

~/node_modules/pg/lib/client.js:340
    this.activeQuery.handleRowDescription(msg)
                     ^

TypeError: Cannot read properties of null (reading 'handleRowDescription')
    at Client._handleRowDescription (~/node_modules/pg/lib/client.js:340:22)
    at Connection.emit (node:events:513:28)
    at ~/node_modules/pg/lib/connection.js:114:12
    at Parser.parse (~/node_modules/pg-protocol/dist/parser.js:40:17)
    at Socket.<anonymous> (~/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:513:28)
    at addChunk (node:internal/streams/readable:324:12)
    at readableAddChunk (node:internal/streams/readable:297:9)
    at Readable.push (node:internal/streams/readable:234:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)

Other PostgreSQL versions

I've now tested and recreated this issue with postgres versions:

However, it does NOT occur on:

alxndrsn commented 1 year ago

Tested with:

(node:781) UnhandledPromiseRejectionWarning: Error: Unexpected res: []
    at ~/db-test.js:35:15
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:781) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:781) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
~/node_modules/pg/lib/client.js:340
    this.activeQuery.handleRowDescription(msg)
                     ^

TypeError: Cannot read property 'handleRowDescription' of null
    at Client._handleRowDescription (~/node_modules/pg/lib/client.js:340:22)
    at Connection.emit (events.js:400:28)
    at ~/node_modules/pg/lib/connection.js:115:12
    at Parser.parse (~/node_modules/pg-protocol/dist/parser.js:40:17)
    at Socket.<anonymous> (~/node_modules/pg-protocol/dist/index.js:10:42)
    at Socket.emit (events.js:400:28)
    at addChunk (internal/streams/readable.js:293:12)
    at readableAddChunk (internal/streams/readable.js:267:9)
    at Socket.Readable.push (internal/streams/readable.js:206:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:188:23)

nathanjcochran commented 1 year ago

I believe I am hitting this issue as well. In my case, I am trying to cancel a QueryStream query with pg_cancel_backend(), close the stream, and then issue another query on the same connection.

Here's a reproduction (pg: 8.11.0, pg-query-stream: 4.5.0, node 18.9.0, postgres: 13.6):

const pg = require('pg');
const QueryStream = require('pg-query-stream');

(async () => {
  const pool = new pg.Pool();
  pool.on('error', (err) => {
    console.error('Pool error: ', err);
  });

  const conn = await pool.connect();
  conn.on('error', (err) => {
    console.error('Connection error: ', err);
  });

  const result = await conn.query('SELECT pg_backend_pid() AS pid;');
  const { pid } = result.rows[0];

  const stream = conn.query(new QueryStream('SELECT pg_sleep(10);'));
  stream.on('data', (chunk) => {
    console.log('stream data: ', chunk);
  });
  stream.on('error', (err) => {
    console.error('stream error: ', err);
  });

  // Create a promise that is resolved when the stream is closed
  const closed = new Promise((res) => {
    stream.on('close', res);
  });

  // Wait 1 second before cancelling the query
  await new Promise((res) => {
    setTimeout(() => res(), 1000);
  });

  await pool.query('SELECT pg_cancel_backend($1);', [pid]);

  stream.destroy();
  await closed;

  // Triggers uncaught exception
  await conn.query('SELECT pg_sleep(1);');

  conn.release();
  await pool.end();
})();

This sometimes results in the following uncaught exception (it appears to be a bit of a race condition):

TypeError: Cannot read properties of null (reading 'handleRowDescription')
    at Client._handleRowDescription (~/node_modules/pg/lib/client.js:362:22)
    at Connection.emit (node:events:513:28)
    at ~/node_modules/pg/lib/connection.js:117:12
    at Parser.parse (~/node_modules/pg-protocol/dist/parser.js:40:17)
    at Socket.<anonymous> (~/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:513:28)
    at addChunk (node:internal/streams/readable:324:12)
    at readableAddChunk (node:internal/streams/readable:297:9)
    at Readable.push (node:internal/streams/readable:234:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)

Waiting for the stream to be closed should theoretically be a reliable signal that the connection is ready to be used again, since the stream isn't closed until a ReadyForQuery message is received. However, I believe the ReadyForQuery message is being sent multiple times, so even if we wait for it before issuing another query on the connection, it can arrive again and break the state machine once the next query is already in flight. Specifically, ReadyForQuery causes activeQuery to be set to null. If a second query has already been issued on the connection before that happens, then when the RowDescription message for that second query arrives, it hits the null activeQuery and raises the TypeError: Cannot read properties of null (reading 'handleRowDescription') error, which crashes the process.
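
To make that sequence concrete, here is a heavily simplified illustration (not pg's actual source; only the activeQuery field and the handler names are borrowed from the stack traces) of how a duplicate ReadyForQuery leads to the null dereference:

// Simplified sketch of the client-side message handling described above.
const miniClient = {
  activeQuery: null, // the query currently awaiting results

  // ReadyForQuery: the current query is treated as finished.
  handleReadyForQuery() {
    this.activeQuery = null;
  },

  // RowDescription: routed to the active query. If a stray second
  // ReadyForQuery cleared activeQuery after the next query was already
  // made active, this is the "Cannot read properties of null" crash.
  handleRowDescription(msg) {
    this.activeQuery.handleRowDescription(msg);
  },
};

// Simulate the race: the second query is active, a duplicate ReadyForQuery
// arrives, and then the second query's RowDescription comes in.
miniClient.activeQuery = { handleRowDescription() {} }; // second query became active
miniClient.handleReadyForQuery();    // duplicate ReadyForQuery clears it
miniClient.handleRowDescription({}); // TypeError: reading 'handleRowDescription' of null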

I think the multiple ReadyForQuery messages come from the fact that I am both destroying the query stream (which I think triggers one Sync) and issuing a pg_cancel_backend() call, which causes the connection to error and thereby triggers a second Sync. The race condition comes down to which of those happens first: if the error arrives first, only one Sync is triggered thanks to an existing check, but if the error arrives second, two Sync messages are triggered, which causes the issue.

Given that, I believe https://github.com/brianc/node-postgres/pull/2836 (which looks like it would prevent the second Sync from the error path if the Sync from closing the stream had already been sent) should fix the issue for me. Would love to see that get merged!
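
For readers following along, here is a rough, self-contained sketch of the "send Sync at most once" idea that #2836 is described as implementing; the class and field names below are hypothetical and not taken from the actual patch:

// Hypothetical guard: the stream-close path and the error path share one
// terminator, so the backend only ever sees a single Sync (and therefore
// sends a single ReadyForQuery) for the cursor.
class SyncOnceTerminator {
  constructor(connection) {
    this.connection = connection; // assumed to expose a sync() method
    this._syncSent = false;
  }

  // Called from both the close/destroy path and the error path; whichever
  // runs second becomes a no-op.
  sendSync() {
    if (this._syncSent) {
      return;
    }
    this._syncSent = true;
    this.connection.sync();
  }
}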

EDIT: Just tested #2836, and it does indeed fix the issue 🎉.

lognaturel commented 5 months ago

Consider closing now that https://github.com/brianc/node-postgres/pull/2836 is merged. At least two consumers have confirmed that the issue is fixed.

brianc commented 5 months ago

Good point, thanks!