brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License

Cannot read property 'handleCommandComplete' of null #1872

Open Page- opened 5 years ago

Page- commented 5 years ago

I've hit the following error coming from node-postgres 7.9.0 (latest), which crashes the node process:

TypeError: Cannot read property 'handleCommandComplete' of null
  File "/usr/src/app/node_modules/pg/lib/client.js", line 294, col 22, in Connection.<anonymous>
    self.activeQuery.handleCommandComplete(msg, con)
  File "events.js", line 189, col 13, in Connection.emit
  File "domain.js", line 441, col 20, in Connection.EventEmitter.emit
  File "/usr/src/app/node_modules/pg/lib/connection.js", line 125, col 12, in Socket.<anonymous>
    self.emit(eventName, msg)
  File "events.js", line 189, col 13, in Socket.emit
  File "domain.js", line 441, col 20, in Socket.EventEmitter.emit
  File "_stream_readable.js", line 284, col 12, in addChunk
  File "_stream_readable.js", line 265, col 11, in readableAddChunk
  File "_stream_readable.js", line 220, col 10, in Socket.Readable.push
  File "internal/stream_base_commons.js", line 94, col 17, in TCP.onStreamRead [as onread]

I'm using a connection pool via new pg.Pool(config), and as far as I'm aware no other modules interact with node-postgres internals at all.

I haven't got a reproduction; I've only seen it coupled with a read replica that is falling behind the primary, causing Client has encountered a connection error and is not queryable and terminating connection due to conflict with recovery errors to be emitted, and I'd expect that to be what's nulling out activeQuery. A simple fix would be to add a guard to https://github.com/brianc/node-postgres/blob/v7.9.0/lib/client.js#L293-L295, but I'm not sure whether you'd accept that as a patch, or if/where you would want to emit an event instead. If you could offer guidance on how you'd like to see this solved, I'm very happy to send a PR.
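
For illustration only, a minimal sketch of what such a guard around that handler might look like, based on the handler shown in the stack trace above (not an actual patch):

con.on('commandComplete', function (msg) {
  if (self.activeQuery) {
    self.activeQuery.handleCommandComplete(msg, con)
  } else {
    // a stray CommandComplete with no active query could be surfaced as an
    // event/error instead of crashing the process
    self.emit('error', new Error('Received unexpected commandComplete message'))
  }
})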

sroze commented 4 years ago

Either way, it does make sense to implement these guards 👍

johan13 commented 4 years ago

FWIW, I regularly see a very similar-looking crash when running pg 7.17.1 with an AWS Aurora Serverless database:

/home/jlevin/dev/backend/node_modules/pg/lib/client.js:329
    self.activeQuery.handleCommandComplete(msg, con)
                     ^
TypeError: Cannot read property 'handleCommandComplete' of null
    at Connection.<anonymous> (/home/jlevin/dev/backend/node_modules/pg/lib/client.js:329:22)
    at Connection.emit (events.js:311:20)
    at Connection.EventEmitter.emit (domain.js:482:12)
    at Socket.<anonymous> (/home/jlevin/dev/backend/node_modules/pg/lib/connection.js:128:12)
    at Socket.emit (events.js:311:20)
    at Socket.EventEmitter.emit (domain.js:482:12)
    at addChunk (_stream_readable.js:294:12)
    at readableAddChunk (_stream_readable.js:275:11)
    at Socket.Readable.push (_stream_readable.js:209:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:186:23)

I haven't got a reliable way to reproduce it, but the app rarely runs for more than 5 or 10 minutes without crashing. The same app runs without issues against a non-Aurora Postgres.

robobr commented 4 years ago

This feels like a reproducer, but maybe I'm missing something obvious: if the query's row count is close to the stream's highWaterMark, the async iterator breaks early, and the client is reused, then the TypeError is thrown. Not every time, but more often when the highWaterMark is close to the result set's row count.

The workaround is to not reuse a client after it has been used with a QueryStream.

Versions used: Node v10.17, Postgres 11.2, pg@8.0.0, pg-query-stream@3.0.4

const pg = require('pg');
const QueryStream = require('pg-query-stream');

async function main() {

  // result set row count (generate_series(0, 100) returns 101 rows) is close to the highWaterMark
  const queryText = 'SELECT * FROM generate_series(0, 100) numb';

  // default highWaterMark is 100
  const stream = new QueryStream(queryText, []);
  const client = new pg.Client();
  await client.connect();
  const query = client.query(stream);

  // break out of the async iterator before the stream is fully consumed
  for await (const row of query) {
    if (row.numb > 50) break;
  }

  // reusing the client after the partially consumed QueryStream triggers the TypeError
  const now = await client.query('select now()');
  console.log(now.rows);
  await client.end();
}

main();

charmander commented 4 years ago

I think the issue there might be not closing the QueryStream, although even if that's the cause, the error message and the amount of error checking can definitely be improved.

robobr commented 4 years ago

I had the same idea and added stream.destroy() just after the iterator loop, but it made no difference. I haven't verified it, but I think the break inside the async iterator (or any exit from that block) already destroys the stream.
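
For reference, a sketch of where the call was added in the reproducer above; it still fails the same way:

  for await (const row of query) {
    if (row.numb > 50) break;
  }
  stream.destroy(); // explicitly close the QueryStream after breaking out early

  const now = await client.query('select now()'); // still throws the same TypeError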

laurisvan commented 4 years ago

I am not sure if this helps reproduce the issue, but I am hitting these errors with a connection that runs through WireGuard. What I have generally seen with WireGuarded connections is that there are cases where the connection breaks and a few packets get dropped. These errors did not occur when I ran it over an SSH tunnel - I don't know whether it has something to do with WireGuard using UDP while, I believe, SSH runs over TCP.

It seems that, for me, it appears together with the query_timeout: 30 * 1000 parameter and goes away when that is disabled.
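
For context, query_timeout is a client/pool configuration option; a minimal sketch of the setup being described (connection details are placeholders):

const pg = require('pg');

const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL, // placeholder
  query_timeout: 30 * 1000, // disabling this setting made the error go away for me
});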

Probably the connection drops even with SSH, but that signal likely arrives a lot later, so the pg client has had enough time to prepare to recover (or something similar).

emilsedgh commented 3 years ago

I have a reliable way of reproducing this in my app, although I haven't managed to capture it in a small script yet.

My app runs the following queries:

  1. SET application_name = 'foo';
  2. START TRANSACTION
  3. ROLLBACK

It seems that the way node-pg works is that once a query completes, pg-protocol issues a CommandComplete message, and node-pg shifts connection.activeQuery once that CommandComplete message is received.

Now, this issue is happening because I run 3 commands, but 4 CommandComplete messages are being issued by pg-protocol.

Basically this is what happens:

1. Executes query `SET application_name = 'foo'`
2. Receives ack from server. Issues CommandComplete for `SET`

3. Executes query `START TRANSACTION`
4. Receives ack from server. Issues CommandComplete for `START TRANSACTION`

5. Executes query `ROLLBACK`
6. Receives ack from server. Issues CommandComplete for `START TRANSACTION`
7. Receives ack from server. Issues CommandComplete for `ROLLBACK`

The problem is in step 6: CommandComplete for START TRANSACTION is issued again, erroneously.

I'm still investigating to understand what's happening, but it seems that the buffer is not correctly emptied after step 4.

I still haven't managed to capture this in a small script, which means that my application is triggering it. Maybe my app is doing something wrong, or it's just triggering a corner case. @brianc let me know if I'm on the right path here or if there's any way I can help move this conversation forward.

emilsedgh commented 3 years ago

Just to add a few more clues in case anyone is following: this is the content of this.buffer on the Parser when the bogus CommandComplete is generated:

Hex: 43000000165354415254205452414e53414354494f4e005a0000000554430000000d524f4c4c4241434b005a000000054900000000000000c8dd

UTF-8 (non-printable bytes stripped): CSTART TRANSACTIONZTCROLLBACKZI
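
To make that dump easier to read, here is a small stand-alone decoder (a hypothetical helper, not part of pg); PostgreSQL backend messages are a one-byte type followed by a big-endian int32 length and the payload:

const buf = Buffer.from(
  '43000000165354415254205452414e53414354494f4e005a0000000554' +
  '430000000d524f4c4c4241434b005a000000054900000000000000c8dd',
  'hex'
)

let offset = 0
while (offset + 5 <= buf.length) {
  const type = String.fromCharCode(buf[offset])
  if (type === '\0') break // trailing bytes, not a complete message
  const length = buf.readInt32BE(offset + 1) // length covers itself and the payload
  const payload = buf.slice(offset + 5, offset + 1 + length).toString('utf8')
  console.log(type, JSON.stringify(payload))
  offset += 1 + length
}
// Prints a CommandComplete ('C') for START TRANSACTION, a ReadyForQuery ('Z'),
// then a CommandComplete for ROLLBACK and another ReadyForQuery: the stale
// START TRANSACTION completion is still at the front of the buffer.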

emilsedgh commented 3 years ago

Alright, I think I figured it out, at least in my case.

This script can reproduce the issue easily:

const pg = require('pg')
const domain = require('domain')

const d = domain.create()

d.on('error', function (err) {
  // the domain swallows the error thrown below, so the process keeps running
  console.error(err)
})

const pool = new pg.Pool({
  connectionString: process.argv[2]
})

d.run(() => {
  pool.connect((err, conn) => {
    conn.query('START TRANSACTION', () => {
      conn.query('ROLLBACK')
      // throwing synchronously inside the query callback aborts the tick
      // before pg-protocol has cleared the consumed bytes from its buffer
      throw 'foo'
    })
  })
})

Usage: node reproduce.js postgres://user:password@localhost/database

The problem is in pg-protocol's parser.

pg-protocol calls the handler back as soon as it receives a message, before clearing its internal buffer. Now, if your code throws an error in the callback in the same tick, that tick never completes, and therefore lines 103-117, which are responsible for clearing the buffer, never execute.

Now, if you have error handlers (like the domain handler in this sample script), the server will continue interacting with this connection in subsequent ticks.

But since node-pg's internal queue is not cleared out, the next time you query, pg-protocol's internal buffer still holds the previous data that was never cleared, and that means your callback will be called for the first query again.

@brianc A simple solution would be to invoke the callback via setImmediate. What do you think?
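
A minimal sketch of that idea, assuming a hypothetical dispatch helper rather than pg-protocol's actual code:

// Instead of invoking the consumer's callback synchronously from the parse
// loop, defer it with setImmediate; a throw inside the callback then can no
// longer abort the tick before the parser has trimmed its internal buffer.
function emitMessage(message, callback) {
  setImmediate(() => callback(message))
}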

matthieusieben commented 3 years ago

@robobr your error looks similar to this one: https://github.com/brianc/node-postgres/issues/2119. Could this issue also be linked to that other issue?

georgehazan commented 2 years ago

As for me, the problem is still here... pg 8.7.3, Postgres 14.2, Node.js 16.10. It happens randomly, but quite often.

matthieusieben commented 2 years ago

Edit: not relevant, sorry

georgehazan commented 2 years ago

@matthieusieben No, I reverted pg to 7.4.1, as somebody advised elsewhere; it works OK for now.