brianc / node-postgres

PostgreSQL client for node.js.
https://node-postgres.com
MIT License
12.19k stars 1.22k forks source link

TypeError: Cannot read property 'handleCommandComplete' of null #2456

Closed amotl closed 3 years ago

amotl commented 3 years ago

Hi Brian,

coming from #2454, check-query-order-node-postgres.js is a repro in order to trigger an async/timing issue with node-postgres. It eventually reveals a problem with in-order statement execution when using parameterized statements on the pure-JS driver variant.

The documentation how to run it is inlined into the program (check-query-order-node-postgres.js --parameterized). As I've mentioned within #2454, the problem does not happen when using pg-native which you can conveniently enable by using the --native command line option.

From running it, you might be able to see that - while the SQL statements are submitted in order and the awaiting of statements is hopefully correctly implemented - asking for data which has just been inserted into the database will yield an empty result. From inquiring the sys.jobs_log table afterwards, we can see that the statements are not executed in order. The right order should obviously be DROP/CREATE, INSERT, REFRESH, SELECT.

Result:
undefined

sys.jobs_log:
[
  { stmt: 'DROP TABLE IF EXISTS testdrive;' },
  { stmt: ' CREATE TABLE testdrive (foobar INT, bazqux INT);' },
  { stmt: 'REFRESH TABLE testdrive;' },
  { stmt: 'SELECT * FROM testdrive;' },
  { stmt: 'INSERT INTO testdrive (foobar, bazqux) VALUES (42, $1);' }
]

When executing the repro just once, we have only been able to reproduce the issue on a freshly started instance of CrateDB. That's probably because it initially responds a tick slower than after being warmed up. All subsequent invocations succeeded, that's why I first thought it would be a regression on CrateDB. However, the issue can be revealed again when running check-query-order-node-postgres.js in a loop by using the watch utility as outlined within the inline documentation.

To be absolutely safe, I accompanied the repro with a respective example program using Python/asyncpg check-query-order-python-asyncpg.py, which does not reveal any flaw.

Below, I've shared details of the tracebacks. The issue has already been discovered by @Page- and reported through #1872, @johan13, @laurisvan and @emilsedgh also have been able to confirm this to happen occasionally on their environments.

Thanks already for looking into this!

With kind regards, Andreas.

cc @sroze, @robobr, @charmander, @ceejbot


Besides the statements not being in order and receiving an empty result (see stdout output above), this exception also happens occasionally.

TypeError: Cannot read property 'handleCommandComplete' of null ```javascript /path/to/node_modules/pg/lib/client.js:360 this.activeQuery.handleCommandComplete(msg, this.connection) ^ TypeError: Cannot read property 'handleCommandComplete' of null at Client._handleCommandComplete (/path/to/node_modules/pg/lib/client.js:360:22) at Connection.emit (events.js:315:20) at /path/to/node_modules/pg/lib/connection.js:115:12 at Parser.parse (/path/to/node_modules/pg-protocol/dist/parser.js:40:17) at Socket. (/path/to/node_modules/pg-protocol/dist/index.js:10:42) at Socket.emit (events.js:315:20) at addChunk (_stream_readable.js:309:12) at readableAddChunk (_stream_readable.js:284:9) at Socket.Readable.push (_stream_readable.js:223:10) at TCP.onStreamRead (internal/stream_base_commons.js:188:23) ```

Occasionally, the repro will yield the following traceback. While it does not originate from within node-postgres but rather from our repro program, it is also coming from a not-in-order execution of the queries as it clearly indicates that the sys.jobs_log table has either been queried before any other statements could have been recorded or, alternatively, the response handling of node-postgres has a flaw.

Error: sys.jobs_log has no content ```javascript Trace: Error: sys.jobs_log has no content at /path/to/crate-qa/tests/client_tests/node-postgres/check-query-order-node-postgres.js:71:13 at processTicksAndRejections (internal/process/task_queues.js:93:5) at async workload (/path/to/crate-qa/tests/client_tests/node-postgres/check-query-order-node-postgres.js:69:3) at /path/to/crate-qa/tests/client_tests/node-postgres/check-query-order-node-postgres.js:111:13 at processTicksAndRejections (internal/process/task_queues.js:93:5) ```
amotl commented 3 years ago

What @emilsedgh figured out and reported at https://github.com/brianc/node-postgres/issues/1872#issuecomment-720065853 about pg-protocol's Parser.parse() method sounds sensible.

https://github.com/brianc/node-postgres/blob/1d3f155d4ffa5ac4200cfcc8ceb4d338790e5556/packages/pg-protocol/src/parser.ts#L102

However, I am not any way into any of the details which would permit me to have any judgement on this. Specifically if that potential flaw only reveals a problem with respect to error handling or if it also might be responsible for the statements being submitted out of order. Just FYI.

amotl commented 3 years ago

Dear Brian,

thank you very much for your prompt response and your investigations at https://github.com/brianc/node-postgres/issues/2455#issuecomment-769345077 which revealed a problem to us, where CrateDB was not responding correctly to PostgreSQL wire protocol requests. Sorry for the noise.

@mfussenegger (also thanks a stack!) added a mitigation on behalf of https://github.com/crate/crate/pull/10979:

If the flush triggers the deferred executions, then a follow up sync() returned a completed future. This led the pg-write layer to immediately send a ReadyForQuery message to the client, even if the operations triggered by the flush weren't finished.

This patch improves the logic so that flush sets the activeExecution property, which will then be used by sync. This ensures that the ReadyForQuery message is sent after the operations have completed.

I have been able to confirm that the repros now work flawlessly using the most recent nightly release of CrateDB. The fix will be part of the upcoming GA release 4.4.1.

With kind regards, Andreas.

amotl commented 3 years ago

So, what @emilsedgh figured out and reported at https://github.com/brianc/node-postgres/issues/1872#issuecomment-720065853 about pg-protocol's Parser.parse() method, is probably a different kind of story and might be followed up independently. Sorry Emil!