TypeError: Cannot read property 'name' of undefined

amotl commented 3 years ago

Hi Brian,

coming from #2454, cratedb-nodejs-parameterized-returning.js is a repro in order to trigger an async/timing issue with node-postgres. It eventually reveals a problem with parameterized statements in conjunction with a RETURNING clause on the pure-JS driver variant, thus indicating a flaw within node-postgress response handling.

The documentation how to run it is inlined into the program (cratedb-nodejs-parameterized-returning.js --parameterized --returning). When adding the --improve command line option, the situation improves slightly but obviously this is not a solution. As I've mentioned within #2454, the problem does not happen when using pg-native which you can conveniently enable by using the --native command line option.

Below, I've shared details of the tracebacks. The first traceback has already been discovered by @ugate and reported through #2168.

Thanks already for looking into this!

With kind regards, Andreas.

cc @vitaly-t

Mostly, this repro yields:

TypeError: Cannot read property 'name' of undefined

```javascript Trace: TypeError: Cannot read property 'name' of undefined at Result.parseRow (/path/to/node_modules/pg/lib/result.js:66:34) at Query.handleDataRow (/path/to/node_modules/pg/lib/query.js:87:26) at Client._handleDataRow (/path/to/node_modules/pg/lib/client.js:345:22) at Connection.emit (events.js:315:20) at /path/to/node_modules/pg/lib/connection.js:115:12 at Parser.parse (/path/to/node_modules/pg-protocol/dist/parser.js:40:17) at Socket. (/path/to/node_modules/pg-protocol/dist/index.js:10:42) at Socket.emit (events.js:315:20) at addChunk (_stream_readable.js:309:12) at readableAddChunk (_stream_readable.js:284:9) at Socket.Readable.push (_stream_readable.js:223:10) at TCP.onStreamRead (internal/stream_base_commons.js:188:23) at /Users/amo/dev/crate/troubleshooting/npm-pg-001/cratedb-nodejs-parameterized-returning.js:160:13 at processTicksAndRejections (internal/process/task_queues.js:93:5) ```

Occasionally, it will also croak like:

TypeError: Cannot read property 'handleDataRow' of null

```javascript /path/to/node_modules/pg/lib/client.js:345 this.activeQuery.handleDataRow(msg) ^ TypeError: Cannot read property 'handleDataRow' of null at Client._handleDataRow (/path/to/node_modules/pg/lib/client.js:345:22) at Connection.emit (events.js:315:20) at /path/to/node_modules/pg/lib/connection.js:115:12 at Parser.parse (/path/to/node_modules/pg-protocol/dist/parser.js:40:17) at Socket. (/path/to/node_modules/pg-protocol/dist/index.js:10:42) at Socket.emit (events.js:315:20) at addChunk (_stream_readable.js:309:12) at readableAddChunk (_stream_readable.js:284:9) at Socket.Readable.push (_stream_readable.js:223:10) at TCP.onStreamRead (internal/stream_base_commons.js:188:23) ```

brianc commented 3 years ago

Hey sorry for the delay in looking into this. Initial investigation is showing me the cratedb backend is sending messages out of spec for the extended query protocol, at least in so far as I've interpreted it. Here's the message order I'm receiving back when I dispatch the insert statement:

parseComplete
bindComplete
noData
readyForQuery
commandComplete

The commandComplete message is being sent after the readyForQuery message, which isn't up to spec as far as I interpreted it. Particularly this part:

If Execute terminates before completing the execution of a portal (due to reaching a nonzero result-row count), it will send a PortalSuspended message; the appearance of this message tells the frontend that another Execute should be issued against the same portal to complete the operation. The CommandComplete message indicating completion of the source SQL command is not sent until the portal's execution is completed. Therefore, an Execute phase is always terminated by the appearance of exactly one of these messages: CommandComplete, EmptyQueryResponse (if the portal was created from an empty query string), ErrorResponse, or PortalSuspended.

At completion of each series of extended-query messages, the frontend should issue a Sync message. This parameterless message causes the backend to close the current transaction if it's not inside a BEGIN/COMMIT transaction block ("close" meaning to commit if no error, or roll back if error). Then a ReadyForQuery response is issued. The purpose of Sync is to provide a resynchronization point for error recovery. When an error is detected while processing any extended-query message, the backend issues ErrorResponse, then reads and discards messages until a Sync is reached, then issues ReadyForQuery and returns to normal message processing. (But note that no skipping occurs if an error is detected while processing Sync — this ensures that there is one and only one ReadyForQuery sent for each Sync.)

Here's the full message from from the client front-end and the backend responses I'm seeing:

--- sending parameterized insert query ---
-> parse
-> bind
-> describe
-> execute
-> sync
<- parseComplete
<- bindComplete
<- noData
<- readyForQuery
<- commandComplete

Notice how readyForQuery is received before commandComplete is received? It's likely other drivers might be discarding readyForQuery messages until receiving commandComplete and then issuing a sync at that point, causing another readyForQuery to be sent down. That's something I could do here likely, but would require a significant change.

Contrast the message ordering from the backend when executing a non-parameterized query on cratedb:

-> query
<- rowDescription
<- dataRow
<- commandComplete
<- readyForQuery

I think your backend should not send readyForQuery before commandComplete - do you know why y'all are doing it that way?

brianc commented 3 years ago

Note: this out of order messaging would also be a cause for the other issue you reported: if you backend sends readyForQuery before commandComplete then the client will "finish" the active query. Once the query is finished the client releases any reference to it...if it receives an commandComplete after that...then it will crash on not being able to call handleCommandComplete of null (since there is no active query).

amotl commented 3 years ago

Dear Brian,

thank you so much for the detailed report. Of course, I want to deeply apologize if these aspects have not been covered by other test harnesses of CrateDB and would just have been revealed by your investigations. At the same time, I am really grateful for this excellent feedback. I will get back to our core engineers about that and then report back to you.

Your elaborations are very intriguing regarding that other drivers might ignore/discard readyForQuery messages until receiving commandComplete. That might well be a thing to investigate on them, but right now it's a different kind of story. However, that might well be the reason why we might not have discovered this flaw within other scenarios yet.

So, thanks again so far for investigating this so quickly.

With kind regards, Andreas.

cc @seut, @mfussenegger, @mkleen

brianc commented 3 years ago

I want to deeply apologize if these aspects have not been covered by other test harnesses of CrateDB and would just have been revealed by your investigations

absolutely no problemo at all, no apology needed!

At the same time, I am really grateful for this excellent feedback. I will get back to our core engineers about that and then report back to you.

great! I'm happy to debug & diagnose more too...or if you want an engineer to get on the phone or something with me just email me & we can set something up. I want to make sure cratedb has first class support with this driver all the way.

amotl commented 3 years ago

Hi Brian,

I want to make sure cratedb has first class support with this driver all the way.

Thanks a stack for your commitment!

You are definitively right that the commandComplete response message should be sent before the readyForQuery response message on all occasions.

In order to gather more insights, I followed @mfussenegger's suggestion to invoke

set global 'logger.io.crate.protocols.postgres' = 'TRACE';
set global 'logger.io.crate.action.sql' = 'TRACE';

to get some trace logs from CrateDB. I added my observations and a summary directly to the gist at https://gist.github.com/amotl/a79d685a1b77cd509ee38ada2e19c2a3#gistcomment-3612315 ff., which also compares the outcomes of the node-postgres implementation vs. the same implementation using asyncpg.

While I cannot evaluate these results, you might want to have a look at the observations already. We will probably continue looking into the details on Monday.

With kind regards, Andreas.

amotl commented 3 years ago

Dear Brian,

thank you very much for your prompt response and your investigations which revealed a problem to us, where CrateDB was not responding correctly to PostgreSQL wire protocol requests. Sorry for the noise.

@mfussenegger (also thanks a stack!) added a mitigation on behalf of https://github.com/crate/crate/pull/10979:

If the flush triggers the deferred executions, then a follow up sync() returned a completed future. This led the pg-write layer to immediately send a ReadyForQuery message to the client, even if the operations triggered by the flush weren't finished.

This patch improves the logic so that flush sets the activeExecution property, which will then be used by sync. This ensures that the ReadyForQuery message is sent after the operations have completed.

I have been able to confirm that the repros now work flawlessly using the most recent nightly release of CrateDB. The fix will be part of the upcoming GA release 4.4.1.

With kind regards, Andreas.

brianc / node-postgres

TypeError: Cannot read property 'name' of undefined #2455