brianc / node-pg-copy-streams

COPY FROM / COPY TO for node-postgres. Stream from one database to another, and stuff.
331 stars 40 forks source link

_final is sometimes called before submit #112

Closed gabegorelick closed 4 years ago

gabegorelick commented 4 years ago

I haven't been able to get a minimal test case yet, but when submitting multiple COPY FROM queries in parallel on the same connection, I often see the following error:

TypeError: Cannot read property 'stream' of undefined
    at CopyStreamQuery._final (node_modules/pg-copy-streams/copy-from.js:70:21)
    at callFinal (_stream_writable.js:617:10)
    at process._tickCallback (internal/process/next_tick.js:63:19)

I've tracked it down to the fact that _final is called before submit. Since submit is what sets this.connection, if it's not called then this.connection is undefined.

Since node-postgres is queuing these queries, what stops the input stream (the one piped to copy-from) from finishing before our COPY query becomes active?

Relevant code: https://github.com/brianc/node-pg-copy-streams/blob/83e0eb6e59a27972c01c121c685b0ddc822b8658/copy-from.js#L21-L24

https://github.com/brianc/node-pg-copy-streams/blob/83e0eb6e59a27972c01c121c685b0ddc822b8658/copy-from.js#L42-L49

jeromew commented 4 years ago

Hello thanks for your report on this.

I know that we delay the callback of _final (cf cb_flush) in order to avoid a problem when pg will dequeue the next query.

now for a very small input source, the whole source could indeed be swallowed by the internal buffers of the writableStream before pg dequeues the COPY query.

I would have thought that the stream beeing corked in the constructor was sufficient to handle this (it is uncorked later when CopyInResponse is received so pg has called submit already)

Your report seems to show that this may not be an effective strategy in all cases.

The documentation indeed states that

The writable.cork() method forces all written data to be buffered in memory. The buffered data will be flushed when either the stream.uncork() or stream.end() methods are called.

so if stream.end() is called on the source before pg could submit the query there might be a problem I suppose we need a test pinpointing this.

We need to keep a _final implementation because it is needed to delay the finish event until the COPY operation is done ; but we could probably improve the sequence of things to handle this corner case.

jeromew commented 4 years ago

OK it seems that the delaying of the _write and _writev callbacks works as intended (any first write to write or writev will be acknowledged only after CopyInResponse is received, meaning that pg has submitted the query).

There is only one case where I could reproduce your issue, which is when the source is totally empty. In this case, _write and _writev are never called (thus not delaying the call to _final), and the call to _final breaks.

This breaks _final with or without the query beeing queued by pg.

now my question is : do you thinks it matches the issue you are experiencing ? do you sometimes have empty sources in your sequence of copyFrom operations ?

jeromew commented 4 years ago

Could you please try version 5.1.1 that fixes an issue with empty sources

gabegorelick commented 4 years ago

I can confirm that https://github.com/brianc/node-pg-copy-streams/commit/97febfcc2cb6226c5787cb0a73377926d02c19ba fixes my issue.

Thanks @jeromew!