brianc / node-pg-copy-streams

COPY FROM / COPY TO for node-postgres. Stream from one database to another, and stuff.
333 stars 40 forks source link

Row count always off by one on CopyToStreamQuery #151

Closed SansegoTek closed 1 year ago

SansegoTek commented 1 year ago

version 6.0.5

await sourceClient.connect();
await targetClient.connect();

const sourceQuery = `
copy (
      select <from view joined to table>)
    to stdout with binary`;

const targetQuery = `
    copy <table>(<columns>)
    from stdin with binary`;

const copyTo = to(sourceQuery);
const copyFrom = from(targetQuery);

const sourceStream = sourceClient.query(copyTo);
const targetStream = targetClient.query(copyFrom);

await pipeline(sourceStream, targetStream);

console.log(copyTo.rowCount);
console.log(copyFrom.rowCount);

regardless of the number of rows copied (even when 0 rows) my copyTo.rowCount is one greater than my copyFrom.rowCount. The copyFrom rowCount is correct.

True whether with binary is specified or not

SansegoTek commented 1 year ago

I should add, Postgres 14.6

jeromew commented 1 year ago

Hello,

I must admit that this is and old feature that I never used so it is possible that the implementation has a problem.

I checked how rowCount is calculated

According to https://www.postgresql.org/docs/current/protocol-flow.html, in copy-to

Copy-out mode (data transfer from the server) is initiated when the backend executes a COPY TO STDOUT SQL statement. The backend sends a CopyOutResponse message to the frontend, followed by zero or more CopyData messages (always one per row), followed by CopyDone. The backend then reverts to the command-processing mode it was in before the COPY started, and sends CommandComplete. The frontend cannot abort the transfer (except by closing the connection or issuing a Cancel request), but it can discard unwanted CopyData and CopyDone messages.

so at this stage, I don't see where the problem comes from, be it in the library or in the specific use case your are trying or a misunderstanting of the postgres protocol. Tell me if you see something, I will try to find some time in the next days to dig deeper.

the "0 rows" case seems like a good candidate that I will try to reproduce.

jeromew commented 1 year ago

Also note that according to https://www.postgresql.org/docs/current/protocol-message-formats.html

CommandComplete (B) - For a COPY command, the tag is COPY rows where rows is the number of rows copied. (Note: the row count appears only in PostgreSQL 8.2 and later.)

So if that is valid for both copy-in and copy-out, it could be a better strategy to use the same strategy in both copy-from and copy-to, and extract the rowCount value from the CommandComplete message in copy-to

This needs to be checked.

jeromew commented 1 year ago

@SansegoTek I looked at the current tests and realized that rowCount is already tested so this adds to my puzzlement.

copy-from :

it('correctly handle more heavy scenario', (done) => {
    const top = 130000
    const chunks = []
    const expected = []
    for (let i = 0; i < top; i++) {
      chunks.push(Buffer.from('' + i + '\t' + i * 10 + '\n'))
      expected.push([i, i * 10])
    }

    assertCopyFromResult('numbers', '(num1 int, num2 int)', chunks, (err, rows, stream) => {
      assert.deepStrictEqual(rows, expected, 'not matched')
      assert.equal(stream.rowCount, top, 'should have rowCount ' + top + ' ')
      done()
    })
  }).timeout(120000)

copy-to:

it('provides row count', (done) => {
      const top = 100
      const sql = 'COPY (SELECT * from generate_series(0, ' + (top - 1) + ')) TO STDOUT'
      assertCopyToResult(sql, (err, chunks, result, stream) => {
        assert.ifError(err)
        assert.equal(stream.rowCount, top, 'should have rowCount ' + top + ' but got ' + stream.rowCount)
        done()
      })
    })

Could you create a test that highlights what you are observing on empty sets ?

jeromew commented 1 year ago

OK I can now see something that looks like your problem :

when copyTo is used with a query that is in binary mode, there is a trailer

File Trailer The file trailer consists of a 16-bit integer word containing -1. This is easily distinguished from a tuple's field-count word.

This trailer is sent as a copyData message in the COPY protocol and as such adds 1 to the rowCount in the current implementation.

The only discrepency with your report is that you seem to say that you also have the rowCount difference when you do not use the BINARY protocol.

True whether with binary is specified or not

That seems weird to me. When I remove the BINARY protocol in my test, the rowCounts are equal.

SansegoTek commented 1 year ago

My apologies, you're correct - when running without the BINARY, the row counts are equal. I'm using typescript, must have forgotten to build before testing without the WITH BINARY.

I'm just subtracting 1 from the row count at the moment, so it's good to know this is a solid approach. I can update if/when you make a fix.

Thanks!

jeromew commented 1 year ago

@SansegoTek I published 6.0.6 that extracts the rowCount from the message that postgres sends at the end of the command. Can you tell me if it works for you ?

SansegoTek commented 1 year ago

Perfect, many thanks :)