duckdb / postgres_scanner

https://duckdb.org/docs/extensions/postgres
MIT License
227 stars 36 forks source link

Expected Postgres binary COPY header, got something else #156

Closed davetapley closed 8 months ago

davetapley commented 9 months ago

What happens?

I'm switching to from duckdb 0.9.1 to 0.9.2 and so getting https://github.com/duckdb/duckdb/pull/9504

With deprecated postgres_attach I run in to #154, so I tried the new ATTACH syntax, but it fails with:

Error: near line 122: IO Error: Expected Postgres binary COPY header, got something else

To Reproduce

Here's example of change I made to my script: https://gist.github.com/davetapley/cfaaf7a7d6eda0079068d1f7776eb358/revisions

I presume it's correct to only use the legacy. DB prefix in the FROM and JOIN statements (and not in the SELECT)?

OS:

Ubuntu 18.04.2 LTS

PostgreSQL Version:

10.23

DuckDB Version:

0.9.2

DuckDB Client:

CLI

Full Name:

Dave Tapley

Affiliation:

JE Fuller

Have you tried this on the latest main branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

davetapley commented 9 months ago

Not sure if related or different issue, but I just tried to repro this on another machine and on that I'm seeing:

Error: near line 37: Invalid Error: Failed to prepare COPY "
        COPY (SELECT "IDX", "DESCRIP" FROM "public"."SuppProtocolTypes" WHERE ctid BETWEEN '(0,0)'::tid AND '(4294967295,0)'::tid) TO STDOUT (FORMAT binary);
        ": 

I wondered if it was the join causing trouble, so I tried to make an MRE on a new DB, but couldn't reproduce.

Mytherin commented 9 months ago

Thanks for the report! Could you run FORCE INSTALL postgres_scanner to ensure you are using the latest version of the extension? Does the error persist if you use SET threads=1 and/or SET pg_connection_limit=1?

Could you also provide the schema and data of the Postgres database required to reproduce the issue?

davetapley commented 9 months ago

@Mytherin okay, here we go:


On server throwing Expected Postgres binary COPY header:

FORCE INSTALL postgres_scanner changed error message to:

Error: near line 42: IO Error: Unable to read binary COPY data from Postgres: message contents do not agree with length in message type "H"
lost synchronization with server: got message type "

SET threads=1; and no error ✅

SET pg_connection_limit=1 changed error from IO Error to Failed to prepare COPY.


On server throwing Failed to prepare COPY since the start:

FORCE INSTALL postgres_scanner didn't change anything.

SET threads=1; and no error ✅

SET pg_connection_limit=1; and intermittently:

Either same error, no error (i.e. it works), or a new one: ðŸĪŠ

Error: near line 42: IO Error: Unable to read binary COPY data from Postgres: no COPY in progress

Unfortunately it's a big DB (10s GB), I'll see if I can repro with something smaller 🙏ðŸŧ

davetapley commented 8 months ago

I could not get a reliable repro 😞

Mytherin commented 8 months ago

Thanks for trying to get a repro! I have a suspicion that this might be related to the connection cache introduced in #142. In #170 I'm moving the connection cache behind a setting (pg_experimental_connection_cache) and disabling it by default. It would be great if you could test it out again after that is merged to see if that resolves the issue.

Mytherin commented 8 months ago

Should be available now with force install postgres_scanner from 'http://nightly-extensions.duckdb.org'; if you could give it a spin

hazelgurule commented 8 months ago

@Mytherin thanks for trying that out!

I tried running force install postgres_scanner from 'http://nightly-extensions.duckdb.org'; and I still get the error, unfortunately.

I think I've come up with a minimal repro, though! The error message is Unable to read binary COPY data from Postgres, invalid header, which is slightly different but I'm guessing it's a similar issue. Details here https://gist.github.com/hazelgurule/4d11a41ee222b84e71b57e5f28f9d2c6 - basically the issue seems to occur when joining 3 tables together

Mytherin commented 8 months ago

Thanks for the reproducible example! Somehow I missed this issue was about running multiple scans in the same query and joining them together. I've pushed a fix in #171.

Mytherin commented 8 months ago

The fixed version should now be available with force install postgres_scanner from 'http://nightly-extensions.duckdb.org'; if you want to verify that this is resolved.

hazelgurule commented 8 months ago

Thanks! I'm still seeing the issue unfortunately - after running force install I'm getting

D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id left join mydatabase.test3 c on a.id = c.id;
Error: IO Error: Unable to read binary COPY data from Postgres, invalid header

and every once in a while I see a new error -

D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id left join mydatabase.test3 c on a.id = c.id;
duckdb(58549,0x1e45cd000) malloc: *** error for object 0x600000bbe140: pointer being freed was not allocated
duckdb(58549,0x1e45cd000) malloc: *** set a breakpoint in malloc_error_break to debug

Not sure if there's a way to confirm that force install did its thing or to see exactly which version I'm now on.

Mytherin commented 8 months ago

Sounds to me like you might still be using the old version. Are you installing from the nightly bucket with v0.9.2 - i.e. force install postgres_scanner from 'http://nightly-extensions.duckdb.org'; on a version where select version() returns v0.9.2?

hazelgurule commented 8 months ago

Hmm I think I'm installing everything correctly...

➜  rm -rf ~/.duckdb/extensions
➜  ./scripts/duckdb
Loading DuckDB version 0.9.2...
v0.9.2 3c695d7ba9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D force install postgres_scanner from 'http://nightly-extensions.duckdb.org';
D select version()
> ;
┌───────────┐
│ version() │
│  varchar  │
├───────────â”Ī
│ v0.9.2    │
└───────────┘
D load postgres_scanner;
D ATTACH '<connection string>' as mydatabase (TYPE POSTGRES);
D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id left join mydatabase.test3 c on a.id = c.id;
Error: IO Error: Unable to read binary COPY data from Postgres, invalid header
Mytherin commented 8 months ago

Hm, I think the old extension version was still cached in a CDN most likely. This works as expected for me:

D force install postgres_scanner;
D ATTACH 'dbname=postgresscanner' AS mydatabase (TYPE postgres);
D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id left join mydatabase.test3 c on a.id = c.id;
┌───────┮───────┮───────┐
│  id   │  id   │  id   │
│ int32 │ int32 │ int32 │
├───────────────────────â”Ī
│        0 rows         │
└───────────────────────┘

I've published the new extension so that FORCE INSTALL postgres_scanner fetches it as well now.

hazelgurule commented 8 months ago

Ah, great - it's working for me now - thanks!!