duckdb / postgres_scanner

https://duckdb.org/docs/extensions/postgres
MIT License
227 stars 36 forks source link

CREATE TABLE with subqueries on PostgreSQL database errors or hangs permanently #146

Closed thomas-daniels closed 9 months ago

thomas-daniels commented 9 months ago

DuckDB 0.9.2, PostgreSQL 16 (also found on 15), latest version of the postgres_scanner extension, macOS 13.6.2. Since this issue started appearing very recently, perhaps it may be related to #142?

Queries of the form CREATE TABLE postgres_db.foo as SELECT * FROM bar WHERE <subquery involving other PostgreSQL table> no longer seem to work.

MCVE (using CLI, but has also been reproduced in the Python client):

load postgres;
attach 'CONNECTION STRING HERE' as postgres_db (type postgres);
create table postgres_db.public.test_table_1 (id int);
create table postgres_db.public.test_table_2 as select * from values ( (1) ) as V(id) where id not in (select id from postgres_db.public.test_table_1);

(The where id not in ... part is important here: the query works fine if the where-clause is removed.)

The table test_table_1 gets created fine (in fact it doesn't matter if this table is created by DuckDB or already exists, but its creation is included in the commands here so the example is complete). However, test_table_2 is not. Instead of the table being created, DuckDB either hangs permanently or throws an error. Permanently hanging seems to be the most common result, but I've also observed different types of errors on queries of this form, of these forms:

Error: Invalid Error: Failed to execute query "BEGIN TRANSACTION;
CREATE TABLE <name> (<columns>);"

or

Error: Invalid Error: Failed to prepare COPY "COPY <name> FROM STDIN (FORMAT BINARY)": 
ERROR:  COPY from stdin failed: COPY terminated by new PQexec

or

Error: Invalid Error: Failed to execute query "CREATE TABLE <name> (<columns>);": message contents do not agree with length in message type "H"
lost synchronization with server: got message type "C", length 1330665760

The same query would sometimes lead to a different error message, or a permanent hang instead of an error message.

While it's not consistent how these queries fail, it does appear to be consistent that they fail in some way. I haven't been able to run a successful query of this form.

Mytherin commented 9 months ago

Thanks for the report! I've pushed a fix in #147. The issue is that the CREATE TABLE AS and SELECT are run in the same Postgres connection concurrently which causes odd behavior. This issue was actually present before but https://github.com/duckdb/postgres_scanner/pull/142 makes it occur in more scenarios because the connections are re-used more aggressively. The solution is to finish running the SELECT before we run CREATE TABLE AS, which we already did for other operators (e.g. INSERT).