duckdb / postgres_scanner

https://duckdb.org/docs/extensions/postgres
MIT License
227 stars 36 forks source link

Parallel COPY to Postgres ? #185

Open anayrat opened 7 months ago

anayrat commented 7 months ago

What happens?

Hello, I tried to do a parallel import in Postgres. I wonder if postgres_scanner allows this ?

To Reproduce

I tried :

ATTACH 'user=postgres host=/var/run/postgresql port=5440' AS db (TYPE postgres);
COPY db.votes2 FROM 'data.zstd/*' (FORMAT parquet);

I tried to change file format (csv, parquet), compression, multiples files... I also tried to first import in duckdb, then, copy to Postgres. I only see one COPY process on Postgres.

OS:

Linux

PostgreSQL Version:

16

DuckDB Version:

v0.10.0 20b1486d11

DuckDB Client:

CLI

Full Name:

Adrien Nayrat

Affiliation:

Self Employed

Have you tried this on the latest main branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

Mytherin commented 7 months ago

Thanks for the report! Parallel imports are not supported right now. It is also rather hard (and perhaps not even possible?) to support parallel imports in a transaction-safe manner. We could perhaps offer a bulk-import option that does not respect transactions in the future.

anayrat commented 7 months ago

Thanks for your quick reply ! Indeed, it is a good idea for the bulk import without taking care of transaction. FYI, it seems possible to perform the copy thanks to snapshot export :

Session 1 Session 2
BEGIN;
SELECT pg_export_snapshot();

pg_export_snapshot
---------------------
00000003-00000016-1
(1 row)

copy t1 from 'file.copy' ;
COPY 10
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-00000016-1';
copy t1 from 'file.copy' ;
COPY 10
COMMIT;
COMMIT;
select count(*) from votes2;
count
-------
20
(1 row)
anayrat commented 6 months ago

With such feature, DBT could be a very good tool to load massive amount of data in Postgres. It could be my new favorite ELT/ETL :)