0.1.0 release is here 🎉 Please try it out!
pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high-performance analytics and data-intensive applications.
pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck.
See our official documentation for further details.
- `SELECT` queries executed by the DuckDB engine can directly read Postgres tables. (If you only query Postgres tables, you need to run `SET duckdb.force_execution TO true`; see the IMPORTANT section below for details.)
- Read and Write support for object storage (AWS S3, Cloudflare R2, or Google GCS):
  - Read Parquet, CSV, and JSON files:
    - `SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)`
    - `SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)`
    - `SELECT n FROM read_json('s3://bucket/file.json') AS (n int)`
  - Enable the Iceberg extension with `SELECT duckdb.install_extension('iceberg')` and read Iceberg files with `iceberg_scan`.
  - Enable the Delta extension with `SELECT duckdb.install_extension('delta')` and read Delta files with `delta_scan`.
  - Write a query (or an entire table) to Parquet in object storage:
    - `COPY (SELECT foo, bar FROM baz) TO 's3://...'`
    - `COPY table TO 's3://...'`
  - Read and write to Parquet format in a single query:

    ```sql
    COPY (
      SELECT count(*), name
      FROM read_parquet('s3://bucket/file.parquet') AS (name text)
      GROUP BY name
      ORDER BY count DESC
    ) TO 's3://bucket/results.parquet';
    ```
- `JOIN` data in object storage/MotherDuck with Postgres tables, views, and materialized views.
- Create temporary tables backed by DuckDB's columnar format with `CREATE TEMP TABLE ... USING duckdb` (see the sketch after this list).
- Install DuckDB extensions with `SELECT duckdb.install_extension('extension_name');`.
- Toggle DuckDB execution on or off with `SET duckdb.force_execution = true|false`.
- Cache remote object storage files locally with `SELECT duckdb.cache('path', 'type');`, where `path` is the remote object path and `type` is the file type (`parquet` or `csv`).
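A minimal sketch combining two of the features above; the bucket, file, table, and column names are illustrative assumptions, not part of the API:

```sql
-- Materialize remote data into a DuckDB-backed temporary table
-- (bucket, file, and column names are hypothetical).
CREATE TEMP TABLE recent_events USING duckdb AS
SELECT id, ts
FROM read_parquet('s3://my-bucket/events.parquet') AS (id int, ts timestamp);

-- Cache the remote file locally to speed up later scans of the same object.
SELECT duckdb.cache('s3://my-bucket/events.parquet', 'parquet');
```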
Docker images are available on Docker Hub and are based on the official Postgres image. Usage is the same as for the Postgres image. For example, you can run the image directly:
```sh
docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:16-main
```
And with MotherDuck, you only need a MotherDuck access token, and then it is as simple as:
```sh
$ export MOTHERDUCK_TOKEN=<your personal MD token>
$ docker run -d -e POSTGRES_PASSWORD=duckdb -e MOTHERDUCK_TOKEN pgduckdb/pgduckdb:16-main -c duckdb.motherduck_enabled=true
```
Or you can use the Docker Compose setup in this repo:
```sh
git clone https://github.com/duckdb/pg_duckdb && cd pg_duckdb && docker compose up -d
```
Once started, connect to the database using psql:

```sh
psql postgres://postgres:duckdb@127.0.0.1:5432/postgres
# Or, if using Docker Compose:
docker compose exec db psql
```
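Once connected, a quick sanity check confirms that pg_duckdb is active (a sketch; it assumes the image ships with the extension already created):

```sql
-- Check that the extension is installed, and its version.
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_duckdb';

-- The DuckDB execution toggle should be visible as a setting.
SHOW duckdb.force_execution;
```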
For other usage, see our Docker-specific README.
Pre-built apt binaries are available via pgxman. After installation, you will need to add pg_duckdb to `shared_preload_libraries` and create the extension.

```sh
pgxman install pg_duckdb
```

Note: due to the use of `shared_preload_libraries`, pgxman's container support is not currently compatible with pg_duckdb.
To build pg_duckdb, you need a supported Postgres installation (including its development headers) plus the standard toolchains required to build Postgres extensions and to build DuckDB.
To build and install, run:

```sh
make install
```
Add pg_duckdb to the `shared_preload_libraries` in your postgresql.conf file:

```ini
shared_preload_libraries = 'pg_duckdb'
```
Next, create the `pg_duckdb` extension:

```sql
CREATE EXTENSION pg_duckdb;
```
IMPORTANT: DuckDB execution is usually enabled automatically when needed. It's enabled whenever you use DuckDB functions (such as `read_csv`), when you query DuckDB tables, and when running `COPY table TO 's3://...'`. However, if you want queries that only touch Postgres tables to use DuckDB execution, you need to run `SET duckdb.force_execution TO true`. This feature is opt-in to avoid breaking existing queries. To avoid doing that for every session, you can configure it for a certain user by running `ALTER USER my_analytics_user SET duckdb.force_execution TO true`.
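As a rough sketch of the toggle in action (the table name is hypothetical, and the exact plan output may vary by version), you can check whether DuckDB takes over a Postgres-only query:

```sql
SET duckdb.force_execution TO true;

-- With the setting enabled, the plan should show a DuckDB custom-scan node
-- instead of a plain Postgres plan (my_postgres_table is an example table).
EXPLAIN SELECT count(*) FROM my_postgres_table;
```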
See our official documentation for more usage information.
pg_duckdb relies on DuckDB's vectorized execution engine to read and write data to object storage buckets (AWS S3, Cloudflare R2, or Google GCS) and/or MotherDuck. The following two sections describe how to get started with these destinations.
Querying data stored in Parquet, CSV, JSON, Iceberg, and Delta format can be done with `read_parquet`, `read_csv`, `read_json`, `iceberg_scan`, and `delta_scan`, respectively.
Add a credential to enable DuckDB's httpfs support:

```sql
-- Session token is optional
INSERT INTO duckdb.secrets
(type, key_id, secret, session_token, region)
VALUES ('S3', 'access_key_id', 'secret_access_key', 'session_token', 'us-east-1');
```
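Since `duckdb.secrets` is a regular table, you can query it to confirm what was stored; a small sketch that selects only the non-secret columns:

```sql
-- List configured credentials without exposing the secret values.
SELECT type, key_id, region FROM duckdb.secrets;
```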
Copy data directly to your bucket - no ETL pipeline!

```sql
COPY (SELECT user_id, item_id, price, purchased_at FROM purchases)
TO 's3://your-bucket/purchases.parquet';
```
Perform analytics on your data:

```sql
SELECT SUM(price) AS total, item_id
FROM read_parquet('s3://your-bucket/purchases.parquet')
  AS (price float, item_id int)
GROUP BY item_id
ORDER BY total DESC
LIMIT 100;
```
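You can also `JOIN` object storage data with ordinary Postgres tables, as listed in the features above. A minimal sketch, assuming a hypothetical local Postgres table `items(item_id int, name text)`:

```sql
-- Join remote Parquet data with a local Postgres table
-- (the items table is an illustrative assumption).
SELECT i.name, SUM(p.price) AS total
FROM read_parquet('s3://your-bucket/purchases.parquet')
  AS p(price float, item_id int)
JOIN items i ON i.item_id = p.item_id
GROUP BY i.name
ORDER BY total DESC;
```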
pg_duckdb also integrates with MotherDuck. To enable this support, you first need to generate an access token and then add the following line to your postgresql.conf file:

```ini
duckdb.motherduck_token = 'your_access_token'
```
NOTE: If you don't want to store the token in your postgresql.conf file, you can also store it in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your postgresql.conf file:

```ini
duckdb.motherduck_enabled = true
```
If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your postgresql.conf file:

```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```
After doing this (and possibly restarting Postgres), you can create tables in the MotherDuck database by using the `duckdb` Table Access Method like this:

```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```
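Once created, these tables can be used like regular Postgres tables; a small usage sketch (the values are illustrative):

```sql
-- Rows written here are stored in MotherDuck.
INSERT INTO orders VALUES (1, 'Widget', 9.99);

-- Reads go through the DuckDB engine.
SELECT item, price FROM orders WHERE id = 1;
```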
Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.
This is done in the following way:

1. Schemas in your default MotherDuck database are merged with the Postgres schemas of the same name.
2. The exception is the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other MotherDuck databases are made available under Postgres schemas named `ddb$<duckdb_db_name>$<duckdb_schema_name>` (including the literal `$` characters).
4. The exception is the `main` schema in those other databases. That schema should be accessed using the shorter name `ddb$<db_name>` instead.

An example of each of these cases is shown below:
```sql
INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news
```
Please see the project milestones for upcoming planned tasks and features.
pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck. We look forward to their continued contributions and leadership.
Hydra is a Y Combinator-backed database company, focused on DuckDB-Powered Postgres for app developers.
MotherDuck is the cloud-based data warehouse that extends the power of DuckDB.
We welcome all contributions big and small: