alitrack / duckdb_fdw

DuckDB Foreign Data Wrapper for PostgreSQL
MIT License
346 stars 21 forks source link
duckdb duckdb-fdw fdw foreign-data-wrapper postgresql

DuckDB Foreign Data Wrapper for PostgreSQL

This is a foreign data wrapper (FDW) to connect PostgreSQL to DuckDB database file. This FDW works with PostgreSQL 9.6 ... 16 and works with exact same version of libduckdb.

PostgreSQL + DuckDB

Contents

  1. Features
  2. Supported platforms
  3. Installation
  4. Usage
  5. Functions
  6. Identifier case handling
  7. Generated columns
  8. Character set handling
  9. Examples
  10. Limitations
  11. Tests
  12. Contributing
  13. Useful links
  14. License

Features

Common features

Pushdowning

Notes about pushdowning

Notes about features

Also see Limitations

Supported platforms

duckdb_fdw was developed on macOS and tested on Linux, so it should run on any reasonably POSIX-compliant system.

Installation

Package installation

There's a duckdb_fdw rpm available on Pigsty's PGSQL yum repository for el8 and el9

Source installation

Prerequisites:

1. Download source

git clone https://github.com/alitrack/duckdb_fdw
cd duckdb_fdw

2. Download DuckDB library

For example, we want to compile under Linux AMD64 with DuckDB v1.0.0, just download libduckdb-linux-amd64.zip

wget -c https://github.com/duckdb/duckdb/releases/download/v1.0.0/libduckdb-linux-amd64.zip
unzip -d . libduckdb-linux-amd64.zip

# you can also put the libduckdb.so to a directory in LD_LIBRARY_PATH, such as /usr/lib64
cp libduckdb.so $(pg_config --libdir)

Beware that this libduckdb.so is build on ubuntu with higher glibc version, to use duckdb_fdw on el8 / el9, you have to compile [libduckdb-src.zip](https://github.com/duckdb/duckdb/releases/download/v1.0.0/libduckdb-src.zip) from source

3. Build and install duckdb_fdw

Add a directory of pg_config to PATH and build and install duckdb_fdw.

make USE_PGXS=1
make install USE_PGXS=1

If you want to build duckdb_fdw in a source tree of PostgreSQL, use

make
make install

Usage

CREATE SERVER options

duckdb_fdw accepts the following options via the CREATE SERVER command:

CREATE USER MAPPING options

There is no user or password conceptions in DuckDB, hence duckdb_fdw no need any CREATE USER MAPPING command.

In OS duckdb_fdw works as executed code with permissions of user of PostgreSQL server. Usually it is postgres OS user. For interacting with DuckDB database without access errors ensure this user have permissions on DuckDB file and, sometimes, directory of the file.

CREATE FOREIGN TABLE options

duckdb_fdw accepts the following table-level options via the CREATE FOREIGN TABLE command:

duckdb_fdw accepts the following column-level options via the CREATE FOREIGN TABLE command:

IMPORT FOREIGN SCHEMA options

duckdb_fdw supports IMPORT FOREIGN SCHEMA (PostgreSQL 9.5+) and accepts no custom options for this command.

TRUNCATE support

duckdb_fdw implements the foreign data wrapper TRUNCATE API, available from PostgreSQL 14.

As SQlite does not provide a TRUNCATE command, it is simulated with a simple unqualified DELETE operation.

TRUNCATE ... CASCADE support not described.

Functions

As well as the standard duckdb_fdw_handler() and duckdb_fdw_validator() functions, duckdb_fdw provides the following user-callable utility functions:

duckdb_fdw_version
--------------------
              10000  

DuckDB_execute

FUNCTION duckdb_execute(server name, stmt text) RETURNS void

This function can be used to execute arbitrary SQL statements on the remote DuckDB server. That will only work with statements that do not return results (typically DDL statements).

Be careful when using this function, since it might disturb the transaction management of duckdb_fdw. Remember that running a DDL statement in DuckDB will issue an implicit COMMIT. You are best advised to use this function outside multi-statement transactions.

It is very useful to use command that duckdb_fdw does not support, for example,

SELECT duckdb_execute('duckdb_server'
,'create or replace view iris_parquet  as select * from parquet_scan(''temp/iris.parquet'');');

create foreign TABLE duckdb.iris_parquet(
"Sepal.Length" float,  
"Sepal.Width" float,
"Petal.Length" float,
"Petal.Width" float,  
"Species" text)
      SERVER duckdb_server OPTIONS (table 'iris_parquet');

-- or an easy way

IMPORT FOREIGN SCHEMA public limit to (iris_parquet) FROM SERVER  
duckdb_server INTO duckdb;
SELECT duckdb_execute('duckdb_server'
,'CREATE TABLE test (a INTEGER, b INTEGER, c VARCHAR(10));
');
SELECT duckdb_execute('duckdb_server'  
,'COPY test FROM ''/tmp/test.csv'';');

Identifier case handling

PostgreSQL folds identifiers to lower case by default. DuckDB behaviour not described. It's important to be aware of potential issues with table and column names.

Generated columns

DuckDB provides support for generated columns. Behaviour of duckdb_fdw with these columns isn't yet described.

Note that while duckdb_fdw will INSERT or UPDATE the generated column value in DuckDB, there is nothing to stop the value being modified within DuckDB, and hence no guarantee that in subsequent SELECT operations the column will still contain the expected generated value. This limitation also applies to postgres_fdw.

For more details on generated columns see:

Character set handling

Yet not described

Examples

Install the extension

Once for a database you need, as PostgreSQL superuser.

CREATE EXTENSION duckdb_fdw;

Create a foreign server with appropriate configuration:

Once for a foreign datasource you need, as PostgreSQL superuser. Please specify DuckDB database path using database option.

CREATE SERVER duckdb_server
FOREIGN DATA WRAPPER duckdb_fdw
OPTIONS (
    database '/path/to/database'
);

Grant usage on foreign server to normal user in PostgreSQL:

Once for a normal user (non-superuser) in PostgreSQL, as PostgreSQL superuser. It is a good idea to use a superuser only where really necessary, so let's allow a normal user to use the foreign server (this is not required for the example to work, but it's secirity recomedation).

GRANT USAGE ON FOREIGN SERVER duckdb_server TO pguser;

Where pguser is a sample user for works with foreign server (and foreign tables).

User mapping

There is no user or password conceptions in DuckDB, hence duckdb_fdw no need any CREATE USER MAPPING command. About access problems see in CREATE USER MAPPING options.

Create foreign table

All CREATE FOREIGN TABLE SQL commands can be executed as a normal PostgreSQL user if there were correct GRANT USAGE ON FOREIGN SERVER. No need PostgreSQL supersuer for secirity reasons but also works with PostgreSQL supersuer.

Please specify table option if DuckDB table name is different from foreign table name.

    CREATE FOREIGN TABLE t1 (
      a integer,
      b text
    )
    SERVER duckdb_server
    OPTIONS (
      table 't1_duckdb'
    );

If you want to update tables, please add OPTIONS (key 'true') to a primary key or unique key like the following:

    CREATE FOREIGN TABLE t1(
      a integer OPTIONS (key 'true'),
      b text
    )
    SERVER duckdb_server 
    OPTIONS (
      table 't1_duckdb'
    );

If you need to convert INT DuckDB column (epoch Unix Time) to be treated/visualized as TIMESTAMP in PostgreSQL, please add OPTIONS (column_type 'INT') when defining FOREIGN table at PostgreSQL like the following:

    CREATE FOREIGN TABLE t1(
      a integer,
      b text,
      c timestamp without time zone OPTIONS (column_type 'INT')
    )
    SERVER duckdb_server
    OPTIONS (
      table 't1_duckdb'
    );

As above, but with aliased column names:

    CREATE FOREIGN TABLE t1(
      a integer,
      b text OPTIONS (column_name 'test_id'),
      c timestamp without time zone OPTIONS (column_type 'INT', column_name 'unixtime')
    )
    SERVER duckdb_server
    OPTIONS (
      table 't1_duckdb'
    );

Import a DuckDB database as schema to PostgreSQL:

    IMPORT FOREIGN SCHEMA someschema
    FROM SERVER duckdb_server
    INTO public;

Note: someschema has no particular meaning and can be set to an arbitrary value.

Access foreign table

For the table from previous examples

    SELECT * FROM t1;

Limitations

Tests

All tests are based on make check, main testing script see in test.sh file. We don't profess a specific environment. You can use any POSIX-compliant system. Testing scripts from PosgreSQL-side is multi-versioned. Hence, you need install PostgreSQL packages in versions listed in sql directory. PostgreSQL server locale for messages in tests must be english. About base testing mechanism see in PostgreSQL documentation.

Testing directory have structure as following:

+---sql
    +---11.7
    |       filename1.sql
    |       filename2.sql
    | 
    +---12.12
    |       filename1.sql
    |       filename2.sql
    | 
.................  
    \---15.0
           filename1.sql
           filename2.sql

The test cases for each version are based on the test of corresponding version of PostgreSQL. You can execute test by test.sh directly. The version of PostgreSQL is detected automatically by $(VERSION) variable in Makefile.

Contributing

Opening issues and pull requests on GitHub are welcome.

You don't need to squash small commits to one big in pull requests.

For pull request, please make sure these items below for testing:

Useful links

Source

Reference FDW realisation, postgres_fdw

General FDW Documentation

Other FDWs

Special thanks

Authors of https://github.com/pgspider/sqlite_fdw

License

MIT License