Mooncake-Labs / pg_mooncake

Iceberg/Delta Columnstore Table in Postgres
http://mooncake.dev
MIT License

Use a shared libduckdb rather than embed it #7

Open Vonng opened 3 weeks ago

Vonng commented 3 weeks ago

There are significant challenges with the current approach of embedding libduckdb directly.

  1. Compilation Time and Package Size: Embedding libduckdb requires a substantial amount of compilation time and dramatically increases the size of the package. The problem is multiplied across every combination of 3 PostgreSQL major versions and 5 OS distributions.

  2. Conflict with pg_duckdb: Embedding libduckdb conflicts with pg_duckdb, forcing users to choose one or the other. This restriction makes adoption needlessly difficult.
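
To make the scale in point 1 concrete, here is a minimal sketch that enumerates the build matrix. The PostgreSQL major versions are an assumption for illustration (the issue only says there are three); the distributions are the five named later in this issue.

```python
from itertools import product

# Assumed for illustration: the issue says "3 major versions" without naming them.
PG_MAJORS = [15, 16, 17]
# The five distributions mentioned in this issue.
DISTROS = ["el8", "el9", "ubuntu22.04", "ubuntu24.04", "debian12"]


def build_matrix(pg_majors, distros):
    """Return every (PostgreSQL major, distro) pair that needs its own package."""
    return list(product(pg_majors, distros))


matrix = build_matrix(PG_MAJORS, DISTROS)
print(len(matrix))  # 15 packages, each compiling and bundling libduckdb
```

With an embedded libduckdb, each of those targets pays the full compile time and carries its own copy of the library.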

To address these concerns, I propose adopting a shared libduckdb, similar to the approach used in duckdb_fdw, available at this commit.

The DuckDB official release provides a binary libduckdb.so, and I have already created RPM/DEB packages for EL8/EL9, Ubuntu 22.04/24.04, and Debian 12, which are readily available at [ext.pigsty.io/#/](https://ext.pigsty.io).

Adopting a shared libduckdb would mitigate the issues related to compilation times, package sizes, and software conflicts, ultimately simplifying maintenance and user choice.

dpxcc commented 3 weeks ago

Thanks for looking into the build process and for distributing pg_mooncake with Pigsty! We appreciate the suggestion and have indeed considered using a shared libduckdb.so, but a few challenges come up with this approach:

  1. Dependency on DuckDB's Internal API: Unlike duckdb_fdw, pg_mooncake depends on DuckDB's internal C++ API, which isn't guaranteed to be stable and can change between DuckDB releases. This means pg_mooncake may not be compatible with arbitrary versions of libduckdb.so, potentially causing compatibility issues if users install different versions of DuckDB.

  2. Planned Use of Non-Builtin DuckDB Extensions: We're looking to leverage non-builtin DuckDB extensions like delta and iceberg to support reading external tables from third-party catalogs. This will require a way to ensure these extensions are consistently available in every installation, which may be harder to manage with a shared libduckdb.so.
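
As an illustration of the first point, one way a shared-library setup could guard against version drift is to compare the version reported by the loaded libduckdb.so with the version the extension was built against, and refuse to proceed on a mismatch. The sketch below is not pg_mooncake code. It uses `duckdb_library_version()` from DuckDB's C API; the helper names, the default library path, and the version strings in the usage note are hypothetical.

```python
import ctypes


def parse_version(text):
    """Turn 'v1.1.3' or '1.1.3' into (1, 1, 3); drops a suffix such as '-dev12'."""
    core = text.lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_exact_match(found, expected):
    """Internal C++ APIs carry no stability guarantee, so only an exact match passes."""
    return parse_version(found) == parse_version(expected)


def loaded_libduckdb_version(path="libduckdb.so"):
    """Ask a shared libduckdb for its version string via the DuckDB C API."""
    lib = ctypes.CDLL(path)  # raises OSError if the library cannot be found
    lib.duckdb_library_version.restype = ctypes.c_char_p
    return lib.duckdb_library_version().decode()


def require_version(found, expected):
    """Raise if the installed library differs from the version built against."""
    if not is_exact_match(found, expected):
        raise RuntimeError(f"libduckdb {found} found, but {expected} is required")
```

A caller would run something like `require_version(loaded_libduckdb_version(), "v1.1.3")` at load time. A check like this only detects a mismatch; it does not make different DuckDB versions compatible, so packages would still need to pin the exact libduckdb version.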

We appreciate the work you've done on RPM/DEB packages and are interested in collaborating on improving the build and distribution process while ensuring compatibility and stability.