duckdb / community-extensions

https://community-extensions.duckdb.org

Guidance on developing DuckDB extensions in Rust #54

Open t-kalinowski opened 1 month ago

t-kalinowski commented 1 month ago

Hello DuckDB Team!

I am exploring the possibility of writing an extension for DuckDB and am particularly interested in developing primarily in Rust. I anticipate I'll mostly be using the C API crate libduckdb_sys, but I am unsure if there are existing examples or templates in Rust that I could refer to.

Since I haven't come across any community extensions written in Rust, I wanted to inquire whether you are aware of any, or if there are any plans to support such developments. Any guidance on how to get started, as well as any relevant documentation, would be immensely helpful.

Thank you for your assistance!

carlopi commented 1 month ago

At the moment there are a few Rust-based community extensions:

There is also a Rust-based core DuckDB extension: https://github.com/duckdb/duckdb_delta

They use different approaches (@rustyconover's, @ywelsch's, and @samansmink's, which wraps delta-rs). I would recommend taking a look to see which one best fits your constraints, and possibly cloning one and experimenting with it.

Also pinging the authors, since they might have something to add. We should improve the docs on how to get started; that would be handy to have.

There are also helpful conversations about this in DuckDB's Discord channels on extensions and Rust.

samansmink commented 1 month ago

@t-kalinowski There's also ongoing work on a new extension API based on the C API. This will allow writing pure Rust extensions. For now, the extensions linked by @carlopi show the way to go.

t-kalinowski commented 1 month ago

Thanks for the links!

Please correct me if I've misunderstood, but it appears the linked extensions still contain a significant amount of C++ code. This code seems to require an in-depth understanding of the undocumented, and potentially internal, aspects of the DuckDB C++ API.

Seeing the PR for extensions that only use the C API is exciting! Do you think we could start a "community-extension-rust" template repo soon, where the example "quack" function is written in Rust, using primarily the DuckDB C API via duckdb::ffi?

What do you think?
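As a rough sketch of what the Rust side of such a template could look like, assuming only the Rust standard library: the core "quack" logic as an ordinary Rust function, plus a C-ABI wrapper of the kind a C-API-based extension would export. The names quack, quack_c, and quack_free are hypothetical, and the DuckDB-specific parts (registering the function and reading or writing DuckDB vectors via duckdb::ffi) are intentionally left out rather than guessed, since that API was still being designed at the time.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Pure-Rust core of a hypothetical "quack" scalar function.
pub fn quack(name: &str) -> String {
    format!("Quack {} 🐥", name)
}

/// Hypothetical C-ABI wrapper around `quack`.
/// Takes a NUL-terminated string and returns a newly allocated
/// NUL-terminated string that must be released with `quack_free`.
/// Returns null if `name` is null or not valid UTF-8.
#[no_mangle]
pub extern "C" fn quack_c(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return ptr::null_mut();
    }
    // SAFETY: the caller must pass a valid NUL-terminated string.
    let name = match unsafe { CStr::from_ptr(name) }.to_str() {
        Ok(s) => s,
        Err(_) => return ptr::null_mut(),
    };
    match CString::new(quack(name)) {
        Ok(s) => s.into_raw(),
        // Defensive: only fails if the result contains an interior NUL byte.
        Err(_) => ptr::null_mut(),
    }
}

/// Frees a string previously returned by `quack_c`.
#[no_mangle]
pub extern "C" fn quack_free(s: *mut c_char) {
    if !s.is_null() {
        // SAFETY: `s` must come from `CString::into_raw` in `quack_c`.
        unsafe { drop(CString::from_raw(s)) };
    }
}
```

Because the returned string is allocated by Rust, the caller hands it back to quack_free instead of calling C's free; mixing allocators across an FFI boundary is a common source of bugs.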

samansmink commented 1 month ago

> Please correct me if I've misunderstood

That is completely correct.

> Do you think we could start a "community-extension-rust" template repo soon

While I can't promise "soon", I can say that this is certainly high on the priority list. It's one of the main goals of introducing the C API for extensions.

0xcaff commented 1 month ago

I'm developing an external (non-C++) DuckDB plugin, which you can find at https://github.com/0xcaff/duckdb_protobuf. It appears I might be creating the first all-Rust plugin, as I've encountered several issues along the way. The duckdb-rs bindings lack local initialization for vtables, and the build tooling requires a custom metadata writer (I've created my own at https://github.com/0xcaff/duckdb_protobuf/blob/master/packages/duckdb_metadata/src/lib.rs). Additionally, the bundled feature doesn't correctly pin versions, and integration with the community repository is problematic.

I'd appreciate a way to publish to the community repository without having to adopt DuckDB's build tooling. There are numerous ways to build a DuckDB extension using the C ABI (for example, see the ongoing work on the Zig plugin SDK). It would be beneficial to have a method for out-of-tree builds.

This is particularly important because data formats evolve slowly, and the current process of bridging Rust to C++ to DuckDB involves many steps before deriving value from the integration. Simplifying this process could make it easier for folks to adopt DuckDB.
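For context on the custom metadata writer mentioned in this comment: DuckDB's build tooling appends a small metadata footer (for example, the target platform and DuckDB version) to the compiled extension binary, which DuckDB checks when loading it. The sketch below only illustrates the general idea of appending fixed-width, zero-padded fields to a binary. The 32-byte field width, the field order, and the names fixed_field and append_metadata are assumptions for illustration, not DuckDB's actual footer format; the real layout (including the signature area) should be taken from DuckDB's own build scripts or from the linked duckdb_metadata crate.

```rust
/// Width of each metadata field in bytes (an assumption for this sketch).
const FIELD_WIDTH: usize = 32;

/// Hypothetical helper: pads `value` with zero bytes to exactly
/// FIELD_WIDTH bytes. Returns None if the value does not fit.
fn fixed_field(value: &str) -> Option<[u8; FIELD_WIDTH]> {
    let bytes = value.as_bytes();
    if bytes.len() > FIELD_WIDTH {
        return None;
    }
    let mut field = [0u8; FIELD_WIDTH];
    field[..bytes.len()].copy_from_slice(bytes);
    Some(field)
}

/// Hypothetical writer: appends each value as a fixed-width field to the
/// end of `binary`, in the order given. Nothing is appended for a value
/// that is too long, and an error is returned instead.
fn append_metadata(binary: &mut Vec<u8>, values: &[&str]) -> Result<(), String> {
    for v in values {
        let field = fixed_field(v).ok_or_else(|| format!("value too long: {}", v))?;
        binary.extend_from_slice(&field);
    }
    Ok(())
}
```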

samansmink commented 1 month ago

Hey @0xcaff!

You make an interesting point. We decided to go with our current approach, in which the build tooling is fixed. This has some clear advantages:

Our plan is to add the aforementioned C API, which allows more flexibility in build tooling. The main advantage of that route is that it provides an easy, standardised way to write DuckDB extensions in any language that can easily call C code. This keeps the maintenance of extension build tooling across the various languages manageable.

My main question is, would the aforementioned C API and corresponding (to be developed) build tooling solve your problems with the current setup?

0xcaff commented 1 month ago

Thanks for sharing some of the reasoning behind this design; it makes a lot more sense now. When you said C API, I thought you meant the existing C API (https://duckdb.org/docs/api/c/api.html). I see where the build tooling complexity comes from; I was not aware of the intricacies of dynamic linking across platforms. It seems the new C API will essentially move the linking of external functions into userland, making extensions much easier to build and link (no more reliance on the dynamic linker). I think this solves my use case. Can't wait to take it for a spin once it's ready!
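The idea described here, where the extension no longer resolves DuckDB's symbols through the platform's dynamic linker but instead receives them from DuckDB at load time, can be sketched as a table of function pointers passed into the extension's entry point. Everything below (HostApi, its fields, my_extension_init, host_version, host_add) is hypothetical and only illustrates the general pattern; it is not the actual DuckDB extension C API.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::sync::OnceLock;

/// Hypothetical table of host functions handed to the extension at load time.
/// `#[repr(C)]` gives it a C-compatible layout, so any language with a C FFI
/// could build or consume it.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct HostApi {
    /// Returns a NUL-terminated version string owned by the host.
    pub get_version: extern "C" fn() -> *const c_char,
    /// Stand-in for any other host service the extension needs.
    pub add: extern "C" fn(i64, i64) -> i64,
}

/// Copy of the table, saved once during initialization.
static API: OnceLock<HostApi> = OnceLock::new();

/// Hypothetical entry point the host calls after loading the library.
/// Returns 0 on success, 1 for a null table, 2 if already initialized.
#[no_mangle]
pub extern "C" fn my_extension_init(api: *const HostApi) -> i32 {
    if api.is_null() {
        return 1;
    }
    // SAFETY: the host must pass a pointer to a valid, initialized table.
    let table = unsafe { *api };
    if API.set(table).is_err() {
        return 2;
    }
    0
}

/// Extension code calls host functions only through the saved table,
/// so no host symbols need to be resolved by the dynamic linker.
pub fn host_version() -> Option<String> {
    let api = API.get()?;
    // SAFETY: the host guarantees the returned pointer is a valid C string.
    let v = unsafe { CStr::from_ptr((api.get_version)()) };
    Some(v.to_string_lossy().into_owned())
}

/// Calls the host's `add` service; None if the extension is not initialized.
pub fn host_add(a: i64, b: i64) -> Option<i64> {
    API.get().map(|api| (api.add)(a, b))
}
```

The design point is that, in this sketch, the extension binary exports a single entry point and has no unresolved references to host symbols, which is what makes building it with ordinary tooling such as Cargo straightforward.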