duckdb / duckdb-rs

Ergonomic bindings to duckdb for Rust
MIT License
498 stars 106 forks source link

Rust extensions based on Extension C API #370

Open samansmink opened 3 months ago

samansmink commented 3 months ago

Hi 👋

I've been working on the new C API for extensions in duckdb/duckdb (see https://github.com/duckdb/duckdb/pull/12682). TLDR: we've added a header to duckdb which contains basically a struct of function pointers with most of the C API. This struct is then passed to a loadable extension on load. This is basically the same mechanism that sqlite has.

Goals

One of the goals of this API is to be able to write clean pure-rust extensions for DuckDB that can be used across multiple duckdb versions. Basically I want an extension template like https://github.com/duckdb/extension-template, but than in pure-rust based on this new C API.

Now that that PR is merged, I would like to get started on this.

Approach

Good for me, rusqlite, the repo this one is based on, is way ahead of us and already has support for sqlite's variant of the extension api struct since https://github.com/rusqlite/rusqlite/pull/1362.

So in theory, all we need to do is copy (the parts we like) of rusqlite for this to generate the loadable extensions. Then tie everything together for the

The catch

The main catch is my Rust knowledge is pretty minimal. Therefore this will also be a bit of a rust-learning-experience for me. However, given the fact that it's mostly a copy-and-paste exercise from rusqlite, this should be fine? ;)

finally, any help or advice is very welcome!

0xcaff commented 2 months ago

hey hey! glad to see there's some progress on this. I'd love to get this landed. Are you looking for someone to pair with on the rust bits? Are you looking for someone to do the rust side of the implementation? I'm happy to do either one or even hop on a call and figure out the best path forward (I can allocate 1 day of time this week towards getting this landed). What would be most useful for you?

Broadly it seems like the big parts are

I'm in US Central and available during daytime hours (9am to 9pm), let me know when a good time is for you over the next few days.

samansmink commented 2 months ago

Hey @0xcaff, That's great to hear! Lets setup a quick call I would love to get your take on this. I will drop you an email.

I have also fiddled with it and I have a (very messy) PoC working here: https://github.com/samansmink/duckdb-rs/tree/poc-rust-c-extension-api. I think it looks pretty similar to your direction, but I also added the attribute macro for the entrypoint, handled the version properly and added a demo function to the examples which uses the new entrypoint. I actually managed to load the extension from DuckDB too! :)

So basically the end-to-end is there, now its a matter of thinking about how to get things integrated cleanly. One of the things that I find a little awkard about the way things are setup in my feature branch now is that the libduckdbsys package is now basically fundamentally different based on this loadable_extension feature, making it not really a feature but almost transforming it completely into a different package.

0xcaff commented 2 months ago

Nice! It looks like my changes are pretty similar to yours. I think the open questions from me are

As for replacing all implementations of functions in the package, I don't think its a big deal as if you want both versions to co-exist you could import the crate twice with different features and also because all the stuff in the -sys crate is only wrappers around the C ABI. We can talk more about this on Friday.

0xcaff commented 2 months ago

Notes from our call

Thanks for taking the time to bring me up to speed Sam!

I'll take on getting the bundled build working.

parkerdgabel commented 2 months ago

Hi! I'm looking to start an extension in rust using the c API. I'm available to help with anything to move this along!

0xcaff commented 2 months ago

@samansmink I made a PR into your repo with bundled build working. Tagging it here in case it got lost https://github.com/samansmink/duckdb-rs/pull/1

Can't wait to get this all landed when you're back from holiday!

samansmink commented 2 months ago

that's great, @parkerdgabel! I will ping you for a review once the PR lands, some more eyes on this is definitely helpful

0xcaff commented 2 months ago

Saw you merged the work in progress PR into your branch! Let me know if you are finding issues with it, happy to help.

parkerdgabel commented 2 months ago

Great @samansmink ! I have the beginning of my extension here if you would like a repo to test on. I'm not totally sure how to compile for consumption by duckdb. I was working off your poc branch. I’d love to help get this extension template working

0xcaff commented 2 months ago

I've setup a usage of the c extensions API in my project (vendored duckdb-rs) and I found a few things we missed.

  1. https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/libduckdb-sys%2B1.0.0.patch#L102-L114 If buildtime_bindgen is disabled, and loadable_extension, we gotta switch here. We missed this in both our POCs. Only discovered when disabling bundling.
  2. I added handling for the minimum_duckdb_version value here https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/duckdb-loadable-macros%2B0.1.2.patch#L53-L58 https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/duckdb-loadable-macros%2B0.1.2.patch#L78C71-L78C101

There's still a bunch of rough edges.

What happens to old extensions following the ABI of the new extensions? It seems like they are all going to break? This seems suboptimal, is this really what you want?

D LOAD '/Users/martin/projects/duckdb_protobuf/target/release/protobuf.duckdb_extension';
Not implemented Error: Enum value: '' not implemented

It is because of the code here trying to check the abi

https://github.com/duckdb/duckdb/blob/0bc27aba2cfd2fb3dd3637b4e4d28f2d6c34db74/src/main/extension/extension_load.cpp#L197

0xcaff commented 2 months ago

here's what i did to get stuff working in my project btw https://github.com/0xcaff/duckdb_protobuf/pull/17

carlopi commented 1 month ago

A bit lateral, but the mentioned problem where error message would be cryptic on extension mismatch have been improved in the main of duckdb, to become v1.1.1 by https://github.com/duckdb/duckdb/pull/13894.

parkerdgabel commented 1 month ago

Btw I forked the project and started my extension. It is working for me. I also added Scalar function support to the library. Here is my extension using the fork https://github.com/parkerdgabel/quackML Here is a short video of logistic regression now running in pure rust through duckdb. https://github.com/user-attachments/assets/ccc1e33a-0ae6-4c0b-9d84-38913754991d

I've continued to work on the fork in the meantime but can still help with this.

As an aside, do any of you know how I can write a Vec to a flatbuffer of list type?