asg017 / sqlite-loadable-rs

A framework for writing fast and performant SQLite extensions in Rust
Apache License 2.0
346 stars 17 forks source link

Considerations for your API #8

Open CGamesPlay opened 1 year ago

CGamesPlay commented 1 year ago

Hello, I saw your original publication on Hacker News and commented there. It seems like you continue working on this project, which is great, but I have spent a fair amount of effort also thinking about this problem and I can tell you about some of the problems that I encountered in my experience. I am offering you some unsolicited code review for your project. If this isn't welcome, feel free to stop reading here and close this issue 🙂

I'll be referencing sqlite3_ext a few times, for comparison, which I released after seeing your crate but I don't consider it ready yet because many APIs aren't supported. The license of my library is compatible with yours, so feel free to adopt anything you like.

General

Virtual tables

Application-defined functions

Conclusion

Overall I'm happy to see that Rust support for SQLite loadable extensions is a wanted concept. The primary goal of my API was to be the most ergonomic way of creating SQLite extensions (completely eliminate the use of unsafe code in library consumers, configurable SQLite version targeting, etc.). I don't think there necessarily need to exist multiple versions of a Rust crate for creating SQLite loadable extensions, but I definitely respect if you have different goals for your project.

Happy to answer any questions about why I made some of the API considerations that I did, and again, feel free to lift anything you like from my library and modify it as you wish. Cheers!

asg017 commented 1 year ago

Hey @CGamesPlay , thanks for the detailed thread! It's really cool to see other people excited about the internals of SQLite and how to make a nice Rust API.

Here are some of my thoughts:

General

What versions of SQLite are supported? If you load the extension and use features that aren't supported, does the library author have any way of detecting and handling this? I notice your ErrorKind doesn't have anything about versions. Given that library consumers are creating runtime-loadable extensions, I think this will be an important consideration for them.

All good points! I should guard some of the newer features of this API depending on the loaded SQLite version. For now I've just assumed people are running with a very recent version of SQLite (3.30 or above), but I should be more careful here

You haven't done any abstraction of sqlite3_value yet, looks like. I offer you sqlite3_ext::ValueRef, a safe Rust wrapper around sqlite3_value that supports all of the latest SQLite features for them. Please take special note of PassedRef, which is an extremely useful construct called the pointer passing interface. There's also UnsafePtr which is supported more widely on older versions of SQLite.

This was done on purpose - I did try a few wrappers around sqlite3_value, but it came with a significant perf hit. I had some benchmarks on the sqlite-regex project, and any wrappers I tried around sqlite3_value came with 1.5-3x slow down in some functions. I don't particularly like having * mut sqlite3_value in most public interfaces, but for this lib, I wanted to put perf first.

Also, re pointer passing interface, I do have different support of that, with api::value_pointer and api::result_pointer. Not sure if I implemented it in a safe way, but it seems to work as expected.

It's possible for library consumers to create statically linked extensions for both Rust and C programs, as well as loadable ones. Statically loaded extensions are conceptually easier since they target a fixed version of SQLite and you can make some guarantees about compatibility (enforced at link time). Take a look at the link configurations I outline in my README, to see some configurations I've thought about and how I've enabled them.

This library works with statically linked extensions - if you compile with cargo build --crate-type=staticlib you'll get an .a file, and cbindgen will create valid .h files, and then can be statically linked during compilation. I don't have an example in this repo yet, but will add one soon!

You will need to create a querying interface; for example you can't implement the FTS5 interface without using SQLite for storing the index data. You can probably lift mine wholesale if you like, it's basically rusqlite's with some of the rough edges polished down (since I wasn't constrained by a preexisting API).

Totally agree! Already have a few projects that need this. Will take a look at your implementation, thanks for sharing!

Virtual tables

Your collection of VTab configurations appears arbitrarily limited. There's no reason that an eponymous-only module ("table function") can't support transactions. You have no way to add support for find_function except through your define_virtual_table_writeablex function which seems to indicate that you're already unhappy with the way the API is going. Take a look at how I handled this problem; it may be worthwhile to switch to an API closer to this one. It supports all configurations that SQLite supports and I think it's pretty ergonomic.

You're totally right, define_virtual_table_writeablex was when I realized "this is not good". I really like the way yours is designed, and will give it a shot!

The aux data for VTabs doesn't need to be Option, if I want an optional field I'll make VTab::Aux optional directly. Application-defined functions

Makes sense, will update!

You haven't wrapped sqlite3_context yet. Take a look at my wrapper for something you may be able to lift.

Same as sqlite3_value explanation from above - found perf issues with wrappers, but will give it another shot.

You haven't added support for aggregate functions yet. Take a look at my create_aggregate_function for something you may be able to lift. In particular, note that the AggregateFunction trait works as both a "legacy aggregate function" and a window-capable one, and this distinction can be supported with automatic fallback depending on the runtime version of SQLite.

Very cool! Will track this in #1

don't think there necessarily need to exist multiple versions of a Rust crate for creating SQLite loadable extensions, but I definitely respect if you have different goals for your project.

Agreed, but I think this project does have a few different goals. I do want to keep things as safe as possible and have the API be ergonomic. But I really care about performance and keeping things as fast as possible, as evident by leaving *mut sqlite3_value/* mut sqlite3_context in public APIs. I also keep the API as similar to the C API as possible, but that's mostly because I'm more familiar with the C API more than I am with modern ergonomic rust. So, for some danger of possible risky APIs and awkward not-ergonomic Rust, sqlite-loadable will hopefully be a fast/performant way to write SQLite extensions in Rust, which is where it breaks with sqlite3_ext.

Again, thanks for sharing your thoughts, happy to discuss more!

CGamesPlay commented 1 year ago

Great response! Sad to hear your results about wrapping values/contexts. I haven't done any benchmarking on my library yet, as mentioned I'm more focused on ergonomics than performance (obviously performance is important, but I will have to figure out a way to make a performant wrapper).

I would actually be surprised if the wrappers that I use cause a performance hit. They take up no additional space (both repr(transparent)) and the methods are almost all one-line wrappers around SQLite methods that I would assume are inlined in production. If you dive back into this, consider taking a look at my impl FromValue for ValueRef and trait ToContextResult, perhaps I approached it from a different direction than you.

Also, re pointer passing interface, I do have different support of that, with api::value_pointer and api::result_pointer. Not sure if I implemented it in a safe way, but it seems to work as expected.

Comparing with my implementation: I use Rust's TypeId to automatically create a safe pointer tag. This bypasses the need to define the string tag yourself, and is pretty close to free at runtime. It requires that T: 'static, which I think is something that you actually want to require as well (what happens if my result_pointer has a reference inside of it?). It requires that a foreign-language receiver of the pointer has to do extra work, but I don't think this matters because there is no sqlite3_column_pointer method anyways.

When I get back into improving the extension, instead of writing the actual extension I want, I'll try to run some benchmarks between the two and see where the gaps lie.