cberner / redb

An embedded key-value database in pure Rust
https://www.redb.org
Apache License 2.0

PoC: easy deriving Key&Value via bincode or raw Value/Key newtypes #783

Closed by dpc 3 months ago

dpc commented 4 months ago

This PR is not meant to be merged as is, but only as a PoC.

One annoyance that I have with using redb is the boilerplate of implementing Key & Value.

I was just playing with how to best improve it, and here is the result.

The corresponding change in my tiny downstream crate using redb is here: https://github.com/rustshop/htmx-sorta/commit/11fed565aea52ef7cf397df5b7cc8ed1e3af3891

There are two cases I'm trying to cover:

- plain newtypes over raw byte values (the AsRaw part);
- types encoded via bincode (the AsBincode part), which generalizes to any serde encoding.

I started with the second case, using bincode (but the approach works for any serde encoding crate). The AsBincode approach can be done entirely downstream, but I think it would be handy if redb could just support it out of the box as an optional feature.

The first case can't be done exactly like this downstream due to generic coherence limitations. Downstream, one would need to introduce an extra newtype, which makes using it a bit more clunky.

The naming is provisional and seems a bit inconsistent.

The trait Name is regrettable, as it overlaps with Value::type_name. If trait Value were already trait Value : Name, it would not be necessary. That would be a breaking change, which is not a great thing right after 2.0.0 is released. :laughing: If there's ever a 3.0.0, it could be done. However, it's not too bad like this, and it will only affect people who want to use AsRaw (which, BTW, also seems like a meh name to me).

Off-topic on Key

BTW, when looking at it, I had some thoughts about Key. I understand that for performance reasons the comparison is done on raw bytes, to avoid any overhead of conversions. But that makes trait Key and Key::compare very... odd. Because it's not really a property of the type... but kind of... outside of it. Nothing in trait Key actually uses the T that implements Key...

I think it's a nice feature, which I already managed to use in my little project for SortId. I might be wrong, but it seems to me that databases typically don't even allow customizing the ordering.

First thought that I had is that maybe Key::compare should have a default impl, as most cases will kind of not care anyway/will expect the typical order:

use std::cmp::Ordering;

pub trait Key: Value {
    fn compare(data1: &[u8], data2: &[u8]) -> Ordering {
        data1.cmp(data2)
    }
}

That would be an easy non-breaking change, and it would just save some typing.

But thinking more along the lines that it's weird, and opening possibility of changing the API, I wonder if it would be better to just get rid of trait Key, and move to:

use std::cmp::Ordering;

pub trait SortOrder {
    fn compare(data1: &[u8], data2: &[u8]) -> Ordering {
        data1.cmp(data2)
    }
}

pub struct MultimapTableDefinition<'a, K: Value + 'static, V: Value + 'static, Ord: SortOrder = Lexicographical> {
  ...
}

This would move compare a bit more out of the way, make sorting order a property of a Table, have the default most people expect, and retain the ability to customize the ordering, without tying it weirdly to the key's type via Key.
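A self-contained sketch of that shape (everything beyond SortOrder and Lexicographical from the snippet above is hypothetical, and a plain struct stands in for the real table definition):

```rust
use std::cmp::Ordering;
use std::marker::PhantomData;

// Sort order as its own trait, with a byte-wise lexicographic default.
pub trait SortOrder {
    fn compare(data1: &[u8], data2: &[u8]) -> Ordering {
        data1.cmp(data2)
    }
}

// The default order: just inherit the default method body.
pub struct Lexicographical;
impl SortOrder for Lexicographical {}

// A custom order (e.g. reversed) stays possible, per table.
pub struct Reversed;
impl SortOrder for Reversed {
    fn compare(data1: &[u8], data2: &[u8]) -> Ordering {
        data2.cmp(data1)
    }
}

// Stand-in for a table definition carrying the order as a defaulted
// type parameter, so most users never spell it out.
pub struct TableDefinition<Ord: SortOrder = Lexicographical> {
    _name: &'static str,
    _order: PhantomData<Ord>,
}

fn main() {
    // Writing `TableDefinition` with no parameter picks Lexicographical.
    let _default: TableDefinition = TableDefinition {
        _name: "demo",
        _order: PhantomData,
    };
    assert_eq!(Lexicographical::compare(b"a", b"b"), Ordering::Less);
    assert_eq!(Reversed::compare(b"a", b"b"), Ordering::Greater);
    println!("ok");
}
```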

cberner commented 3 months ago

I don't think I want to add additional traits for implementing Key and Value. If there's a way to make them much simpler, I'm not opposed to another major version to switch to that. I haven't been able to come up with something strictly better though.

The derive macro approach taken here seemed preferable to adding additional traits to me: https://github.com/cberner/redb/pull/667/files

dpc commented 3 months ago

The derive macro approach taken here seemed preferable to adding additional traits to me: https://github.com/cberner/redb/pull/667/files

I tried implementing it, and it's a big chunk of relatively gnarly proc-macro code that looks like a subset of serde. That's the reason the original author abandoned it, and why I never got to finish it (and won't have time to maintain it).

The code in my PRs is 40 lines for each: newtypes over raw values, and bincode. The usability is only minimally worse than that of the derive approach, IMO. The bincode approach generalizes to any serde encoding, so it would be useful at least as an example.

BTW, AsRaw could also be made an optional feature. Regrettably, unlike the bincode one, I can't use it downstream as is.

IMO the two are a usability win for a small price (they don't require breaking anything). They could eventually be abandoned after something better is available.

cberner commented 3 months ago

Sorry, but one of my design goals is to keep the interface of redb very simple, so I don't want to have multiple traits for implementing types unless there's a really strong reason. I'd be happy to merge a PR with an example of how to use bincode, as long as bincode doesn't become a dependency of redb. That'd allow people to copy the code into their project, if they want to.

dpc commented 3 months ago

Sorry, but one of my design goals is to keep the interface of redb very simple, so I don't want to have multiple traits for implementing types unless there's a really strong reason.

Would you be open to either splitting redb into some redb-core + redb, where redb-core would operate on just raw byte values while redb would have the current interface; or alternatively having some abstraction like RawTable that would effectively be K = &[u8], V = &[u8]? This would make it easier to swap out or avoid the Key/Value details.

I'm used to rocksdb, and that works on bytes. Sled also works on bytes. Due to the ownership model (owned vs. borrowed values, etc.), a higher-level type-based API is always going to be an obstacle for certain use cases. So the "simplest API" would be just raw bytes.

E.g. in our project we have an abstraction over rocksdb where we take a Database + byte prefix and make it look like its own database that we can pass downstream, effectively isolating / partitioning the key space. I see no way to do that with higher-level keys.

I guess I could just go with a newtype RawTable(Table<&[u8], &[u8]>) for everything and build on top of that, but it seems like something redb could support out of the box, if only because some use cases will need it, and most dbs comparable to redb offer only that.

cberner commented 3 months ago

I'm not following how redb-core helps your use case. If you need an API that takes raw bytes, you should be able to use Table<&[u8], &[u8]> and use it just like you would rocksdb or sled. Is there something about that which doesn't work?

dpc commented 3 months ago

I'm not following how redb-core helps your use case. If you need an API that takes raw bytes, you should be able to use Table<&[u8], &[u8]> and use it just like you would rocksdb or sled. Is there something about that which doesn't work?

It works, thanks.

dpc commented 3 months ago

I'm just going to wrap the whole API (at least the parts that I actually need) over byte-tables, and bincode everything using BE configuration, which should sort right, if I haven't missed anything in the spec.
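The "BE configuration should sort right" claim rests on a property of big-endian encoding: for fixed-width unsigned integers, lexicographic comparison of big-endian bytes agrees with numeric comparison (this holds for bincode's fixed-int big-endian config, not its varint default). A quick std-only check:

```rust
// For fixed-width unsigned integers, big-endian byte order makes byte-wise
// (lexicographic) comparison agree with numeric comparison.
fn main() {
    let pairs = [(1u64, 2u64), (255, 256), (65_535, 1_000_000)];
    for (a, b) in pairs {
        // BE bytes compare the same way the numbers do.
        assert!(a.to_be_bytes() < b.to_be_bytes());
    }

    // Little-endian does not have this property: 256 encodes as
    // [0x00, 0x01, ...], which sorts before 255's [0xFF, 0x00, ...].
    assert!(256u64.to_le_bytes() < 255u64.to_le_bytes());
    println!("ok");
}
```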

I got get, insert, and remove working: https://github.com/dpc/redb-bincode

Serialization can reuse thread-local buffers with const-initialized Vecs, which I hope will have negligible overhead. The UX is ideal, as far as I can tell: borrowed values and everything.
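The thread-local buffer reuse mentioned above can be sketched with just std (the helper name is hypothetical, and a byte copy stands in for the actual bincode serialization):

```rust
use std::cell::RefCell;

thread_local! {
    // Const-initialized Vec: no lazy-init overhead on access, and no
    // allocation until the buffer is first grown. The same buffer is
    // reused across calls on a given thread.
    static SERIALIZE_BUF: RefCell<Vec<u8>> = const { RefCell::new(Vec::new()) };
}

// Hypothetical helper: encode into the shared buffer, then hand the
// resulting bytes to a closure (so the borrow never escapes).
fn with_encoded<R>(value: &[u8], f: impl FnOnce(&[u8]) -> R) -> R {
    SERIALIZE_BUF.with(|buf| {
        let mut buf = buf.borrow_mut();
        buf.clear(); // keep capacity, drop old contents
        // A real implementation would serialize with bincode here;
        // copying the input stands in for "encode into the buffer".
        buf.extend_from_slice(value);
        f(&buf)
    })
}

fn main() {
    let n = with_encoded(b"hello", |bytes| bytes.len());
    assert_eq!(n, 5);

    // Second call reuses the same (already-grown) buffer.
    let first = with_encoded(b"ab", |bytes| bytes[0]);
    assert_eq!(first, b'a');
    println!("ok");
}
```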