fjall-rs / fjall

🗻 LSM-based embeddable key-value storage engine written in safe Rust
https://fjall-rs.github.io/
Apache License 2.0
601 stars 25 forks source link

Better compat with bytes::Bytes and/or bytes::Buf #94

Open carlsverre opened 1 day ago

carlsverre commented 1 day ago

Is your feature request related to a problem? Please describe. I'm using Fjall in a project that makes heavy use of the bytes crate. Currently this requires that I copy data when moving to/from the fjall::Slice type.

Describe the solution you'd like The best solution would be to have Fjall natively support the Bytes type in place of Slice. It appears that they both are effectively based on Arc<[u8]> although Bytes offers a lot more functionality related to zero-copy slices and so on.

In the event the Bytes type can't supercede Slice (which is totally fair), better interop would be fantastic. At the minimum, implementing the Buf trait for Slice would unlock a lot more interop with libraries like Prost.

Describe alternatives you've considered I'm currently copying data to/from the Slice type.

Additional context Happy to provide more information as needed! Also feel free to ping me on Discord, I'm in the fjall server under the username f0a.

carlsverre commented 1 day ago

Ok, one part of this issue is completely invalid: no need to implement Buf for Slice. It would be slightly nicer, but I can just use as_ref since Buf is implemented on &[u8].

So, really this issue is more about going from Slice to Bytes without making copies.

marvin-j97 commented 1 day ago

Hmm, that would and could be implemented in value-log as a feature flag and ultimately as feature flags in lsm-tree and fjall. I think supporting bytes makes sense because it's such a well used crate, but Slice would probably need an into_inner method (if bytes is enabled. But the default should be an Arc as it is now, and in the future it will probably be replaced with byteview.

Though, maybe instead the with_vtable in bytes can be used instead, to build a Bytes without copy?

carlsverre commented 20 hours ago

Replacing Slice with byteview makes a lot of sense in the general case. But in my case I would love a feature flag that replaces the Arc<[u8]> in Slice with a Bytes object along with an into_inner to get it out. Anything in the code that depends on Slice being backed by an Arc or would this be a fairly mechanical transition? I could take a crack at it if you think it wouldn't be too disruptive.

with_vtable looks very intriguing, although also very unsafe/private. Seems like supporting byteview directly in the bytes crate would be very doable though.

marvin-j97 commented 20 hours ago

Replacing Slice with byteview makes a lot of sense in the general case

Slice as a general type will obviously stay around, it's just its inner wrapped type that will get swapped out in the future. So it's completely opaque to the user.

Anything in the code that depends on Slice being backed by an Arc or would this be a fairly mechanical transition? I could take a crack at it if you think it wouldn't be too disruptive.

I think it should be fairly straightforward, as nothing relies on the inner type being an Arc specifically. Though it is important that Slice (even with bytes) has a way to mimic Arc::get_mut, which it should by using BytesMut and converting back using freeze. Because this in-place mutation will be used in the future to skip some unnecessary heap allocations (it's not needed just yet).

The with_vtable stuff can probably still be used down the line, but it's probably not too important.

carlsverre commented 20 hours ago

Gotcha! Thanks for the notes! Will take a crack at this.

marvin-j97 commented 10 hours ago

Hmm it would probably be better for into_inner to be Into<Bytes> (and/or into_bytes) instead, and guard that using the feature flag. I don't know if I want to leak the inner type in the non-bytes case.

marvin-j97 commented 3 hours ago

I added the aforementioned Slice methods for construction in value-log 1.2.0:

https://github.com/fjall-rs/value-log/blob/a1c05b4fca355870058962b0dd7856144dcae284/src/slice.rs#L19-L33

With bytes it should be possible to construct it using BytesMut::with_capacity, that skips the Vec (re)allocation, that can be filled from the Reader - then convert into Bytes (and ultimately a Slice) using freeze.