Consider using Bytes/Str instead of BytesMut/StrMut in Frame struct

rukai commented 2 years ago

I'm not super familiar with where its best to use Bytes vs BytesMut but my thinking goes like: Once parsed it would be really rare for someone to want to directly modify the value in a Frame. Instead its much more likely to convert to a new type entirely or possibly to combine into a seperate BytesMut. And if Frame uses Bytes then its a lot easier to construct a Frame as we can use a Bytes type or a BytesMut type (freeze is free) But when Frame uses BytesMut it would require a new allocation to go from Bytes to BytesMut. Another advantage of Bytes is being able to construct Frames from Bytes::from_static(b"OK") which performs no allocation.

The implementation of the parser could be the same but just call freeze https://docs.rs/bytes/latest/bytes/struct.BytesMut.html#method.freeze

In fact I wonder if BytesMut is needed at all in the decoder if its just splitting up the bytes of the message which Bytes can do just fine. Possibly this is just due to the nom API? The version of nom used is out of date so this is possibly fixed in a later version?

aembke commented 2 years ago

Oh yeah @rukai thanks for checking out the PR. Next time I push some big changes like that I'll tag you in the PR.

I initially chose BytesMut/StrMut because there were a few places in fred where I figured it could be useful to share and mutate the buffers directly, but after thinking about it a bit more I think it'd be better to do that outside the Frame interface.

I don't think there's anything in nom preventing the use of Bytes here - IIRC the trait interface only borrows self on each function.

I'll have to think about how to release that though. 4.0.0 has only been up for a day and I doubt anybody is using it yet, so I might just yank 4.0.0 and re-release this with Bytes.

rukai commented 2 years ago

yanking is fine by me given the circumstances.

aembke commented 2 years ago

I just noticed the link to shotover-proxy and that looks like an interesting project. For the purposes of Redis it looks like it does some of what the AWS Elasticache load balancers do too (rewriting IP addresses, etc), which is really nice. The main goal with 4.0.0 was to remove allocations, so hopefully it helps performance there. I haven't yet adopted it in fred so I'm not sure if or how much it helps, but hopefully I'll have some data on that in the next week or so.

Out of curiosity, would you find it useful if I released the NomBytes wrapper as a standalone module? I was surprised when I went to use Bytes with nom that nobody had implemented the nom parsing traits for Bytes or BytesMut, but maybe I just missed it. I haven't looked to see if you use nom there, but I was debating releasing that on its own.

rukai commented 2 years ago

We currently dont depend on nom directly, so far we've been able to make use of existing protocol crates for everything we've needed. We might need something like that in the future but I can raise an issue at that time if we find NomBytes would be useful for us.

I havent looked into the implementation much at all but I did see this comment:

https://github.com/aembke/redis-protocol.rs/blob/9416db942e1988a27309ca5b37c6f4bc77eddc66/src/nom_bytes.rs#L32+#L33

This works because the relatively cheap
`Clone` impl on `BytesMut` allows for removing any lifetimes and the issues that come with using those for this kind of use case.

Which is concerning because im pretty sure BytesMut::clone does allocate. Maybe you got the properties of BytesMut and Bytes mixed up. Bytes does not allocate on clone.

I think generally the idea is to write your decoder with Bytes as this does not allocate. For the encoder you could use BytesMut to collect the output but I think a plain old Vec might work just as well.

aembke commented 2 years ago

Yeah you're right about the comments there. When I initially wrote this I did it with Bytes/Str, and then scaffolded out the docs based on that, but then later switched over to BytesMut/StrMut and didn't consider some of the cascading changes in the docs, parser, etc.

One thing to consider - I don't think it's as simple as using Bytes/Str in your decoder in all situations if you're looking to avoid allocations. This is partially due to how nom works, and partially due to how the tokio-utils codec interface works (which is the main interface I've been targeting since that's what I use in fred).

For example, when using that codec interface the caller is given a buf: &mut BytesMut to operate on when decoding. This buffer can contain incomplete frames, or more problematic, a complete frame followed by an incomplete frame. Since not every part of the redis protocol is length prefixed (simple strings, etc) there's no quick way to figure out which part of the buffer to slice off. Callers are forced to implement at least some linear time scanning logic in order to find out where in the buffer to split_to.

This is problematic since that means you can't just buf.split().freeze() and operate on the whole buffer, because what do you do with the leftover data in the (now frozen) Bytes? There's no mechanism to turn that back into a BytesMut without a copy, and you can't unsplit a BytesMut with a Bytes. However, that data needs to get back onto the front of the original buf (ideally without allocating) in order for the codec to work correctly the next time it's used.

There's a couple ways to get around this - manual ref counting BytesMut to avoid copies or parsing twice (once to find the length to split, and again to actually parse into Frames). I would like to avoid both, so I'm going to re-implement this with a different approach.

I can see how to maybe make this work with owning-ref by getting clever with a combination of BytesMut, &[u8] and some pointers, but that strikes me as probably overly complicated. The real issue with this kind of interface in the context of RESP is that you don't know up front (or easily) how much of the buffer to split and freeze until the parser runs. In an ideal world RESP would be length prefixed like AMQP where you just have one byte length to consider, but unfortunately that's not how it works.

I'm going to spend a few days and implement this with a new frame type that tracks offsets into the buffer for each type, and then at the end I'll provide a mechanism for callers to "deref" these pointers (plus the BytesMut) into a nested Frame structure containing frozen Bytes. This should avoid any allocations while parsing. Once the parser recognizes a frame it'll be easy to split_to(x).freeze(), and then I can return frames with inner Bytes/Str that point into that frozen buffer without any allocations.

In the end this will likely result in 3 interfaces for decoding:

An interface that takes &mut BytesMut and operates without allocating using the method described above. I'll use this in fred.
An interface that takes &Bytes and operates how you described. This is probably what callers will use if they're not using the tokio-utils codec interface.
An interface that does allocate but is very easy to read and use (pretty much what versions <=3 do).

rukai commented 2 years ago

Ah that is a problem! Thanks for the writeup.

rukai commented 2 years ago

Ah, sorry I didnt get a chance to take a proper look at the PR. At a quick glance it looks good to me.

Thanks for resolving the issue so quickly!

aembke commented 2 years ago

No problem. I didn't realize until very recently that you were also using a &mut BytesMut interface in shotover-proxy, so hopefully this helps improve performance there.

With the new interface you should (in theory) be able to pass the third element in the returned tuple (the Bytes) from decode_mut straight to the outgoing proxied connection (assuming you're not modifying the payload). That could be a nice way to avoid having to re-encode frames altogether in some cases.

aembke / redis-protocol.rs

Consider using Bytes/Str instead of BytesMut/StrMut in Frame struct #18