SoftbearStudios / bitcode

A binary encoder/decoder for Rust
https://crates.io/crates/bitcode/
MIT License
376 stars 19 forks

Streaming API #6

Open LevitatingBusinessMan opened 1 year ago

LevitatingBusinessMan commented 1 year ago

Are there any plans for a streaming API? The ability to serialize/deserialize impl Read and impl Write.

I want to be able to deserialize from a TcpStream.

bincode and postcard both support this.

caibear commented 1 year ago

Are there any plans for a streaming API? The ability to serialize/deserialize impl Read and impl Write.

For our use case we don't need this feature so I am hesitant to add and maintain it.

From an API perspective it would involve duplicating encode into encode_into(w: &mut impl Write, t: &impl Encode) -> Result<(), Error> and decode into decode_from<T: Decode>(r: &mut impl Read) -> Result<T, Error>.

From an internal code perspective it would need to avoid regressing the current performance without duplicating too much code.

I want to be able to deserialize from a TcpStream.

Can you read from the TcpStream into a Vec<u8> and pass that to bitcode?

Are your messages so large that they would consume too much memory? I kind of doubt this because serialized bitcode typically consumes less memory than the deserialized type does.

LevitatingBusinessMan commented 1 year ago

Can you read from the TcpStream into a Vec<u8> and pass that to bitcode?

Yes but that vector could include multiple structs. Postcard has a method take_from_bytes which returns the slice of unused bytes.

From an internal code perspective it would need to avoid regressing the current performance without duplicating too much code.

I was worried that streaming wasn't possible because bitcode relied on knowing where the serialized data ends. In hindsight I realize that doesn't make sense.

finnbear commented 1 year ago

that vector could include multiple structs

We work exclusively with WebSockets which provide their own framing of messages. The easiest way to use bitcode on a raw TcpStream might be to transmit the length (e.g. a 4 byte unsigned integer in network endian) and then the bytes from bitcode.
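A minimal sketch of the length-prefix framing described above, using only the standard library. The names write_frame, read_frame and MAX_FRAME_LEN are hypothetical and not part of bitcode; the payload stands in for whatever bytes the encoder produced, and the size cap is an assumption added so that a corrupt prefix cannot trigger a huge allocation.

```rust
use std::io::{self, Read, Write};

/// Upper bound on an accepted frame (an assumption for this sketch).
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian (network order) length,
/// followed by the payload bytes.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame written by `write_frame` and returns its payload.
/// Fails with `UnexpectedEof` if the stream ends mid-frame.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;
    let len = u32::from_be_bytes(len_bytes) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Both functions are generic over Read and Write, so they should work with a TcpStream as well as an in-memory Cursor; the returned Vec<u8> is what would then be handed to the decoder.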

LevitatingBusinessMan commented 1 year ago

that vector could include multiple structs

We work exclusively with WebSockets which provide their own framing of messages. The easiest way to use bitcode on a raw TcpStream might be to transmit the length (e.g. a 4 byte unsigned integer in network endian) and then the bytes from bitcode.

Yes, that's a good solution, thanks. But first I might take a crack at modifying the bitcode codebase to allow for reading a slice partially or reading from a stream.

NiseVoid commented 1 year ago

I think having a way to pack multiple types into one big packet is quite an essential feature. When encoding, I guess we can just .extend_from_slice() on the slice from Buffer::encode. But there currently seems to be no way to read multiple messages packed together without including the length of each message (which, as far as I can tell, would be redundant information).

In my use case (sending game data in UDP packets) I currently use a Cursor and decode messages (using bincode) in a loop until it has consumed the entire packet, but even something as simple as getting (T, usize) as a return value, where usize is the number of bytes that were decoded, would be enough.
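To illustrate the (T, usize) idea, here is a toy sketch that does not use bitcode or bincode. The Msg type, its wire format (a one-byte tag, optionally followed by two big-endian i16 values) and the functions decode_prefix and decode_packet are all invented for this example; the point is only that a decoder which reports how many bytes it consumed lets the caller walk a packet without per-message length prefixes.

```rust
#[derive(Debug, PartialEq)]
enum Msg {
    Ping,
    Move { dx: i16, dy: i16 },
}

/// Hypothetical decoder: decodes one message from the front of `bytes`
/// and returns it with the number of bytes consumed.
/// Returns `None` if the input is truncated or has an unknown tag.
fn decode_prefix(bytes: &[u8]) -> Option<(Msg, usize)> {
    match *bytes.first()? {
        0 => Some((Msg::Ping, 1)),
        1 => {
            let body = bytes.get(1..5)?;
            let dx = i16::from_be_bytes([body[0], body[1]]);
            let dy = i16::from_be_bytes([body[2], body[3]]);
            Some((Msg::Move { dx, dy }, 5))
        }
        _ => None,
    }
}

/// Decodes every message in a packet by repeatedly decoding a prefix
/// and skipping past the bytes it consumed.
fn decode_packet(mut packet: &[u8]) -> Option<Vec<Msg>> {
    let mut out = Vec::new();
    while !packet.is_empty() {
        let (msg, used) = decode_prefix(packet)?;
        out.push(msg);
        packet = &packet[used..];
    }
    Some(out)
}
```

For example, the seven bytes [0, 1, 0, 3, 0xFF, 0xFE, 0] would decode to Ping, Move { dx: 3, dy: -2 }, Ping.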

caibear commented 1 year ago

but even something as simple as getting (T, usize) as a return value, where usize is the number of bytes that were decoded, would be enough.

This would still result in redundant information since each message would be padded to the nearest byte.
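The padding cost can be made concrete with a little arithmetic. The sketch below assumes a bit-packed encoding in which a message occupies some number of bits and is rounded up to whole bytes; the function names and the example bit counts are made up for illustration and say nothing about bitcode's actual output sizes.

```rust
/// Bytes needed to store `bits` bits when rounded up to a whole byte.
fn padded_len_bytes(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Padding bits wasted when `bits` bits are rounded up to whole bytes.
fn padding_bits(bits: usize) -> usize {
    padded_len_bytes(bits) * 8 - bits
}

/// Total padding when each message is padded on its own (first value)
/// versus when all messages share one buffer padded once (second value).
fn padding_separate_vs_joint(message_bits: &[usize]) -> (usize, usize) {
    let separate: usize = message_bits.iter().map(|&b| padding_bits(b)).sum();
    let total_bits: usize = message_bits.iter().sum();
    (separate, padding_bits(total_bits))
}
```

With messages of 3, 10 and 13 bits, padding each one separately wastes 5 + 6 + 3 = 14 bits, while padding the 26-bit total once wastes 6 bits.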

LevitatingBusinessMan commented 1 year ago

Well that's a bummer

finnbear commented 1 year ago

This would still result in redundant information since each message would be padded to the nearest byte.

It's necessary for TCP streams which don't support transmitting fractional bytes, unless the end of each message waited until the start of the next message.

Well that's a bummer

To be clear, a streaming API in the sense of impl Read + Write is not planned due to performance and compatibility issues.

We're considering an API that allows you to:

This would slightly reduce the overhead of using bitcode in a stream-like context.

Edit: Closing this issue may have been premature. I've reopened it until there is an issue more focused on what we can actually implement.

LevitatingBusinessMan commented 1 year ago

decode the prefix of received data as a message and know where the decoder left off

❤️

caibear commented 9 months ago

New version of bitcode https://github.com/SoftbearStudios/bitcode/pull/19 has the potential to add streaming without high overhead.

MOZGIII commented 6 months ago

Looks like the code actually would work great with streaming APIs, if only the codec mod was publicly available. Or, rather, the View and Decoder traits for decoding.

I'd have a loop with roughly this:

  1. Try T::populate(1) on the buffer;
  2. Fails? Read and buffer more data;
  3. Works? Decode and return the value, advance the buffer to match what T::populate did to the slice we gave it.

Note that this way the bitcode crate does not do any I/O itself; external code would be responsible for that. This is the way I'd recommend doing it, since async exists and every async ecosystem has its own traits for read/write operations.
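The loop above could be sketched roughly as follows, with no I/O inside the decoder. StreamDecoder, feed and poll are hypothetical names, and try_decode_text is a toy stand-in (a NUL-terminated string) for the real decoding step, since the View and Decoder traits mentioned above are not public. The caller owns the socket and passes in whatever bytes it has read.

```rust
/// Accumulates bytes received by external I/O code and yields
/// messages once enough data has arrived.
struct StreamDecoder {
    buf: Vec<u8>,
}

impl StreamDecoder {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Step 2: the caller read more data; buffer it.
    fn feed(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
    }

    /// Steps 1 and 3: try to decode from the buffer; on success,
    /// drop the consumed bytes and return the value.
    /// `try_decode` returns `None` while the message is incomplete.
    fn poll<T>(&mut self, try_decode: impl Fn(&[u8]) -> Option<(T, usize)>) -> Option<T> {
        let (value, used) = try_decode(&self.buf)?;
        self.buf.drain(..used);
        Some(value)
    }
}

/// Toy decoder for this sketch: a message is text terminated by a 0 byte.
fn try_decode_text(bytes: &[u8]) -> Option<(String, usize)> {
    let end = bytes.iter().position(|&b| b == 0)?;
    let text = String::from_utf8_lossy(&bytes[..end]).into_owned();
    Some((text, end + 1))
}
```

A caller would alternate between reading from its transport, calling feed with the new bytes, and calling poll until it returns None. A real decoder would also need to distinguish "incomplete" from "malformed" input, which this sketch collapses into None.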

caibear commented 6 months ago

I'd have a loop with roughly this:

  1. Try T::populate(1) on the buffer;
  2. Fails? Read and buffer more data;
  3. Works? Decode and return the value, advance the buffer to match what T::populate did to the slice we gave it.

You could achieve the same effect by length prefixing your messages. If your messages are long, this shouldn't add much overhead. If your messages are short and you encode them one at a time, bitcode won't provide any benefit over bincode.

Note: If your use-case is packing multiple small messages into a UDP packet see bitcode_packet_packer. It's able to encode multiple messages at once, but produces discrete packets that don't exceed a limit. Included is a benchmark of various techniques including encoding messages one at a time.

MOZGIII commented 6 months ago

This is, obviously, a workaround that is universal and well known, and it is what I'm using currently. I'm interested in bitcode specifically providing this functionality, not because there's no other way but because bitcode already has everything that is required to do it.

Thanks for sharing the bitcode_packet_packer.

My use case is passing data over WebTransport streams, and currently it is for an example app. It basically sends a packet and waits for a reply: a very old-fashioned state machine on both ends, without the need to pack multiple messages at once.

I'm currently using the tokio_util::codec::length_delimited::LengthDelimitedCodec - but just that, without the rest of FramedCodec infrastructure, as we don't use tokio io types.