Knowing the amount of bits that will be written

cBournhonesque commented 1 year ago

Hi,

I'd like to use bitcode for game networking, and it would be useful to have a function to know how many bits/bytes a structure would take if it were encoded, but without doing the actual encoding (so that I know in which packet I can put the encoded data).

finnbear commented 1 year ago

Unlike bincode, bitcode doesn't support serializing into a mutable packet structure or stream because performance would suffer from lack of alignment/wide-integer instructions. bitcode only serializes into Vec<u8> (via allocation) or &[u8] (via &mut bitcode::Buffer).

As a result, the minimal-allocation method is to reuse a bitcode::Buffer (or pool of them) and copy from the resulting &[u8] into your packet, at which point you know the number of bytes from <&[u8]>::len().

Feel free to give other/more specific reasons to implement this functionality, e.g. a code example, taking into account the above limitations.

cBournhonesque commented 1 year ago

I'm not sure I fully understood your comment; what I meant was a trait like this: https://github.com/naia-lib/naia/blob/main/shared/serde/src/serde.rs#L4

There could be an additional function that simply returns the number of bytes that the struct/enum will be serialized into, but without doing the actual serialization. For example via these kinds of implementations: https://github.com/naia-lib/naia/blob/main/shared/serde/src/impls/string.rs#L28

finnbear commented 1 year ago

> For example via these kinds of implementations: https://github.com/naia-lib/naia/blob/main/shared/serde/src/impls/string.rs#L28

Thanks for providing a code example! It looks like you are using the bit length to decide whether to serialize the message at all, which could legitimately benefit from the functionality.

(Edit: FWIW, I tried implementing the desired functionality on the predict_len branch).

caibear commented 1 year ago

I avoided adding something similar to bincode::serialized_size since I've noticed lots of people misuse it to allocate buffers with capacity as an optimization. This usually results in half the performance and double the binary size for everything but the most trivial structures (see https://github.com/bincode-org/bincode/issues/401).

> it would be useful to have a function to know how many bits/bytes a structure would take if it were encoded, but without doing the actual encoding (so that I know in which packet I can put the encoded data).

I would advise serializing each structure to a Vec<u8> with bitcode::encode and then appending as many as possible to another Vec<u8>, each with a length prefix such as a u16 or u32. The length prefix is required so you can pass a &[u8] of the original structure length to bitcode::decode.

While copying the bytes isn't ideal, it should be much faster than something like serialized_size.
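To make the length-prefix idea concrete, here is a minimal sketch using only the standard library. It assumes each message has already been encoded to bytes (for example with bitcode::encode); the names `push_frame` and `split_frames`, the little-endian u16 prefix, and the `max_len` parameter are illustrative choices, not part of bitcode.

```rust
/// Appends `message` to `packet` behind a little-endian u16 length prefix.
/// Returns false and leaves `packet` unchanged if the message is too long
/// for a u16 prefix or if the frame would grow the packet past `max_len`.
pub fn push_frame(packet: &mut Vec<u8>, message: &[u8], max_len: usize) -> bool {
    let len = match u16::try_from(message.len()) {
        Ok(len) => len,
        Err(_) => return false,
    };
    // 2 bytes of prefix plus the message itself must fit in the budget.
    if packet.len() + 2 + message.len() > max_len {
        return false;
    }
    packet.extend_from_slice(&len.to_le_bytes());
    packet.extend_from_slice(message);
    true
}

/// Splits a packet produced by `push_frame` back into the original byte
/// slices. Returns None if the packet is truncated or malformed.
pub fn split_frames(mut packet: &[u8]) -> Option<Vec<&[u8]>> {
    let mut frames = Vec::new();
    while !packet.is_empty() {
        if packet.len() < 2 {
            return None;
        }
        let (prefix, rest) = packet.split_at(2);
        let len = u16::from_le_bytes([prefix[0], prefix[1]]) as usize;
        if rest.len() < len {
            return None;
        }
        let (frame, tail) = rest.split_at(len);
        frames.push(frame);
        packet = tail;
    }
    Some(frames)
}
```

On the receiving side, each slice returned by `split_frames` has exactly the original structure length, so it can be handed to the decoder as-is.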

finnbear commented 1 year ago

@caibear brings up some good points against implementing this and a possible alternative for your code.

Here is one more possible alternative for you, in the form of code that you can drop into your project:

    use std::cell::RefCell;
    use serde::Serialize;
    use bitcode::{Encode, Buffer, Error};

    // For types implementing serde::Serialize: returns the serialized length in bytes.
    fn serialize_len<T: Serialize + ?Sized>(t: &T) -> Result<usize, Error> {
        thread_local! {
            // One lazily created Buffer per thread, reused across calls.
            static BUFFER: RefCell<Option<Buffer>> = RefCell::new(None);
        }

        BUFFER.with(|buffer| {
            let mut buffer = buffer.borrow_mut();
            // Serialize into the reused buffer and keep only the output length.
            buffer
                .get_or_insert_with(Buffer::default)
                .serialize(t)
                .map(|bytes| bytes.len())
        })
    }

    // For types implementing bitcode::Encode: returns the encoded length in bytes.
    fn encode_len<T: Encode + ?Sized>(t: &T) -> Result<usize, Error> {
        thread_local! {
            // One lazily created Buffer per thread, reused across calls.
            static BUFFER: RefCell<Option<Buffer>> = RefCell::new(None);
        }

        BUFFER.with(|buffer| {
            let mut buffer = buffer.borrow_mut();
            // Encode into the reused buffer and keep only the output length.
            buffer
                .get_or_insert_with(Buffer::default)
                .encode(t)
                .map(|bytes| bytes.len())
        })
    }

Use these as a last resort if you can't refactor your code as suggested by @caibear. By reusing the Buffer, they avoid repeated memory allocations. They don't require additional codegen and won't be significantly slower than my predict_len changes mentioned above.

cBournhonesque commented 1 year ago

Thank you! In general I'll be encoding everything into a buffer of size UDP_PACKET_SIZE (around 1400 bytes), so I wouldn't be using this to optimize allocations. Both options that you provided make sense to me.
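For illustration, once the encoded sizes are known (from either option above), deciding which packet each message goes into can be sketched as a simple greedy pass. This is only a sketch under assumptions: `plan_packets` is a hypothetical helper, `budget` stands in for UDP_PACKET_SIZE, and any per-message length prefix is assumed to be already included in each size.

```rust
/// Groups message indices into packets so that the sizes in each packet sum
/// to at most `budget` bytes, preserving message order.
/// Returns None if a single message is larger than `budget`.
pub fn plan_packets(sizes: &[usize], budget: usize) -> Option<Vec<Vec<usize>>> {
    let mut packets: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut used: usize = 0;

    for (index, &size) in sizes.iter().enumerate() {
        if size > budget {
            return None;
        }
        // Start a new packet when the next message would overflow this one.
        if used + size > budget {
            packets.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(index);
        used += size;
    }

    // Flush the last, partially filled packet.
    if !current.is_empty() {
        packets.push(current);
    }
    Some(packets)
}
```

The pass never reorders messages, which keeps it predictable but can leave some packets under-filled; a smarter bin-packing strategy is possible if ordering doesn't matter.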

SoftbearStudios / bitcode

Knowing the amount of bits that will be written #11