bincode-org / bincode

A binary encoder / decoder implementation in Rust.
MIT License
2.69k stars 272 forks source link

MaybeUninit support #620

Closed tdelabro closed 1 year ago

tdelabro commented 1 year ago

When using bincode to encode in a buffer, I already knew the exact size of the array that would result (len_of_my_vec * size_of_an_encoded_element). My first reflex had been to create a Vec with capacity, but encode takes a &mut [u8] so it has no way to extend it. So I gave it the right size myself with:

 unsafe {
        buffer.set_len(encoded_payload_len);
    }

but clippy is yelling at me because of #[deny(clippy::uninit_vec). It wants me to use MaybeUninit instead (Vec<MaybeUninit<u8>>). The problem is that bincode does not provide an API that accepts &mut MaybeUninit<u8> as the buffer to write on.

Do you think it could be a good feature to add?

adamreichold commented 1 year ago

You could use encode_to_vec and rely on bincode doing the optimization (it does have this information after all). Or you could use encode_into_std_write rely on the fact that there exists an impl Write for Vec<u8>.

More broadly speaking, Vec<T> is already handling uninitialized memory that will be filled with instances of T for you, so adding MaybeUninit<T> on top of that is somewhat redundant, c.f. e.g. Vec::spare_capacity_mut.

Speaking more long-term, BorrowedCursor is the API which is supposed to handle avoiding to initialize buffers used for IO, as used in e.g. read_buf.

tdelabro commented 1 year ago

I'm calling encode in loop, so using encode_to_vec would generate n Vecs (one for each iteration of my loop) that I will then have to concatenate in my original Vec buffer, making the total memory allocated during the operation twice that of my known buffer size.

I'm compiling to no_std so I don't have access to encode_into_std_write.

My code look like this right now:

pub fn get_encoded_trace(
    relocated_trace: &[crate::vm::trace::trace_entry::RelocatedTraceEntry],
) -> Result<Vec<u8>, EncodeTraceError> {
    let encoded_payload_len = 24 * relocated_trace.len(); // `RelocatedTraceEntry` size is 24 bytes
    let mut buffer = Vec::with_capacity(encoded_payload_len);

    #[allow(clippy::uninit_vec)]
    unsafe {
        buffer.set_len(encoded_payload_len);
    }

    for (i, entry) in relocated_trace.iter().enumerate() {
        bincode::serde::encode_into_slice(
            entry,
            &mut buffer[24 * i..24 * (i + 1)],
            bincode::config::legacy(),
        )
        .map_err(|e| EncodeTraceError(i, e))?;
    }

    Ok(buffer)
}
VictorKoenders commented 1 year ago

We are not interested in supporting writing into Vec<MaybeUninit<u8>> or &mut MaybeUninit<u8>

I can recommend you pre-fill the buffer with 0u8 and then write to that, for example with Vec::resize

tdelabro commented 1 year ago

We are not interested in supporting writing into Vec<MaybeUninit> or &mut MaybeUninit

Why? Sound like a pretty standard way of using your library.

I can recommend you pre-fill the buffer with 0u8 and then write to that, for example with Vec::resize

Yeah I know I can. But it's a waste of computation, which is what I'm trying to avoid there.

VictorKoenders commented 1 year ago

Sound like a pretty standard way of using your library.

I've never heard anyone that needed this before

But it's a waste of computation, which is what I'm trying to avoid there.

Can you show me a benchmark in release mode with LTO enabled that proves that pre-filling the vec increases your programs execution speed unacceptably? I fully expect LLVM to generate the same code and it saves you a dozen footguns if you use less unsafe code

adamreichold commented 1 year ago

I'm compiling to no_std so I don't have access to encode_into_std_write.

So should be able to fall back to implementing bincode's own Write for (a wrapper of) Vec<u8> then and use encode_into_writer to append to the Vec from within your loop body.

tdelabro commented 1 year ago

So should be able to fall back to implementing bincode's own Write for (a wrapper of) Vec then and use encode_into_writer to append to the Vec from within your loop body.

Sound great! Thanks