bincode-org / bincode

A binary encoder / decoder implementation in Rust.
MIT License
2.63k stars 265 forks source link

Are initial "buffer size" bytes in bincode2 decode_from_slice really required? #714

Closed mcclure closed 3 months ago

mcclure commented 3 months ago

I don't know if this is a bug, a feature request or a support request…

I am porting a .xm file parser library¹ to support no_std. This means upgrading from bincode to bincode2. The files the library opens are not created with bincode, the format is pre-existing and the library uses bincode to deserialize a struct that has the same fields in the same order as the format. Serde #[attributes] are used to make sure the offsets and such all match². With bincode1 this just works.

To convert to bincode2, I have replaced calls to bincode::deserialize(&decoded_data)?) with calls to bincode::decode_from_slice(&decoded_data, bincode::config::legacy())?.0). It is no longer able to open the same files, and the reason why is bincode2, when decoding, seems to assume that an eight-byte size is present at the beginning of the data. I believe this becuase when I try to decode one of the files that bincode1 could open, it tries to allocate an 8 exabyte buffer and the program crashes. The size of the buffer it tries to allocate is exactly equal to the first eight bytes of the file in binary.

I don't find this eight-byte header in the documentation, and there doesn't seem to be a way to opt out of using it. I don't know if this is something that is already planned to be fixed as part of bringing bincode2 to be backward compatible with bincode 1.33. I don't know if I somehow caused this myself by using decode_with_slice instead of decode_from_reader (what we have is a slice however).

What should I do to decode a file that does not have the 8 byte size header?

We are embedding bincode with bincode = { version = "2.0.0-rc.3", features=["alloc", "derive", "serde"], default-features = false }.

¹ This link is a test program that can be used to reproduce, but it will be a little nontrivial to set up. You will have to check out the PRs for both xmrs and xmrsplayer with the PR branch, then replace the reference to xmrs in the xmrsplayer Cargo.toml with xmrs = { path = "/path/to/xmrs/checkout", default-features = false } so it is using your checked out version, then run the invocation under "I broke it" at the link. ² I think currently I have this set up wrong, but it doesn't matter because currently we aren't even getting as far as parsing the first field.

VictorKoenders commented 3 months ago

bincode2, when decoding, seems to assume that an eight-byte size is present at the beginning of the data

This is false, bincode only decodes what you tell it to, and bincode 1 and 2 should be identical. I'm guessing your parser uses custom serde deserializers to parse your non-bincode format. Can you try with bincode::serde::decode_from_slice?

already planned to be fixed as part of bringing bincode2 to be backward compatible with bincode 1.33

If bincode 2 does not behave the same as bincode 1, then that is considered a bug. This does not cover custom (de)serialize implementations obviously, as we cannot control user code.

The files the library opens are not created with bincode, the format is pre-existing [..]

Obligatory mention that bincode only supports its own format, and we do not cover any other format that happens to be similar

VictorKoenders commented 3 months ago

Specifically it seems like your AmigaSample calls deserialize_string_22 which skips the first 22 bytes. For bincode 2 you'd have to implement a custom Decode

mcclure commented 3 months ago

Specifically it seems like your AmigaSample calls deserialize_string_22 which skips the first 22 bytes. For bincode 2 you'd have to implement a custom Decode

I don't think this is correct because rather than skipping the first 22 bytes something is treating the first 8 bytes as a size. However with the information you've provided it seems very probable that the bug is in my code. Thanks!

I will investigate and either come back with a specific repro or close this. (Or you can close it now if you're totally confident this is user error.)

mcclure commented 3 months ago

Thank you for the help. After switching to bincode::serde::decode_from_slice, it worked perfectly correctly! I will close this now.

As a recommendation, I think I possibly could have understood how to do this correctly the first time if the migration guide had presented things a little differently. I have made a PR with a proposed change to the recommendation guide.