3Hren / msgpack-rust

MessagePack implementation for Rust / msgpack.org[Rust]
MIT License

rmp-serde: Failed to deserialize &[u8] #163

Open sudeep9 opened 6 years ago

sudeep9 commented 6 years ago

I have a basic question. I have the following code (rmp-serde version = 0.13.7):

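// Imports the snippet relies on; `Error` is the poster's own error type (not shown).
use rmp_serde::{Deserializer, Serializer};
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]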
struct Data<'a> {
    buf: &'a [u8]
}

fn codec() -> Result<(), Error> {
    let buf = b"hello";

    let d = Data{buf: buf.as_ref()};

    let mut outbuf = Vec::new();
    d.serialize(&mut Serializer::new_named(&mut outbuf))?;

    let inbuf = outbuf.as_slice();
    let mut de = Deserializer::from_slice(inbuf);
    let d2: Data = serde::Deserialize::deserialize(&mut de)?;
    println!("decoded data = {:?}", d2);

    Ok(())
}

Deserialization fails with the error: Error: Syntax("invalid type: sequence, expected a borrowed byte array"). The code works if buf: &'a [u8] is changed to buf: &'a str.

Does &'a [u8] need different treatment?

mgxm commented 6 years ago

Hello @sudeep9, I resolved this issue simply by using serde_bytes.

misos1 commented 6 years ago

It should be the default behaviour to serialise Vec<u8> the way serde_bytes does. Without it, bytes above 0x7F, like 0xFDFEFF, are serialised into a MessagePack fixarray (0x9X) as [81, A1, 61, 93, CC, FD, CC, FE, CC, FF] instead of [81, A1, 61, C4, 3, FD, FE, FF], which uses bin 8 (0xC4). Each such byte carries an extra uint 8 marker (0xCC), so the serialised form is about 1.5x the size of the original binary data.

Using serde_bytes is more efficient, but unfortunately it still does not seem to support deserialisation into &'a [u8].
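The two encodings compared above can be reproduced with a small hand-rolled sketch. This is not rmp-serde's code: the function names (`encode_as_int_array`, `encode_as_bin`, `wrap_in_map_a`) are made up for illustration, and the sketch only assumes the MessagePack format rules for arrays, unsigned integers, and bin values.

```rust
/// Write a length header using the 8/16/32-bit variants of a MessagePack family.
/// `small` is only used by `bin`; arrays pass their fixarray byte directly.
fn push_len(out: &mut Vec<u8>, len: usize, marker16: u8, marker32: u8) {
    assert!(len as u64 <= u32::MAX as u64, "payload too large for this sketch");
    if len <= 0xFFFF {
        out.push(marker16);
        out.extend_from_slice(&(len as u16).to_be_bytes());
    } else {
        out.push(marker32);
        out.extend_from_slice(&(len as u32).to_be_bytes());
    }
}

/// Encode bytes as a generic sequence: an array header, then one integer per byte.
fn encode_as_int_array(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    if bytes.len() <= 15 {
        out.push(0x90 | bytes.len() as u8); // fixarray (0x9X)
    } else {
        push_len(&mut out, bytes.len(), 0xDC, 0xDD); // array 16 / array 32
    }
    for &b in bytes {
        if b > 0x7F {
            out.push(0xCC); // uint 8 marker: values above 0x7F need two bytes
        }
        out.push(b); // values up to 0x7F are a one-byte positive fixint
    }
    out
}

/// Encode bytes as a MessagePack bin value: a header, then the raw payload.
fn encode_as_bin(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    if bytes.len() <= 0xFF {
        out.push(0xC4); // bin 8
        out.push(bytes.len() as u8);
    } else {
        push_len(&mut out, bytes.len(), 0xC5, 0xC6); // bin 16 / bin 32
    }
    out.extend_from_slice(bytes);
    out
}

/// Wrap an already encoded value in a one-entry map with the key "a",
/// matching the `81 A1 61` prefix in the byte dumps quoted in the thread.
fn wrap_in_map_a(encoded_value: &[u8]) -> Vec<u8> {
    let mut out = vec![0x81, 0xA1, 0x61]; // fixmap(1), fixstr(1), 'a'
    out.extend_from_slice(encoded_value);
    out
}
```

For the three bytes FD FE FF, `wrap_in_map_a(&encode_as_int_array(..))` gives the 10-byte sequence and `wrap_in_map_a(&encode_as_bin(..))` gives the 8-byte sequence quoted above. For ASCII-only input such as b"hello", the array form costs no extra marker bytes, since every value fits in a positive fixint.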

jaxrtech commented 3 years ago

Not sure if this has been resolved, but figured better not to lead to the "Wisdom of the Ancients" scenario...

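To work around not being able to deserialize into &'a [u8], I also used serde_bytes, but with two nearly identical structs that differ only in the buffer field: the one used for serializing borrows it as &'a [u8], and the one used for deserializing owns it as a Vec<u8>.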

From a Rust borrow checker perspective, this makes some sense: when you're serializing, you only need a borrowed version (the message struct doesn't need to own the buffer), whereas when you're deserializing, the message you get back should own the buffer via a Vec<u8>; otherwise it's not clear who owns the buffer.

Example snippet:

#[derive(Serialize)]
pub struct FooRef<'a> {
    #[serde(with = "serde_bytes")]
    pub buf: &'a [u8],
}

#[derive(Deserialize)]
pub struct FooOwned {
    #[serde(with = "serde_bytes")]
    pub buf: Vec<u8>,
}

misos1 commented 3 years ago

> While, when you're deserializing, the message you get back with the buffer should own the buffer via a Vec<u8> otherwise it's not clear who owns the buffer.

It makes sense in some use cases, but not in all. Why would it not be clear who owns the buffer?

kornelski commented 3 years ago

By default it's not possible to deserialize anything into &[u8], because slices can't store any data.

You can make the struct borrow from the input it parses; in Serde, you need to add the #[serde(borrow)] annotation to tell Serde to do so.

Also try Cow<'a, [u8]> if you want to use either borrowed or owned data.
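To illustrate the borrowed-versus-owned distinction, here is a minimal hand-rolled decoder using only the standard library. It is a sketch, not how Serde or rmp-serde work internally, and the names `decode_bytes` and `decode_borrowed` are hypothetical. It assumes the input starts with either a bin 8 value or a fixarray of small unsigned integers. A bin payload sits contiguously in the input and can be borrowed, while the array form has to be rebuilt into an owned buffer, which may be why the original code reported a sequence where a borrowed byte array was expected.

```rust
use std::borrow::Cow;

/// Decode a byte buffer from the start of `input`.
/// Returns `Cow::Borrowed` when the payload can be referenced in place (bin 8)
/// and `Cow::Owned` when it has to be reassembled (fixarray of integers).
fn decode_bytes<'a>(input: &'a [u8]) -> Option<Cow<'a, [u8]>> {
    let (&marker, rest) = input.split_first()?;
    match marker {
        // bin 8: one length byte, then the raw payload, contiguous in `input`.
        0xC4 => {
            let (&len, rest) = rest.split_first()?;
            let payload = rest.get(..len as usize)?;
            Some(Cow::Borrowed(payload))
        }
        // fixarray: the element count is in the low four bits of the marker.
        0x90..=0x9F => {
            let count = (marker & 0x0F) as usize;
            let mut out = Vec::with_capacity(count);
            let mut rest = rest;
            for _ in 0..count {
                let (&b, tail) = rest.split_first()?;
                if b <= 0x7F {
                    // positive fixint: the byte is the value itself
                    out.push(b);
                    rest = tail;
                } else if b == 0xCC {
                    // uint 8: the marker is followed by the value
                    let (&value, tail) = tail.split_first()?;
                    out.push(value);
                    rest = tail;
                } else {
                    return None; // element type not handled by this sketch
                }
            }
            Some(Cow::Owned(out))
        }
        _ => None,
    }
}

/// Strict variant: succeed only when the result can borrow from `input`,
/// analogous to asking for `&'a [u8]` instead of `Cow<'a, [u8]>`.
fn decode_borrowed<'a>(input: &'a [u8]) -> Option<&'a [u8]> {
    match decode_bytes(input)? {
        Cow::Borrowed(slice) => Some(slice),
        Cow::Owned(_) => None,
    }
}
```

With the input [C4, 03, FD, FE, FF], both functions succeed and the result points into the input. With [93, CC, FD, CC, FE, CC, FF], `decode_bytes` returns an owned copy and `decode_borrowed` returns `None`, because no contiguous run of the input holds exactly those three bytes.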