3Hren / msgpack-rust

MessagePack implementation for Rust / msgpack.org[Rust]
MIT License
1.11k stars 123 forks source link

Fails to deserialize adjacent enum unit and struct variants #250

Open chipsenkbeil opened 4 years ago

chipsenkbeil commented 4 years ago

Description

I believe that I have encountered some bugs when attempting to convert from msgpack back into a Rust enumeration when using adjacent tagging. This is not a problem for msgpack if I only do inner tagging.

I've attached a fully working, minimal example where the tests can highlight working and failing scenarios: msgpack-rust-bug.zip

I've also highlighted most relevant parts below including the version of serde and rmp-serde used as well as highlighting working adjacent serialization and deserialization with JSON.

Cargo.toml:

[dependencies]
serde = { version = "1.0.111", features = ["derive"] }
serde_json = { version = "1.0.48" }
rmp-serde = "0.14.3"

Example Setup

For code like this:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
#[serde(tag = "type", content = "payload")]
pub enum Example {
    Unit1,
    Unit2,
    HasValue { x: u32 },
    TupleWithValue(u32, u32),
    InnerValue(SomeInnerValue),
}

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
pub struct SomeInnerValue {
    pub a: u32,
    pub b: String,
}

Struct Variant Failure

let v = rmp_serde::to_vec(&Example::HasValue { x: 3 }).unwrap();

// Fails unwrap with Syntax("invalid type: sequence,
// expected struct variant Example::HasValue")
let ex: Example = rmp_serde::from_slice(&v).unwrap();
assert_eq!(Example::HasValue { x: 3 }, ex);

Unit Variant Failure

let v = rmp_serde::to_vec(&Example::Unit1).unwrap();

// Fails unwrap with Syntax("invalid length 1,
// expected adjacently tagged enum Example")
let ex: Example = rmp_serde::from_slice(&v).unwrap();
assert_eq!(Example::Unit1, ex);

Success with serde json

let json_text = "{\"type\":\"Unit1\"}";
let ex: Example = serde_json::from_str(json_text).unwrap();
let output = serde_json::to_string(&ex).unwrap();
assert_eq!(json_text, output);

let v = serde_json::to_vec(&Example::HasValue { x: 3 }).unwrap();
let ex: Example = serde_json::from_slice(&v).unwrap();
assert_eq!(Example::HasValue { x: 3 }, ex);

let v = serde_json::to_vec(&Example::TupleWithValue(1, 2)).unwrap();
let ex: Example = serde_json::from_slice(&v).unwrap();
assert_eq!(Example::TupleWithValue(1, 2), ex);
kornelski commented 4 years ago

Yes, this crate's handling of enums is not as robust as serde-json's. It would be great if you could have a look at the code and find where it's going wrong.

chipsenkbeil commented 4 years ago

@kornelski, cloned the repo to add in the tests and poke around a little bit. So far, I've just been observing from the outside. It seems that the tagging switches communicating unit variants as single element fixmaps with a positive fixint key and nil value to a single-element fixarray with a fixstr representing the variant selected.

My guess is that with the standard version of enums, the type is included externally and thereby with msgpack not used at all. With inner tagging or adjacent tagging, the type is included inside and is being passed as this string version in an array. Given that there's nothing to indicate which form of tagging is being used, I'm assuming this would cause problems when trying to deserialize? Don't have an understanding of how the serde implementation works here yet.

[UPDATE] There is specific variations in serde based on the type of tagging used, defaulting to externally tagged configuration. It seems that a lot of the serialization references don't apply if not using external tagging. As seen here, adjacently tagged for a unit variant uses a struct serialization followed by a field serialization.

In the example below, it appears to invoke:

serde_cbor is also able to handle adjacent tagging correctly, although I believe this may only be for the unpacked case. You can serialize packed (where it converts variants and fields to integers) and unpacked (where fields/variants are kept as strings). Given that msgpack normally serializes variants as integer indexes, maybe this alternative encoding for adjacent (to try to include the type and content) is causing confusion in what is expected?


Without the tagging:

    #[derive(Serialize)]
    enum Enum {
        V1,
        V2,
    }

    let mut buf = Vec::new();
    Enum::V1.serialize(&mut Serializer::new(&mut buf)).unwrap();
    Enum::V2.serialize(&mut Serializer::new(&mut buf)).unwrap();

    // Expect: {0 => nil} {1 => nil}.
    assert_eq!(vec![0x81, 0x00, 0xC0, 0x81, 0x01, 0xC0], buf);

With the tagging:

    #[derive(Serialize)]
    #[serde(tag = "t", content = "c")]
    enum Enum {
        V1,
        V2,
    }

    let mut buf = Vec::new();
    Enum::V1.serialize(&mut Serializer::new(&mut buf)).unwrap();
    Enum::V2.serialize(&mut Serializer::new(&mut buf)).unwrap();

    // Expect: []
    // 0x91 == single element array
    // 0xA2 == fixed string of 2 characters
    //
    // 0x56 == V
    // 0x31 == 1
    // 0x32 == 2
    assert_eq!(vec![0x91, 0xA2, 0x56, 0x31, 0x91, 0xA2, 0x56, 0x32], buf);
Newbytee commented 3 years ago

@chipsenkbeil Are you currently working on attempting to solve this?

chipsenkbeil commented 3 years ago

@Newbytee, no, after sharing my debugging, I switched to a different serialization due to time constraints.

Newbytee commented 3 years ago

@Newbytee, no, after sharing my debugging, I switched to a different serialization due to time constraints.

Thanks for the swift reply. I might look into this in that case. Did you switch to a different serialisation format or a different serialiser? If the latter, which?

chipsenkbeil commented 3 years ago

@Newbytee, no, after sharing my debugging, I switched to a different serialization due to time constraints.

Thanks for the swift reply. I might look into this in that case. Did you switch to a different serialisation format or a different serialiser? If the latter, which?

I switched to serde_cbor while still using the adjacent tagging as I was able to drop it in quickly and keep moving with my project. :)

Kobzol commented 3 years ago

I looked into this and the problem seems to be rather in the interaction of this library with serde. The problem is that serde has several assumptions about deserializing adjacently tagged enums, and this library breaks them. I created an issue to see if this can be resolved in serde.

In the meantime, as a workaround you can try to use different serialization configs that encode structs as maps and not sequences. The easiest way how to achieve that is to use rmp_serde::to_vec_named instead of rmp_serde::to_vec.