birkenfeld / serde-pickle

Rust (de)serialization for the Python pickle format.
Apache License 2.0
185 stars 27 forks source link

Strange handling of enums (decoding error: enums must be tuples) #9

Closed Palladinium closed 3 years ago

Palladinium commented 4 years ago

serde_pickle handles serde's default externally-tagged enum representation differently from other serde libraries I've encountered, which can cause issues when translating the same data between multiple serialization formats.

For example, I took serde_yaml and serde_json and used their to_string function, while for serde_pickle I pickled with the pickle 3 format, unpickled in Python 3 and put the result through print().

#[derive(Serialize)]
enum Foo {
    Struct { a: i32 },
    NewType(i32),
    Tuple(i32, u32),
    Unit,
}

fn values() -> Vec<Foo> {
    vec![
        Foo::Struct { a: 1 },
        Foo::NewType(2),
        Foo::Tuple(3, 4),
        Foo::Unit,
    ]
}

JSON

[{"Struct":{"a":1}},{"NewType":2},{"Tuple":[3,4]},"Unit"]

YAML

---
- Struct:
    a: 1
- NewType: 2
- Tuple:
    - 3
    - 4
- Unit

serde-pickle

[('Struct', {'a': 1}), ('NewType', 2), ('Tuple', [3, 4]), ('Unit',)]

What I'd expect serde-pickle to do

I think serde-pickle should follow the example of serde_yaml and serde_json, and (de)serialize like:

[{'Struct':{'a':1}},{'NewType':2},{'Tuple':[3,4]},'Unit']

Furthermore, if you use a python YAML or JSON library to load the above YAML or JSON examples, it'll produce the same output I'd expect serde-pickle to emit. (This is actually my use case, where a python script and a Rust program exchange data that is also distributed as YAML files).

birkenfeld commented 4 years ago

You're right, this is strange. I don't remember a reason for using a different method here.

For compatibility reasons, I'm not eager to change the serialize side, but it is no problem to accept both formats on the deserialize side. Would that be ok for your use case?

Palladinium commented 4 years ago

I think that would be a nice improvement, and it's completely fair to want to avoid breaking changes, but unfortunately I need to do both serialization and deserialization on the Rust side. Perhaps using a feature gate, or holding the serialize change until a new major version would make it less disruptive?

Either way, I'm mainly just using pickle since it's extremely easy to use on the Python side but I actually have the choice of using another format, so I should be able to get by with JSON.

birkenfeld commented 4 years ago

Yes, I may switch it for 1.0, while also changing the serializing APIs from a single bool option (for proto 3) to an extensible struct.

m-dewanchand commented 3 years ago

Ran into the same issue. Solved it by adding a serde container attribute to the enum: #[serde(into = "String")] and a from for String method. Example:

#[derive(Debug, PartialEq, Copy, Clone, Serialize)]
#[serde(into = "String")]
pub enum Feature {
    CrowdSim,
    Traffic,
    Debug,
    Demo,
    Json,
}

impl From<Feature> for String {
    fn from(src: Feature) -> String {
        format!("{:?}", src)
    }
}
birkenfeld commented 3 years ago

Ok, the new default is now implemented; if you are still interested it would be nice if you could test out master before I release 1.0.

Palladinium commented 3 years ago

I only did some simple testing, but it looks good. Thank you!

m-dewanchand commented 3 years ago

Same here, looks good. Thanks!

birkenfeld commented 3 years ago

Cool, thanks! I'll go ahead with the release then.

m-dewanchand commented 3 years ago

if possible we'd like to send 2 more patches.....can you wait? Hope to send the patches today.

birkenfeld commented 3 years ago

Too late - will they be backwards compatible?

m-dewanchand commented 3 years ago

I think so....

birkenfeld commented 3 years ago

Ok, depending on the timing it'll have to wait about a week since I'm going on vacation tomorrow. But then I can release that as 1.1.