birkenfeld / serde-pickle

Rust (de)serialization for the Python pickle format.
Apache License 2.0
188 stars 28 forks

Serializing tuple of containing byte array #10

Closed clegaard closed 3 years ago

clegaard commented 3 years ago

I am experiencing an issue when trying to deserialize a tuple containing a bytes object. Running the following code fails:

Python

import pickle

with open("data.pickle", "wb") as f:
    inner = pickle.dumps(1)  # a pickled int is itself a bytes object
    pickle.dump((inner, 1), f)

Rust

use serde;
use serde::Deserialize;
use serde_pickle::from_reader;

use std::fs::File;

fn main() {
    let data: (String, i32) =
        serde_pickle::from_reader(File::open("data.pickle").unwrap()).unwrap();
    println!("{:?}", data);
}

Error

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(Structure("invalid value: byte array, expected a string"))', src\main.rs:9:9

Oddly, it appears to work for a byte literal. Running the same Rust program on the file written by the script below:

import pickle

with open("data.pickle", "wb") as f:
    pickle.dump((b"hello world!", 1), f)

yields:

("hello world!", 1)
birkenfeld commented 3 years ago

This is expected: bytes objects generally need to be unpickled as Vec<u8> on the Rust side, and the bytes in your first Python example (the output of pickle.dumps(1)) are not valid UTF-8. There is a special case for bytes objects that are valid UTF-8, like the one in your second example, for compatibility with Python 2, where the bytes/str/unicode distinction was quite muddy.
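A minimal std-only Rust sketch of the rule described above follows. It is an illustration, not serde-pickle's implementation: try_bytes_as_string is a hypothetical helper, and the pickled bytes assume protocol 4, under which pickle.dumps(1) should yield b'\x80\x04K\x01.'. Any pickle written with protocol 2 or later begins with the PROTO opcode 0x80, which cannot start a valid UTF-8 sequence, so the conversion to a String fails whatever the exact protocol.

```rust
use std::string::FromUtf8Error;

/// Hypothetical stand-in for the special case described above: a pickled
/// bytes object can only become a Rust String if it is valid UTF-8.
/// (Illustration only; this is not serde-pickle's real code.)
fn try_bytes_as_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    // Assumed protocol-4 output of pickle.dumps(1): PROTO 4, BININT1 1, STOP.
    // The leading 0x80 byte is never valid as the first byte of UTF-8.
    let pickled_int: Vec<u8> = b"\x80\x04K\x01.".to_vec();
    // The byte literal from the second example is plain ASCII, hence valid UTF-8.
    let literal: Vec<u8> = b"hello world!".to_vec();

    for bytes in vec![pickled_int, literal] {
        match try_bytes_as_string(bytes) {
            Ok(s) => println!("fits in a String: {:?}", s),
            Err(e) => println!("not UTF-8 ({}), keep it as Vec<u8>: {:?}", e, e.as_bytes()),
        }
    }
}
```

Only the second value converts, which matches why the byte literal came back as a String while the nested pickle did not.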

clegaard commented 3 years ago

First of all, thank you for the quick response. My initial attempt was to use Vec<u8>, but that also produces an error when deserializing:

Python

import pickle

with open("data.pickle", "wb") as f:
    inner = pickle.dumps(1)
    pickle.dump((inner, 1), f)

Rust

use std::fs::File;

fn main() {
    let data: (Vec<u8>, i32) =
        serde_pickle::from_reader(File::open("data.pickle").unwrap()).unwrap();
    println!("{:?}", data);
}

Error

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(Structure("invalid type: byte array, expected a sequence"))', src\main.rs:9:9

In fact, the same thing happens when I try to pickle only the bytes:

pickle.dump(pickle.dumps(1), f)
birkenfeld commented 3 years ago

Oh wow, that surprised me.

But it seems to be expected behavior: serde can't special-case Vec<u8> as opposed to, e.g., Vec<u32>, so it expects a list/tuple there.

You have to use a wrapper type that can be deserialized from the bytes type of serde's data model. The serde_bytes crate provides one, which I verified works: instead of Vec<u8>, use serde_bytes::ByteBuf.
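The reasoning above can be illustrated with a deliberately simplified, std-only model. Everything in it is hypothetical: Value stands in for what the pickle deserializer hands over (real serde works through Deserializer/Visitor traits), vec_u8_from mimics the generic Vec<T> impl that only understands sequences, and ByteBufLike mimics a bytes-aware wrapper in the spirit of serde_bytes::ByteBuf. In real code, the fix suggested here is simply to deserialize into (serde_bytes::ByteBuf, i32) instead of (Vec<u8>, i32).

```rust
/// Hypothetical, heavily simplified stand-in for what a pickle deserializer
/// hands over. Real serde drives this through Deserializer/Visitor traits.
#[derive(Debug)]
enum Value {
    Bytes(Vec<u8>),  // a Python bytes object
    Seq(Vec<Value>), // a Python list or tuple
    Int(i64),        // a Python int
}

/// Mimics the generic Vec<T> impl: it only knows how to read a sequence
/// element by element, so a byte array is rejected as the wrong type.
fn vec_u8_from(v: &Value) -> Result<Vec<u8>, String> {
    match v {
        Value::Seq(items) => items
            .iter()
            .map(|item| match item {
                Value::Int(n) if (0i64..=255).contains(n) => Ok(*n as u8),
                other => Err(format!("invalid value: {:?}, expected u8", other)),
            })
            .collect(),
        Value::Bytes(_) => Err("invalid type: byte array, expected a sequence".to_string()),
        other => Err(format!("invalid type: {:?}, expected a sequence", other)),
    }
}

/// Mimics a bytes-aware wrapper (hypothetical name): it asks for a byte
/// array and accepts it directly.
#[derive(Debug, PartialEq)]
struct ByteBufLike(Vec<u8>);

fn bytebuf_from(v: &Value) -> Result<ByteBufLike, String> {
    match v {
        Value::Bytes(b) => Ok(ByteBufLike(b.clone())),
        other => Err(format!("invalid type: {:?}, expected a byte array", other)),
    }
}

fn main() {
    // First element of the (bytes, 1) tuple from the Python example;
    // the byte values assume pickle protocol 4.
    let field = Value::Bytes(b"\x80\x04K\x01.".to_vec());
    println!("as Vec<u8>:      {:?}", vec_u8_from(&field));
    println!("as ByteBufLike:  {:?}", bytebuf_from(&field));

    // A Python list of small ints, by contrast, does fit Vec<u8>.
    let list = Value::Seq(vec![Value::Int(1), Value::Int(2)]);
    println!("list as Vec<u8>: {:?}", vec_u8_from(&list));
}
```

The model only captures the shape of the problem: the generic Vec<T> path never asks for a byte array because it cannot know that T is u8, which is why a dedicated wrapper type is needed.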

I'll have to note this in the docs for serde-pickle, though.

birkenfeld commented 3 years ago

Ok, this should make it clear. Thanks for the report!