3Hren / msgpack-rust

MessagePack implementation for Rust / msgpack.org[Rust]
MIT License

Can't deserialize entire file #317

Open StuartHadfield opened 2 years ago

StuartHadfield commented 2 years ago

I can't deserialize an entire file because the Deserializer does not implement into_iter as other serde libraries do.

How can I get around this?

Code thus far is:

use std::fs::File;
use std::io::{BufReader, BufWriter, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file_path = "./src/foo.msgpack";
    let reader = BufReader::new(File::open(file_path)?);
    let writer = BufWriter::new(File::create("./src/results.json")?);

    let mut deserializer = rmp_serde::Deserializer::from_read(reader);

    // Alternative: write to stdout instead of a file.
    // let mut serializer = serde_json::Serializer::new(io::stdout());
    let mut serializer = serde_json::Serializer::pretty(writer);

    // Transcodes a single MessagePack value; anything after it stays unread.
    serde_transcode::transcode(&mut deserializer, &mut serializer)?;
    serializer.into_inner().flush()?;

    Ok(())
}

kornelski commented 2 years ago

How can I get around this?

Make a PR that adds into_inner

StuartHadfield commented 2 years ago

@kornelski 🤔 do you mean into_iter, not into_inner?

StuartHadfield commented 2 years ago

(I'm happy to have a bash, but I'm a real newbie to Rust, so not sure I'll manage haha)

kornelski commented 2 years ago

I assume you mean into_inner, because Iterator doesn't make sense here.

StuartHadfield commented 2 years ago

Ah... Hmmm 🤔 What does into_inner look like?

I thought of making an iterator, because that seems to be how Python's msgpack implementation works (https://github.com/msgpack/msgpack-python/blob/500a238028bdebe123b502b07769578b5f0e8a3a/msgpack/_unpacker.pyx#L539-L540).

into_inner conventionally just returns the wrapped object, right? So we'd return the Reader? Which means we can...?

Also - into_inner is already implemented for Deserializer

kornelski commented 2 years ago

In that case I'm completely confused about what you want.

Serde fundamentally creates a single object of a given type. There is nothing to iterate in the decoder. Even if you deserialize a vector, you iterate the vector, not the decoder.

I thought you meant into_inner that returns the io::Reader so that you can recycle it for other I/O operations. That's not related to iteration.

StuartHadfield commented 2 years ago

Ah - okay - let me clarify.

If you have serialized the following array of objects into msgpack:

{
  "foo": "bar"
},
{
  "lorem": "ipsum"
}

We should be able to read all of them out of a file stream. However, once rmp_serde reaches the end of the first object (probably at some delimiting character?), it stops decoding, even though there is plenty of data still to be read from the buffer. You can see this by comparing the bytes read by fs::read with what rmp_serde decodes.

I thought about into_iter after seeing it in the json implementation of serde - https://docs.rs/serde_json/latest/serde_json/de/struct.Deserializer.html#method.into_iter.

Does that make any more sense @kornelski ?
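
On the guess about a delimiting character: MessagePack has no separator between values. Each value starts with a marker byte that encodes its type and, for short strings and small maps, its length, so a decoder knows where a value ends without any delimiter. Below is a minimal std-only sketch, assuming the two objects above were written back to back. It handles only the fixstr and fixmap markers, and the names Value, decode_one, decode_all and sample_stream are made up for illustration; it is not how rmp_serde is implemented.

```rust
// Minimal sketch: understands only fixstr (0xa0..=0xbf) and fixmap (0x80..=0x8f).
#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Map(Vec<(Value, Value)>),
}

/// Decodes one value from the front of `input` and returns it together
/// with the bytes that were not consumed.
fn decode_one(input: &[u8]) -> Option<(Value, &[u8])> {
    let (&marker, rest) = input.split_first()?;
    match marker {
        // fixstr: the low 5 bits of the marker are the byte length.
        0xa0..=0xbf => {
            let len = (marker & 0x1f) as usize;
            if rest.len() < len {
                return None;
            }
            let (bytes, rest) = rest.split_at(len);
            let s = String::from_utf8(bytes.to_vec()).ok()?;
            Some((Value::Str(s), rest))
        }
        // fixmap: the low 4 bits of the marker are the number of key/value pairs.
        0x80..=0x8f => {
            let pairs = (marker & 0x0f) as usize;
            let mut rest = rest;
            let mut entries = Vec::with_capacity(pairs);
            for _ in 0..pairs {
                let (key, after_key) = decode_one(rest)?;
                let (value, after_value) = decode_one(after_key)?;
                entries.push((key, value));
                rest = after_value;
            }
            Some((Value::Map(entries), rest))
        }
        // Every other marker is out of scope for this sketch.
        _ => None,
    }
}

/// Keeps decoding values until the input is exhausted.
fn decode_all(mut input: &[u8]) -> Option<Vec<Value>> {
    let mut values = Vec::new();
    while !input.is_empty() {
        let (value, rest) = decode_one(input)?;
        values.push(value);
        input = rest;
    }
    Some(values)
}

/// {"foo": "bar"} followed directly by {"lorem": "ipsum"}, hand-encoded.
fn sample_stream() -> Vec<u8> {
    let mut bytes = vec![0x81, 0xa3];
    bytes.extend_from_slice(b"foo");
    bytes.push(0xa3);
    bytes.extend_from_slice(b"bar");
    bytes.extend_from_slice(&[0x81, 0xa5]);
    bytes.extend_from_slice(b"lorem");
    bytes.push(0xa5);
    bytes.extend_from_slice(b"ipsum");
    bytes
}

fn main() {
    let values = decode_all(&sample_stream()).expect("only fixmap/fixstr are supported");
    println!("{} values: {:?}", values.len(), values);
}
```

The first map ends after nine bytes simply because its marker announced one pair and both strings announced their lengths; the second map's marker follows immediately.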

kornelski commented 2 years ago

I don't think that's a correct usage of serde. Serde is a type-based one-shot deserializer, not a streaming deserializer. It gives you one and exactly one object of the type you've requested. If you've requested a single struct, that's all you will ever get. Two objects next to each other are not a type. If you have multiple objects to deserialize with serde, then deserialize them all into a single Vec<Object>.