RReverser / serde-xml-rs

xml-rs based deserializer for Serde (compatible with 1.0+)
https://crates.io/crates/serde-xml-rs
MIT License

Deserialize with xml header #47

Closed kjeremy closed 7 years ago

kjeremy commented 7 years ago

I'm trying to deserialize xml that starts with:

<?xml version=”1.0” standalone=”yes”?>

but it fails. Is there a way to ignore the header?

oli-obk commented 7 years ago

Not yet, but this is trivial to implement in the parser. Maybe we should bail out with an error if standalone is "no".

kjeremy commented 7 years ago

I'm looking at this right now (I'm a total Rust newbie) and I don't understand why it doesn't work.

The following in inner_next should handle it:

match r.map_err(ErrorKind::Syntax)? {
                XmlEvent::StartDocument { .. } |

oli-obk commented 7 years ago

That seems too general. You'd skip over every kind of syntax error. I'm also not sure whether xml-rs even supports recovering from syntax errors. Have you checked whether xml-rs supports headers in some way?

RReverser commented 7 years ago

xml-rs definitely supports <?xml ...?> if that's what you mean by headers, it's in its own test suite: https://github.com/netvl/xml-rs/blob/master/tests/documents/sample_1.xml

RReverser commented 7 years ago

And it's returned as StartDocument, which we already ignore. @kjeremy please provide more details: a full reproducible example and the error message you're getting would be helpful.

kjeremy commented 7 years ago

Right, it returns StartDocument, which is why I'm surprised. The following test illustrates the problem and fails with a parsing error.

#[test]
fn ignore_header() {
    init_logger();

    let s = r#"
        <?xml version=”1.0” standalone=”yes”?>
        <item name="hello" source="world.rs" />
    "#;

    let item: Item = from_str(s).unwrap();

    assert_eq!(
        item,
        Item {
            name: "hello".to_string(),
            source: "world.rs".to_string(),
        }
    );
}

thread 'ignore_header' panicked at 'called Result::unwrap() on an Err value: Error(Syntax(Error { pos: 3:9, kind: Syntax("Unexpected token inside attribute value: <") }), State { next_error: None, backtrace: None })', src\libcore\result.rs:906:4

RReverser commented 7 years ago

@kjeremy If it's literally your XML, then the problem appears to be these weird quotes you have in the test: ” (U+201D) in <?xml ...?> vs the normal quote " (U+0022) everywhere else. Perhaps copied incorrectly from somewhere?

kjeremy commented 7 years ago

Ah ha! Good catch. Yes, I had originally copied some XML from a Word doc. Replacing the quotes fixed the issue. I will close this.
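
For anyone hitting the same "Unexpected token inside attribute value" error, here is a minimal sketch of how the offending characters could be located before the input reaches the parser. It uses only the standard library and is not part of serde-xml-rs or xml-rs; the names find_typographic_quotes and normalize_quotes are hypothetical helpers made up for this illustration, and the set of characters checked (U+201C, U+201D, U+2018, U+2019) is an assumption about what a word processor typically substitutes.

```rust
/// Typographic quotes that word processors tend to substitute for " and '.
/// (Assumed set for this sketch; extend as needed.)
const TYPOGRAPHIC_QUOTES: [char; 4] = ['\u{201C}', '\u{201D}', '\u{2018}', '\u{2019}'];

/// Hypothetical helper: returns the 1-based (line, column, char) of every
/// typographic quote in `input`. Columns count characters, not bytes.
fn find_typographic_quotes(input: &str) -> Vec<(usize, usize, char)> {
    let mut hits = Vec::new();
    for (line_idx, line) in input.lines().enumerate() {
        for (col_idx, ch) in line.chars().enumerate() {
            if TYPOGRAPHIC_QUOTES.contains(&ch) {
                hits.push((line_idx + 1, col_idx + 1, ch));
            }
        }
    }
    hits
}

/// Hypothetical helper: replaces typographic quotes with ASCII equivalents.
/// Note that this rewrites the whole input, including any text content.
fn normalize_quotes(input: &str) -> String {
    input
        .chars()
        .map(|ch| match ch {
            '\u{201C}' | '\u{201D}' => '"',
            '\u{2018}' | '\u{2019}' => '\'',
            other => other,
        })
        .collect()
}

fn main() {
    // The header from this issue, with U+201D written as escapes.
    let s = "<?xml version=\u{201D}1.0\u{201D} standalone=\u{201D}yes\u{201D}?>\n\
             <item name=\"hello\" source=\"world.rs\" />";

    for (line, col, ch) in find_typographic_quotes(s) {
        println!("line {}, column {}: U+{:04X}", line, col, ch as u32);
    }
    println!("{}", normalize_quotes(s));
}
```

Reporting the positions is probably the safer choice in a test; normalize_quotes is a blunt instrument, since it would also change typographic quotes that legitimately appear in element text.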