dtolnay / serde-yaml

Strongly typed YAML library for Rust
Apache License 2.0
960 stars 157 forks source link

Incorrect error position inside tagged enums #153

Open RazrFalcon opened 4 years ago

RazrFalcon commented 4 years ago
use serde::Deserialize;

#[serde(tag = "Type")]
#[derive(Deserialize, Debug)]
pub enum Node {
    One {

    },
    Two {
        value1: u32,
        value2: u32,
    },
}

fn main() {
    let text = "\
---
- Type: One
- Type: Two
  value1: 1
  value2: text
";

    let nodes: Vec<Node> = match serde_yaml::from_str(text) {
        Ok(v) => v,
        Err(e) => {
            eprintln!("{}", e);
            return;
        }
    };

    dbg!(nodes);
}

It prints:

invalid type: string "text", expected u32 at line 2 column 1

Which is completely incorrect. The line is 5 and column is 11.

torkleyy commented 3 years ago

Same issue for me, it seems to be caused by (tagged) enums.

hasezoey commented 3 years ago

Just hit this issue too, which was quite confusing

versions: rust: 1.55 serde: 1.0.130 serde_yaml: 0.8.21

PS: please ask if anything else needs to be provided

HEnquist commented 2 years ago

I also have hit this. Here is another test that shows it:

#[test]
fn test_tagged_location() {
    #[derive(Debug, Deserialize)]
    #[serde(deny_unknown_fields)]
    struct S {
        #[allow(dead_code)]
        a: usize,
        #[allow(dead_code)]
        b: bool,
        #[allow(dead_code)]
        c: E,
    }
    #[derive(Debug, Deserialize)]
    #[serde(tag = "type")]
    pub enum E {
        A {
            e: String,
            f: bool,
            g: f32,
        } 
    }

    let yaml = indoc! {"
        ---
        a: 2
        b: true
        c:
          type: A
          e: \"hello\"
          f: fase
          g: 1.0
    "};
    let expected =
        "invalid type: string \"fase\", expected a boolean at line 7 column 6";
    test_error::<S>(yaml, expected);
}

Output:

---- test_tagged_location stdout ----
thread 'test_tagged_location' panicked at 'assertion failed: `(left == right)`
  left: `"invalid type: string \"fase\", expected a boolean at line 7 column 6"`,
 right: `"invalid type: string \"fase\", expected a boolean at line 2 column 2"`',
ANtlord commented 2 years ago

I've done a little investigation. It appears that the problem is common for other formats. I did two almost identical tests for serde-yaml and serde-json and put panics at creation of the error to see the backtrace:

The test for serde-json

#[test]
fn test_enum_serde_tag() {
    #[derive(Deserialize, PartialEq, Debug)]
    #[serde(tag = "kind")]
    enum E {
        Circle { radius: u8 },
        Dot { x: u8, y: u8 },
    }

    #[derive(Deserialize, PartialEq, Debug)]
    struct Data {
        figures: Vec<E>,
    }

    let source = r#"{ "figures": [ { "kind": "Dot", "y": 22, "x": true } ] }"#;

    let deserialized = from_str::<Data>(source);
    assert!(deserialized.is_err());
    let err = deserialized.unwrap_err();
    assert_eq!(1, err.line(), "{:?}", err);
    assert_eq!(47, err.column(), "{:?}", err);
}

The actual value of the column is 54.

The test for serde-yaml

#[test]
fn test_enum_serde_tag() {
    #[derive(Deserialize, PartialEq, Debug)]
    #[serde(tag = "kind")]
    enum E {
        Circle { radius: u8 },
        Dot { x: u8, y: u8 },
    }

    #[derive(Deserialize, PartialEq, Debug)]
    struct Data {
        figures: Vec<E>,
    }

    let yaml = indoc! {"
        ---
        figures:
          - kind: Dot
            x: 21
            y: true
    "};

    let deserialized = serde_yaml::from_str::<Data>(yaml);
    assert!(deserialized.is_err());
    let err = deserialized.unwrap_err();
    let loc = err.location().unwrap();
    assert_eq!(5, loc.line(), "{:?}", err);
    assert_eq!(8, loc.column());
}

The actual value of the line is 3.

So let's take a look at the tracebacks.

The traceback for serde-json

thread 'test_enum_serde_tag' panicked at 'OMG!!, invalid type: boolean `true`, expected u8', src/error.rs:374:13
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/io/mod.rs:1537
   6: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
             at src/libstd/io/impls.rs:176
   7: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   8: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   9: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
  10: std::panicking::default_hook
             at src/libstd/panicking.rs:214
  11: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:526
  12: rust_begin_unwind
             at src/libstd/panicking.rs:437
  13: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:391
  14: <serde_json::error::Error as serde::de::Error>::invalid_type
             at src/error.rs:374
  15: serde::__private::de::content::ContentDeserializer<E>::invalid_type
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1007
  16: serde::__private::de::content::ContentDeserializer<E>::deserialize_integer
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1023
  17: <serde::__private::de::content::ContentDeserializer<E> as serde::de::Deserializer>::deserialize_u8
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1156
  18: serde::de::impls::<impl serde::de::Deserialize for u8>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/impls.rs:129
  19: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  20: <serde::de::value::MapDeserializer<I,E> as serde::de::MapAccess>::next_value_seed
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/value.rs:1181
  21: serde::de::MapAccess::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1866
  22: <&mut A as serde::de::MapAccess>::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1947
  23: <test::test_enum_serde_tag::_::<impl serde::de::Deserialize for test::test_enum_serde_tag::E>::deserialize::__Visitor as serde::de::Visitor>::visit_map
             at tests/test.rs:1377
  24: serde::__private::de::content::visit_content_map
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1071
  25: <serde::__private::de::content::ContentDeserializer<E> as serde::de::Deserializer>::deserialize_any
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1110
  26: test::test_enum_serde_tag::_::<impl serde::de::Deserialize for test::test_enum_serde_tag::E>::deserialize
             at tests/test.rs:1377
  27: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  28: <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
             at ./src/de.rs:1943
  29: serde::de::SeqAccess::next_element
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1727
  30: <serde::de::impls::<impl serde::de::Deserialize for alloc::vec::Vec<T>>::deserialize::VecVisitor<T> as serde::de::Visitor>::visit_seq
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/impls.rs:1038
  31: <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
             at ./src/de.rs:1736
  32: serde::de::impls::<impl serde::de::Deserialize for alloc::vec::Vec<T>>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/impls.rs:1049
  33: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  34: <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
             at ./src/de.rs:2002
  35: serde::de::MapAccess::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1866
  36: <test::test_enum_serde_tag::_::<impl serde::de::Deserialize for test::test_enum_serde_tag::Data>::deserialize::__Visitor as serde::de::Visitor>::visit_map
             at tests/test.rs:1384
  37: <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
             at ./src/de.rs:1835
  38: test::test_enum_serde_tag::_::<impl serde::de::Deserialize for test::test_enum_serde_tag::Data>::deserialize
             at tests/test.rs:1384
  39: serde_json::de::from_trait
             at ./src/de.rs:2412
  40: serde_json::de::from_str
             at ./src/de.rs:2613
  41: test::test_enum_serde_tag
             at tests/test.rs:1391
  42: test::test_enum_serde_tag::{{closure}}
             at tests/test.rs:1375
  43: core::ops::function::FnOnce::call_once
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libcore/ops/function.rs:233
  44: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/liballoc/boxed.rs:1081
  45: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panic.rs:318
  46: std::panicking::try::do_call
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panicking.rs:348
  47: std::panicking::try
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panicking.rs:325
  48: std::panic::catch_unwind
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panic.rs:394
  49: test::run_test_in_process
             at src/libtest/lib.rs:541
  50: test::run_test::run_test_inner::{{closure}}
             at src/libtest/lib.rs:450

The traceback for serde-yaml

thread 'test_enum_serde_tag' panicked at 'OMG!! invalid type: boolean `true`, expected u8', src/error.rs:190:9
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/io/mod.rs:1537
   6: std::io::impls::<impl std::io::Write for alloc::boxed::Box<W>>::write_fmt
             at src/libstd/io/impls.rs:176
   7: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   8: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   9: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
  10: std::panicking::default_hook
             at src/libstd/panicking.rs:214
  11: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:526
  12: rust_begin_unwind
             at src/libstd/panicking.rs:437
  13: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:391
  14: <serde_yaml::error::Error as serde::de::Error>::custom
             at src/error.rs:190
  15: serde::de::Error::invalid_type
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:209
  16: serde::__private::de::content::ContentDeserializer<E>::invalid_type
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1007
  17: serde::__private::de::content::ContentDeserializer<E>::deserialize_integer
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1023
  18: <serde::__private::de::content::ContentDeserializer<E> as serde::de::Deserializer>::deserialize_u8
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1156
  19: serde::de::impls::<impl serde::de::Deserialize for u8>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/impls.rs:129
  20: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  21: <serde::de::value::MapDeserializer<I,E> as serde::de::MapAccess>::next_value_seed
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/value.rs:1181
  22: serde::de::MapAccess::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1866
  23: <&mut A as serde::de::MapAccess>::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1947
  24: <test_de::test_enum_serde_tag::_::<impl serde::de::Deserialize for test_de::test_enum_serde_tag::E>::deserialize::__Visitor as serde::de::Visitor>::visit_map
             at tests/test_de.rs:183
  25: serde::__private::de::content::visit_content_map
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1071
  26: <serde::__private::de::content::ContentDeserializer<E> as serde::de::Deserializer>::deserialize_any
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/private/de.rs:1110
  27: test_de::test_enum_serde_tag::_::<impl serde::de::Deserialize for test_de::test_enum_serde_tag::E>::deserialize
             at tests/test_de.rs:183
  28: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  29: <serde_yaml::de::SeqAccess as serde::de::SeqAccess>::next_element_seed
             at ./src/de.rs:754
  30: serde::de::SeqAccess::next_element
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1727
  31: <&mut A as serde::de::SeqAccess>::next_element
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1756
  32: <serde::de::impls::<impl serde::de::Deserialize for alloc::vec::Vec<T>>::deserialize::VecVisitor<T> as serde::de::Visitor>::visit_seq
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/impls.rs:1038
  33: serde_yaml::de::DeserializerFromEvents::visit_sequence::{{closure}}
             at ./src/de.rs:589
  34: serde_yaml::de::DeserializerFromEvents::recursion_check
             at ./src/de.rs:679
  35: serde_yaml::de::DeserializerFromEvents::visit_sequence
             at ./src/de.rs:587
  36: <&mut serde_yaml::de::DeserializerFromEvents as serde::de::Deserializer>::deserialize_seq
             at ./src/de.rs:1331
  37: serde::de::impls::<impl serde::de::Deserialize for alloc::vec::Vec<T>>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/impls.rs:1049
  38: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  39: <serde_yaml::de::MapAccess as serde::de::MapAccess>::next_value_seed
             at ./src/de.rs:818
  40: serde::de::MapAccess::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1866
  41: <&mut A as serde::de::MapAccess>::next_value
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:1947
  42: <test_de::test_enum_serde_tag::_::<impl serde::de::Deserialize for test_de::test_enum_serde_tag::Data>::deserialize::__Visitor as serde::de::Visitor>::visit_map
             at tests/test_de.rs:190
  43: serde_yaml::de::DeserializerFromEvents::visit_mapping::{{closure}}
             at ./src/de.rs:607
  44: serde_yaml::de::DeserializerFromEvents::recursion_check
             at ./src/de.rs:679
  45: serde_yaml::de::DeserializerFromEvents::visit_mapping
             at ./src/de.rs:601
  46: <&mut serde_yaml::de::DeserializerFromEvents as serde::de::Deserializer>::deserialize_struct
             at ./src/de.rs:1385
  47: <serde_yaml::de::Deserializer as serde::de::Deserializer>::deserialize_struct::{{closure}}
             at ./src/de.rs:424
  48: serde_yaml::de::Deserializer::de
             at ./src/de.rs:114
  49: <serde_yaml::de::Deserializer as serde::de::Deserializer>::deserialize_struct
             at ./src/de.rs:424
  50: test_de::test_enum_serde_tag::_::<impl serde::de::Deserialize for test_de::test_enum_serde_tag::Data>::deserialize
             at tests/test_de.rs:190
  51: <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
             at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.136/src/de/mod.rs:787
  52: serde_yaml::de::from_str_seed
             at ./src/de.rs:1496
  53: serde_yaml::de::from_str
             at ./src/de.rs:1477
  54: test_de::test_enum_serde_tag
             at tests/test_de.rs:210
  55: test_de::test_enum_serde_tag::{{closure}}
             at tests/test_de.rs:182
  56: core::ops::function::FnOnce::call_once
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libcore/ops/function.rs:233
  57: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/liballoc/boxed.rs:1081
  58: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panic.rs:318
  59: std::panicking::try::do_call
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panicking.rs:348
  60: std::panicking::try
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panicking.rs:325
  61: std::panic::catch_unwind
             at /rustc/04488afe34512aa4c33566eb16d8c912a3ae04f9/src/libstd/panic.rs:394
  62: test::run_test_in_process
             at src/libtest/lib.rs:541
  63: test::run_test::run_test_inner::{{closure}}
             at src/libtest/lib.rs:450

The lines are slightly different because I've left some comment, but you can find the places in the code by the names of the backtrace frames. From both of backtraces you can see that the last point where an error is able to get the location in the file is in deserialize_seq

31: <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
36: <&mut serde_yaml::de::DeserializerFromEvents as serde::de::Deserializer>::deserialize_seq

Further code visits internals of elements and delegates the deserialization of them to

serde::__private::de::content::ContentDeserializer

I didn't dive so deep, so I can't say what is the entity for. By this moment I have an idea to handle an error in MapAccess and SeqAccess. They know about marker which carries the location, but I don't know if it's right design solution. Any hints are appreciated.

azriel91 commented 1 year ago

heya all, I've been investigating this, and have gotten this far:

  1. For internally tagged enums, serde will use this ContentDeserializer, in particular visit_content_map.
  2. For the example in the first post, serde_yaml will use DeserializerFromEvents::deserialize_any, visit_scalar on the text value in the yaml to deserialize.
  3. Though it should eventually be a u32, serde needs to first store the entire map in memory so that it can look for the internal tag.
  4. For this to work, serde_yaml deserializes the text as a str in visit_untagged_scalar, and serde stores it as a Content::Str in TaggedContentVisitor::visit_map -- the v there is Content::Str("text").
  5. Note that the libyaml::Mark containing the correct position information of text is actually available in DeserializerFromEvents::deserialize_any:1201 (as seen in step 2).
  6. However, since serde needs to save all values and later check for the internal tag, we have lost that position information when we store just the "successfully" parsed str.
  7. Later on, when serde figures out that value2 should be a u32, it eventually calls ContentDeserializer::deserialize_integer, which fails because content is a Content::Str (as seen in step 4).
  8. Then there is no way for serde_yaml to correctly augment the error with the position, and DeserializerFromEvents::deserialize_seq only has the Mark (i.e. position) of the data that surrounds the internally tagged enum.

I suspect fixing this may need some changes in serde to store the position of each value in TaggedContentVisitor::visit_map's vec, and include it in Error::custom ?

Though that looks like a breaking change, perhaps there's another way.


Investigation notes: