dtolnay / serde-yaml

Strongly typed YAML library for Rust
Apache License 2.0
964 stars 164 forks source link

TaggedValue may serialize to map, eats up to two !! while roundtripping #364

Closed jcaesar closed 1 year ago

jcaesar commented 1 year ago

The following

Value::Tagged(Box::new(TaggedValue { Tag::new(""), value: "bar".into() }))

serializes to

'!': bar

which I found most surprising.

Tags that start with at least two ! correctly create a tagged value when serialized, but round-tripping them through to_stringfrom_str loses two !.

Quick reproducer ```rust fn main() { use serde_yaml::{value::*, *}; for pfx in 0..10 { let tag = std::iter::repeat('!').take(pfx).collect::(); let orig = Value::Tagged(Box::new(TaggedValue { tag: Tag::new(&tag), value: "bar".into(), })); let s = &to_string(&orig).unwrap(); let trip = from_str::(s).unwrap(); dbg!((tag, orig, s, trip)); } // These are somehow deserialized as expected...? dbg!(from_str::("! 42")).ok(); dbg!(from_str::("!%21 42")).ok(); } ```

Strangely, this is easy to fix

diff --git a/src/value/de.rs b/src/value/de.rs
index af660ee..addb95c 100644
--- a/src/value/de.rs
+++ b/src/value/de.rs
@@ -109,7 +109,10 @@ impl<'de> Deserialize<'de> for Value {
             where
                 A: EnumAccess<'de>,
             {
-                let (tag, contents) = data.variant::<String>()?;
+                let (mut tag, contents) = data.variant::<String>()?;
+                if tag.starts_with('!') {
+                    tag.insert(0, '!');
+                }
                 let tag = Tag::new(tag);
                 let value = contents.newtype_variant()?;
                 Ok(Value::Tagged(Box::new(TaggedValue { tag, value })))
diff --git a/src/value/tagged.rs b/src/value/tagged.rs
index 7c2f88a..b3e63f1 100644
--- a/src/value/tagged.rs
+++ b/src/value/tagged.rs
@@ -422,7 +422,7 @@ where
     fmt::write(&mut check_for_tag, format_args!("{}", value)).unwrap();
     match check_for_tag {
         CheckForTag::Empty => MaybeTag::NotTag(String::new()),
-        CheckForTag::Bang => MaybeTag::NotTag("!".to_owned()),
+        CheckForTag::Bang => MaybeTag::Tag("!".to_owned()),
         CheckForTag::Tag(string) => MaybeTag::Tag(string),
         CheckForTag::NotTag(string) => MaybeTag::NotTag(string),
     }

but the code makes me think that this behavior may be intentional?

Side note: I found this issue while trying to fuzz my own YAML serializer. But ran into a few other blocks, e.g.

{ { null: true, {}: false }: 16541390480555511807, "": { "": { { "": null, null: { null: null, { {}: "\u0000\u05ce0\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006\u0006yR\u0000\u05ce", { { null: { "\u0000ӎ厎\u05ce": { null: null, true: null, "\u05ce": { { {}: null }: true, true: "" } }, "": { { { { { { { { { { { { { { { { { { { { { null: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null }: null } } }: null }: null }: null } }: null } }, null: null }

failing to parse despite being seemingly valid. Thus, I've kind of given up on this approach, and I'm not really invested in either of these problems getting fixed. Still, I figured I may as well report them.

dtolnay commented 1 year ago

I think this is a consequence of libyaml's tag implementation. Libyaml parses !aaa 42 and !<!aaa> 42 identically as both having a tag value of "!aaa", even though these mean different things. The first one would typically need to be processed as type "aaa" while the second one explicitly means "!aaa".

I don't plan to look into this further since round-tripping arbitrary data through tags is not an intended use of YAML tags.

For the side note: that is not a valid YAML 1.1 document because it contains a flow map implicit key longer than 1024 characters. The 1024 character restriction is lifted in YAML 1.2 but that change is not implemented yet by libyaml. They say it's hard to implement.