aaubry / YamlDotNet

YamlDotNet is a .NET library for YAML
MIT License

Incorrect UTF-32 character JSON serialization #997

Open nahk-ivanov opened 4 hours ago

nahk-ivanov commented 4 hours ago

Describe the bug

When using the JSON-compatible serializer (new SerializerBuilder().JsonCompatible()), it produces \Uxxxxxxxx for characters outside the Basic Multilingual Plane, i.e. code points above U+FFFF (https://github.com/aaubry/YamlDotNet/blob/master/YamlDotNet/Core/Emitter.cs#L1192-L1193). This seems to be against the JSON spec (RFC 8259), which expects such characters to be escaped as a UTF-16 surrogate pair instead: \uxxxx\uxxxx (also note the lower-case u).

This prevents the output from being parsed as JSON by the Newtonsoft.Json library or any other JSON parser.
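
A minimal sketch of how the parsing failure could be checked. It uses System.Text.Json rather than Newtonsoft.Json, purely because it ships with .NET; the class and method names (JsonEscapeCheck, IsValidJson) are hypothetical and not part of YamlDotNet.

```csharp
using System;
using System.Text.Json;

public static class JsonEscapeCheck
{
    // Returns true if the text parses as JSON with System.Text.Json,
    // false if the parser rejects it.
    public static bool IsValidJson(string text)
    {
        try
        {
            using var doc = JsonDocument.Parse(text);
            return true;
        }
        catch (JsonException)
        {
            // Unsupported escapes such as \U end up here.
            return false;
        }
    }
}
```

Calling `JsonEscapeCheck.IsValidJson(@"{""test"": [""sea life \U0001F99E""]}")` should return false, because \U is not a JSON escape, while the same document written with `\uD83E\uDD9E` should return true.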

To Reproduce

using YamlDotNet.Serialization;

var yamlObject =
    new Deserializer().Deserialize("test:\n  - sea life \U0001F99E");

var serializer = new SerializerBuilder().JsonCompatible().Build();

Console.WriteLine(serializer.Serialize(yamlObject));

Actual:

{"test": ["sea life \U0001F99E"]}

Expected:

{"test": ["sea life \uD83E\uDD9E"]}
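
For reference, a rough sketch of the escaping JSON expects for code points above U+FFFF. This is not the YamlDotNet emitter code; SurrogateEscapeSketch and EscapeNonBmp are hypothetical names, and the sketch deliberately ignores quotes, backslashes, and control characters, which a real emitter must also escape.

```csharp
using System;
using System.Text;

public static class SurrogateEscapeSketch
{
    // Escapes every character outside the Basic Multilingual Plane as a
    // UTF-16 surrogate pair (\uXXXX\uXXXX); all other characters pass through.
    //
    // Worked example for U+1F99E:
    //   0x1F99E - 0x10000          = 0xF99E
    //   0xD800 + (0xF99E >> 10)    = 0xD83E  (high surrogate)
    //   0xDC00 + (0xF99E & 0x3FF)  = 0xDD9E  (low surrogate)
    public static string EscapeNonBmp(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (var i = 0; i < value.Length; i++)
        {
            if (char.IsSurrogatePair(value, i))
            {
                // .NET strings are UTF-16, so the pair is already stored
                // as two chars; write each one as a \uXXXX escape.
                sb.Append("\\u").Append(((int)value[i]).ToString("X4"));
                sb.Append("\\u").Append(((int)value[i + 1]).ToString("X4"));
                i++; // skip the low surrogate
            }
            else
            {
                sb.Append(value[i]);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, `SurrogateEscapeSketch.EscapeNonBmp("sea life \U0001F99E")` yields `sea life \uD83E\uDD9E`.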

EdwardCooke commented 3 hours ago

I’d have to check the YAML spec for UTF-32, if there even is a spec for UTF-32, but that would probably be a simple change for serializing. Not sure about deserializing, though. We do accept PRs, so if you want to submit one with applicable tests, I would be happy to merge it in. I was hoping to get a new version out last week but didn’t, so I’m hoping for this weekend.

nahk-ivanov commented 3 hours ago

I think deserialization was already fixed here: https://github.com/aaubry/YamlDotNet/issues/838