aaubry / YamlDotNet

YamlDotNet is a .NET library for YAML
MIT License

Serialization/Deserialization vs strings with funny characters #846

Closed: mishun closed this issue 1 year ago

mishun commented 1 year ago

Hello again!

Unlike #845, the following works with JsonCompatible() but fails without it:

using System.Diagnostics;
using YamlDotNet.Serialization;

public class Program
{
    public static void Main(string[] argv)
    {
        var ser = new SerializerBuilder().Build();
        var des = new DeserializerBuilder().Build();

        var src = "~";
        var yaml = ser.Serialize(src);
        var dst = des.Deserialize<string>(yaml);

        Debug.Assert(src == dst);
    }
}

It also fails for "\r" and "\t" strings.

Tested with:

<PackageReference Include="YamlDotNet" Version="13.3.1" />

EdwardCooke commented 1 year ago

A tilde is a representation of null in YAML. It’s very possible that it will come back as an empty string or as a string containing an empty line.

The other two you mentioned are whitespace, which will also be treated as nothing, so the result would be an empty string.

That being said, I haven’t actually run it on a computer yet, so I’m not 100% certain. What are you seeing, and what are you expecting?

mishun commented 1 year ago

Yes, des.Deserialize<string>("~") == null.

Currently:

var ser = new SerializerBuilder().Build();
Debug.Assert(ser.Serialize("~") == "~\n");
Debug.Assert(ser.Serialize("\t") == "\t\n");
Debug.Assert(ser.Serialize("\r") == ">2\n\n");

I'd expect something more like:

Debug.Assert(ser.Serialize("~") == "\"\\x7E\"\n");
Debug.Assert(ser.Serialize("\t") == "\"\\t\"\n");
Debug.Assert(ser.Serialize("\r") == "\"\\r\"\n");

perhaps? Or have the tilde string encoded as just a tilde, and the null string encoded with an explicit tag?

EdwardCooke commented 1 year ago

When you instantiate the SerializerBuilder, call WithQuotingNecessaryStrings().

mishun commented 1 year ago

Thank you! That's much better:

var ser = new SerializerBuilder().WithQuotingNecessaryStrings().Build();
Console.WriteLine(ser.Serialize("\x0d"));

-->

"\r"

Unfortunately:

var ser = new SerializerBuilder().WithQuotingNecessaryStrings().Build();
Console.WriteLine(ser.Serialize("\x0d\x61"));

-->

>2-

  a

(0x61 is the character code for 'a'.)

If you're wondering where I get these annoying examples, I'm using a QuickCheck variant for .NET.

EdwardCooke commented 1 year ago

Now that my laptop is up and running again, I got to look at this. I was able to narrow it down to differences in line endings. Windows uses CRLF (0D 0A) and Linux uses LF (0A). Using a lone CR (0D) is also valid, but things will get weird, as you're seeing.

With the correct line endings, things work as expected: the string deserializes to the correct value.
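
As a small illustration of the byte values involved, the three line-break conventions mentioned above can be printed as hex. This is only a sketch that uses the .NET base class library, not YamlDotNet, and the labels are chosen here for illustration:

```csharp
using System;
using System.Text;

// Print the UTF-8 byte values of each line-break convention.
// The labels ("CRLF", "LF", "CR") are illustrative names only.
foreach (var (name, text) in new[] { ("CRLF", "\r\n"), ("LF", "\n"), ("CR", "\r") })
{
    // BitConverter.ToString renders bytes as uppercase hex separated by hyphens.
    var hex = BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    Console.WriteLine($"{name}: {hex}");
}
// Prints:
// CRLF: 0D-0A
// LF: 0A
// CR: 0D
```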

End

The reason we output \r when that's the only character is the underlying emitter. When the scalar style is quoted (which WithQuotingNecessaryStrings() sets for a string containing only line breaks, whitespace, and other special characters), the emitter replaces the special characters with escape codes. You can see where it does the escaping, and which characters are escaped, here: https://github.com/aaubry/YamlDotNet/blob/1a73db760dc440569cfbd86d1d7e11d59cfcdb8a/YamlDotNet/Core/Emitter.cs#L1109

Since the second test, with the letter a in it, doesn't need to be quoted, it can be output with an empty line at the beginning, as you saw.

To make double quoting the default, so that your use case always passes, you can call .WithDefaultScalarStyle(YamlDotNet.Core.ScalarStyle.DoubleQuoted) on the SerializerBuilder instead of the WithQuotingNecessaryStrings() I suggested earlier. However, everything will then default to double quotes, and entries with newlines will become difficult to read.

If you only want to apply this to a specific property/field on an object, you can use the YamlMember attribute on that property/field and set its ScalarStyle to DoubleQuoted.

Here's the code I used to validate that this will work with 0A and 0D line endings.

using System;
using YamlDotNet.Serialization;

var str = new[] { "\x0a", "\x0a\x61", "\x0d", "\x0d\x61" };

Console.WriteLine("============================");
Console.WriteLine("Testing direct string");

foreach (var s in str)
{
    Test(s);
}

Console.WriteLine("============================");
Console.WriteLine("Testing class yamlmember");

foreach (var s in str)
{
    Test1(s);
}

void Test(string value)
{
    var serializer = new SerializerBuilder().WithDefaultScalarStyle(YamlDotNet.Core.ScalarStyle.DoubleQuoted).Build();
    var deserializer = new DeserializerBuilder().Build();
    var serialized = serializer.Serialize(value);
    var deserialized = deserializer.Deserialize<string>(serialized);
    Console.WriteLine("------");
    Console.WriteLine("Testing:");
    Console.Write(value);
    Console.WriteLine("---Serialized:");
    Console.WriteLine(serialized);
    Console.WriteLine("Deserialized:");
    Console.Write(deserialized);
    Console.WriteLine("---Matches:");
    Console.WriteLine(deserialized == value);
}

void Test1(string value)
{
    var tc = new TestClass { X = value };
    var serializer = new SerializerBuilder().Build();
    var deserializer = new DeserializerBuilder().Build();
    var serialized = serializer.Serialize(tc);
    var deserialized = deserializer.Deserialize<TestClass>(serialized);
    Console.WriteLine("------");
    Console.WriteLine("Testing:");
    Console.Write(value);
    Console.WriteLine("---Serialized:");
    Console.WriteLine(serialized);
    Console.WriteLine("Deserialized:");
    // Print the round-tripped string, not the TestClass type name.
    Console.Write(deserialized.X);
    Console.WriteLine("---Matches:");
    Console.WriteLine(deserialized.X == value);
}

class TestClass
{
    [YamlMember(ScalarStyle = YamlDotNet.Core.ScalarStyle.DoubleQuoted)]
    public string X { get; set; } = string.Empty;
}

EdwardCooke commented 1 year ago

Did that answer your question?

mishun commented 1 year ago

Sorry, got distracted. Indeed, it seems to work with DoubleQuoted, thank you!

EdwardCooke commented 1 year ago

Fantastic. I’m going to close this issue, then.