Closed gregsdennis closed 7 months ago
I bet it’s how carriage returns/line feeds are normalized that is causing the problem. Because they get normalized, the tab character gets counted as white space and is subsequently dropped. Unfortunately I don’t have time at this moment to debug and get a fix in; the parser is pretty complicated. I’ll hunt down where it’s at and post a link though.
Basically it checks for any of those characters and counts it as a new line.
I see you actually have this test:
[Theory]
[InlineData("|\n b-carriage-return\r lll", "b-carriage-return\nlll")]
public void NewLinesAreParsedAccordingToTheSpecification(string yaml, string expected)
{
    AssertSequenceOfEventsFrom(Yaml.ParserForText(yaml),
        StreamStart,
        DocumentStart(Implicit),
        LiteralScalar(expected),
        DocumentEnd(Implicit),
        StreamEnd);
}
(other test cases removed)
It seems that maybe this is intended behavior? I'll check the spec.
I got this string from another spec that I'm trying to implement. It's possible they're just using bad YAML.
@gregsdennis linked to https://yaml.org/spec/1.2.2/#54-line-break-characters in json-e/json-e#476. That section is about parsing, or perhaps more accurately tokenizing. It says that 0d0a (CR LF), 0a (LF), and 0d (CR) should each be normalized to a single newline format when seen in the input, even within a scalar such as a multiline string value.
However, that section doesn't address parsing escapes in a string value (I'm sure that's covered elsewhere). More to the point, it doesn't describe anything that happens after tokenization is complete. So if, by whatever means, a YAML input parses to a string containing the ASCII characters CR, LF, or a consecutive CR and LF, this section does not apply at all to the handling of that string value, because parsing is already complete.
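To make the normalization step concrete, here is a minimal sketch of what that section asks a tokenizer to do with raw input. `LineBreaks.Normalize` is a hypothetical helper written for illustration; it is not YamlDotNet's implementation, and it ignores everything else a real scanner does (indentation, folding, escapes).

```csharp
using System;

public static class LineBreaks
{
    // Hypothetical helper (an assumption, not YamlDotNet code):
    // maps CR LF, lone CR, and lone LF in raw input to a single LF.
    public static string Normalize(string input)
    {
        // Replace CR LF pairs first so they do not turn into two line feeds.
        return input.Replace("\r\n", "\n").Replace('\r', '\n');
    }

    public static void Main()
    {
        string raw = "b-carriage-return\rlll";
        // The lone CR becomes LF, matching the expected value in the test above.
        Console.WriteLine(Normalize(raw) == "b-carriage-return\nlll");
    }
}
```

This only applies to line breaks seen in the input text. A CR or LF that ends up in a value because an escape sequence was decoded is a separate matter.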
I suspect that the error in this bug report is in the input:
var yaml = Parse("\" \f\n\r\t\vabc \f\n\r\t\v\""); // problem is here
The C# compiler interprets those escapes, so the YAML parser receives actual FF, LF, CR, TAB, etc. characters rather than backslash sequences. I suspect that should be
var yaml = Parse("\" \\f\\n\\r\\t\\vabc \\f\\n\\r\\t\\v\""); // problem is here
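The difference between the two literals can be checked without involving YAML at all. The sketch below compares a regular C# string literal, one with doubled backslashes, and a verbatim literal; the class and field names are made up for illustration.

```csharp
using System;

public static class EscapeDemo
{
    // Regular literal: the C# compiler decodes each escape into one control character.
    public const string Interpreted = "\f\n\r\t\v";

    // Doubled backslashes: each pair becomes one literal backslash, so the
    // backslash sequences themselves survive and reach whatever parses the string.
    public const string Escaped = "\\f\\n\\r\\t\\v";

    // Verbatim literal: backslashes are not escape characters here.
    public const string Verbatim = @"\f\n\r\t\v";

    public static void Main()
    {
        Console.WriteLine(Interpreted.Length);  // five control characters
        Console.WriteLine(Escaped.Length);      // five backslash + letter pairs
        Console.WriteLine(Escaped == Verbatim); // same characters either way
    }
}
```

With the first form, a downstream parser never sees a backslash; with the other two, it sees the escape sequences and decides for itself how to decode them.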
The actual text in question is found in a YAML file and is:
template: {$eval: "rstrip(' \f\n\r\t\vabc \f\n\r\t\v')"}
To my understanding, this is decoded as the actual whitespace chars.
https://yaml-online-parser.appspot.com/ converts this to JSON as:
{
  "template": {
    "$eval": "rstrip(' \f\n\r\t\u000babc \f\n\r\t\u000b')"
  }
}
which clearly preserves the characters.
Updating the test code above with (note the verbatim string @"" so escapes are right):
var json = JsonNode.Parse(@"{
""template"": {
""$eval"": ""rstrip(' \f\n\r\t\u000babc \f\n\r\t\u000b')""
}
}");
Console.WriteLine(JsonSerializer.Serialize(json, new JsonSerializerOptions{Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping}));
the output for this is:
{"template":{"$eval":"rstrip(' \f\n\r\t\u000Babc \f\n\r\t\u000B')"}}
I sorted this out by just not using the parser for reading these strings. Might still be an issue, but I'll close for now. If it comes up again, surely someone will report it again.
Describe the bug
When parsing a string value that contains these escapes, the parser simply omits them.
To Reproduce
Here's an NUnit test.
Output is:
You can even see in the debugger that the parsed value isn't right: ![image](https://github.com/aaubry/YamlDotNet/assets/2676804/97be7a7a-88a7-4dcd-b111-e79f0f8c3a37)