biojppm / rapidyaml

Rapid YAML - a library to parse and emit YAML, and do it fast.
MIT License
576 stars 99 forks source link

multiline assignment will make the line number is not correctly #260

Closed CairoLee closed 2 years ago

CairoLee commented 2 years ago

Description

When yaml include | mark for multiline assignment, the line number is not correctly

Example Code

ryml::Parser parser;

ryml::Tree tree = parser.parse_in_arena("source.yml", R"(Body:
  - Id: 1
    Name: Apple
    Script: |
      Line One
      Line Two
  - Id: 2
    Name: Cat
    Script: |
      Line One
      Line Two
  - Id: 3
    Name: Dog
    Script: |
      Line One
      Line Two)");

ryml::Location loc = parser.location(tree.rootref());
loc = parser.location(tree["Body"][2]["Name"]);

sprintf("loc.line = %d\n", loc.line + 1);

Expected results

loc.line = 13

Current results

loc.line = 15

CairoLee commented 2 years ago

Bump~ the project rathena already using rapidyaml to parse the yaml file. in the project, a lot of data using | to write the item script, incorrectly line numbers can make debugging data errors very difficult

biojppm commented 2 years ago

@CairoLee I finally had some time to look at the problem. I'm sorry to say the situation is not good.

This is happening because all multiline scalars (ie all of plain, single-quoted, double-quoted, block-literal or block-folded) have their newline characters filtered during the parsing phase. As currently written, the location data structure is only filled after parsing, which means that by the time this happens, the original newline characters are no longer there, so those lines fail to count for the location. So the reported line is shifted down in some cases and up in other cases, on every multiline scalar. Which means that the shift will compound as the file progresses.

Fixing this will be hard. It will consist of ensuring that after being filtered, each scalar should have the same number of newlines in its original source region. So if newlines were removed, they must be added after the end of the filtered scalar, but before the next YAML token. For all the multiline scalar types.

Expect more time before this gets fixed.

biojppm commented 2 years ago

@CairoLee This is now fixed in master. Beware that the fix changes are breaking; you will need to change your code to instantiate the parser with options setting it to track locations. For an example, see the notes added to the changelog or the quickstart file in #307

CairoLee commented 2 years ago

@biojppm Thank you for following up on this issue, I noticed what needs to be adjusted.

I will use the latest code and test it later and give you feedback, thanks again.