Closed geert56 closed 1 year ago
Hm, looks like a bug in this demo application where indents are checked. The Yaml renderer is fine, as you can verify with this example:
items:
  - { part_no: A4786,
      descrip: Water Bucket (Filled),
      price: 1.47,
      quantity: 4 }
Also this works:
items:
  -
    part_no: A4786
    descrip: Water Bucket (Filled)
    price: 1.47
    quantity: 4
Correct, some variations definitely work. Overall, the yaml.l lexer/parser is rather shaky compared to js-yaml (node.js) or yamllint. As I already mentioned in another issue, YAML is tough. I am happy to discuss details by email. Thanks for looking into this; I'll close the issue.
Closed.
Hmmm, it's not the lexer, but the logic behind the yaml rules that is the problem. These rules are complex, as you've also said. The problem can be fixed by recognizing the indent position after the - and before the map key, so that subsequent map keys are grouped together. I thought it would be a nice demo of reflex being capable of handling indentation. Getting the yaml rules implemented is a bit of a pain and results in convoluted code.
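To make the idea concrete, here is a minimal sketch of finding the column where a map key starts after a sequence dash. It is not the code in yaml.l; the helper name content_column is hypothetical, and the two-space indentation in the usage note is an assumption about the example's layout.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of yaml.l): returns the column at which the
// content of a line starts. For a line that opens with a sequence dash
// ("- key: value"), this is the column of the first character after the dash,
// so later keys at that same column can be grouped into the same map.
inline std::size_t content_column(const std::string& line) {
  std::size_t col = line.find_first_not_of(' ');
  if (col == std::string::npos)
    return line.size();  // blank line: no content
  bool dash = line[col] == '-' &&
              (col + 1 == line.size() || line[col + 1] == ' ');
  if (!dash)
    return col;  // ordinary line: content starts at the first non-blank
  std::size_t after = line.find_first_not_of(' ', col + 1);
  return after == std::string::npos ? line.size() : after;
}
```

With two-space indentation, both "  - part_no: A4786" and "    descrip: Water Bucket (Filled)" yield column 4, which is why the second key belongs to the map started on the dash line.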
Might I suggest to shy away from full YAML and maybe use StrictYAML as the example?
A great implementation of YAML is libfyaml. It passes the whole testsuite and comes with some interesting tools.
Thanks for the suggestions. I had in fact tested yaml.l with a number of realistic yaml examples, but perhaps the examples I had used were all strict(er) yaml or were sanitized to pass most yaml parsers. I don't recall where I found these. I do recall spending way more time on this example than I had anticipated (a few days, instead of a few hours at most that I usually need to get the job done + testing). I worked extensively with XML and JSON as well as CORBA and other (more ancient) exchange formats. Compared to those, yaml is terrible IMO. Sure, yaml is "human readable". I get that, but otherwise what's the point of it? And why make the syntax so lenient? Stricter rules help, not hamper.
FYI. Here is a set of yaml tests that I ran as unit tests and to test the features when implementing yaml.l. There are tests for all (or almost all) of the various syntactic structures, except the case you've reported here. Back in 2020 I didn't find a suitable set of yaml test cases online to test against.
Fixed the problem. This fix passes all YAML Wikipedia examples too.
Patch yaml.l:618 to insert:
size_t level = 0;
yaml.l:640 insert:
if (token == ';' || token == '=')
  next();
if (token == '>')
{
  next();
  ++level;
}
yaml.l:671 insert:
if (token == '&' || token == '*')
{
  data.ref = string;
  next();
}
yaml.l:688 insert:
if (token == '<')
{
  next();
  while (token == '<' && level)
  {
    next();
    --level;
  }
}
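The level counter in the patch can be tried in isolation with a small stand-alone sketch. This is an illustration under assumptions, not the yaml.l parser: TokenCursor, enter_item and leave_item are hypothetical names, tokens are modeled as single characters in a string, and reading '>' as an indent token and '<' as a dedent token is inferred from the patch rather than stated in this thread.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the parser's token stream: one char per token.
struct TokenCursor {
  std::string tokens;
  std::size_t pos;
  explicit TokenCursor(std::string t) : tokens(std::move(t)), pos(0) {}
  char token() const { return pos < tokens.size() ? tokens[pos] : '\0'; }
  void next() { if (pos < tokens.size()) ++pos; }
};

// Same shape as the insert at yaml.l:640: skip a ';' or '=' token, then
// record an extra '>' (assumed to be an indent) in level.
inline void enter_item(TokenCursor& c, std::size_t& level) {
  if (c.token() == ';' || c.token() == '=')
    c.next();
  if (c.token() == '>') {
    c.next();
    ++level;
  }
}

// Same shape as the insert at yaml.l:688: consume one '<' (assumed to be a
// dedent), plus one more for every extra indent recorded in level.
inline void leave_item(TokenCursor& c, std::size_t& level) {
  if (c.token() == '<') {
    c.next();
    while (c.token() == '<' && level) {
      c.next();
      --level;
    }
  }
}
```

For the token string "=>k<<x", enter_item skips '=' and '>' and sets level to 1; after stepping over 'k', leave_item consumes both '<' tokens and returns level to 0, leaving the cursor on 'x'.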
@geert56 With the option SHOW_TOKENS set, I clearly see erroneous indentation tokens and hence the final echoed yaml is incorrect.
FYI. The indentation tokens are not erroneous! There are additional indentation positions inserted with matcher().insert_stop(matcher().columno()) by the parser. These indentation stops indicate the start of a value at which subsequent yaml data on lines below may align, so these will not produce new indents but rather align as expected.
It is all pretty clear in the yaml.l parser logic.
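The effect of an inserted stop can be modeled with a small stand-alone sketch. This is not RE/flex's implementation: the IndentStops class and its classify method are hypothetical, and only the name insert_stop is borrowed from the call quoted above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of a list of indentation stops (not the RE/flex matcher).
class IndentStops {
 public:
  // Register an extra stop, keeping the list sorted and free of duplicates.
  void insert_stop(std::size_t col) {
    auto it = std::lower_bound(stops_.begin(), stops_.end(), col);
    if (it == stops_.end() || *it != col)
      stops_.insert(it, col);
  }

  // Classify the start column of a new line against the current stops.
  // Returns +1 for an indent, 0 for alignment with the innermost stop,
  // and -n when n stops are closed (n dedents).
  int classify(std::size_t col) {
    std::size_t current = stops_.empty() ? 0 : stops_.back();
    if (col > current) {
      stops_.push_back(col);
      return 1;
    }
    int closed = 0;
    while (!stops_.empty() && stops_.back() > col) {
      stops_.pop_back();
      ++closed;
    }
    return -closed;
  }

 private:
  std::vector<std::size_t> stops_;
};
```

Assuming two-space indentation, a line "  - part_no: A4786" starts at column 2 and is classified as an indent. If the parser then inserts a stop at column 4, where the key begins, the next line "    descrip: ..." at column 4 is classified as aligned rather than as a new indent. Without the inserted stop, the same line would be reported as a second indent.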
I ran the yaml lexer/parser produced by make yaml on some example files derived from the YAML Wikipedia page (https://en.wikipedia.org/wiki/YAML) and noticed some strange behavior. With the option SHOW_TOKENS set, I clearly see erroneous indentation tokens and hence the final echoed yaml is incorrect. Here is a very small example:
which produces: