fjoppe / Legivel

F# Yaml 1.2 parser
https://fjoppe.github.io/Legivel
The Unlicense
60 stars 5 forks source link

Unable to deserialize sequence with compact indentation #23

Closed queil closed 4 years ago

queil commented 4 years ago

Description

If sequence elements are not indented. I.e. the hyphens have the same indentation as the parent element the following error occurs:

Incorrect Syntax, this content cannot be related to previous document structure.

Repro steps

Trying to deserialize:

- behavior: merge
  envs:
  - base.env
  name: base-name

into:

type ConfigMapGenerator = {
    behavior: string
    envs: string list
    name: string
  }
Deserialize<ConfigMapGenerator list>

Expected behavior

It should correctly deserialize the yaml as ConfigMapGenerator.

Actual behavior

Getting the following deserialization result:

{ Warn = []
  Error =
         [{ Location = (l4, c1)
            Message =
                     "Incorrect Syntax, this content cannot be related to previous document structure." }]
  StopLocation = (l4, c1)}

Known workarounds

Looks like increasing the indent on the envs elements helps.

- behavior: merge
  envs:
    - base.env
  name: base-name

Related information

queil commented 4 years ago

Spec: https://yaml.org/spec/1.2/spec.html#id2799784

fjoppe commented 4 years ago

Thanks, I will look into this.

fjoppe commented 4 years ago

The Yaml spec you refer to says:

Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).

- behavior: merge     # 1
  envs:               # 2
  - base.env          # 3
  name: base-name     # 4

The yaml in line 3 is nested under line 1, so it requires the extra space.

I'm not precisely sure why this design choice was made. Indentation is the standard in yaml, so it is actually a special case that you don't require the extra space at the top-level.

If I look at example 8.22, it does help in aesthetics I think.

Technically the parser needs to detect at line 2 whether the value of the initiated mapping is either empty, followed by the next mapping node, or something else. But the parser has this task also at the top-level. Ie. I don't see the argument that works on both sides, so I don't know what the desinger's considerations have been for this choice. I think aesthetics as yaml is about readibility.

I also noted that the reference parser does allow this input. However these tools often do not implement the spec correctly.

queil commented 4 years ago

@fjoppe The spec is ambiguous in the way that it does not define what nested actually means. Is a nested element a child or a descendant? If it means descendant then indeed such a syntax would not be valid. However, meaning a child it would allow my example while disallowing (rightfully) reduced indentation in constructs like the below:

-
  - A
  - B
-
  - C
  - D
fjoppe commented 4 years ago

I will probably reconsider my position.

After some testing, I discovered that it is not line 3 giving the parse error, it is line 4:

- behavior: merge     # 1
  envs:               # 2
  - base.env          # 3
  name: base-name     # 4

At some point on line 4, the parser expects the next entry in the sequence (from line 3), which fails, but then the parser does not return properly in expecting a mapping.

Where your input does not parse:

- behavior: merge
  envs:
  - base.env
  name: base-name

The following input does parse:

- behavior: merge
  envs:
  - base.env
  - name: base-name

Note the last line.

So for the below, when the parser tries to return to mapping mode on line 4, the stream position is at "n" from "name" and then expects two spaces. This smells like a bug.

- behavior: merge     # 1
  envs:               # 2
  - base.env          # 3
  name: base-name     # 4
fjoppe commented 4 years ago

Fixed in this version: https://www.nuget.org/packages/Legivel/0.4.2