eemeli / yaml

YAML parser and stringifier for JavaScript
https://eemeli.org/yaml
ISC License
1.32k stars 116 forks source link

Block Scalar indentation shenanigans #553

Open RedCMD opened 5 months ago

RedCMD commented 5 months ago

Describe the bug I would think this is quite some oversight in the YAML spec what are your thoughts on it?

To Reproduce

abc:
| #this should be an error
 block scalar header requires +1 indentation
---
def: |
this should not be an error
block scalar content does not require +1 indentation
---
ghi: |
    the first trailing comment is not allowed to use tabs as indentation
    #this should be an error

Expected behaviour

Versions:

Additional context image

ingydotnet commented 5 months ago

Your example is not at the root level. You've already started a mapping which puts the level from -1 to 0. Here's an example that is valid (and I think problematic and hopefully addressed in a future version):

--- |
foo
bar

That scalar is present when there is no indentation yet, because no mapping or sequence is yet started.

I think it was a mistake to have defined no indentation as indentation of -1.

imho, non-key, scalar content should not be allowed in column 1 as lines like --- and ... become ambiguous.

perlpunk commented 5 months ago

regarding

---
def: |
this should not be an error
block scalar content does not require +1 indentation

This looks like an oversight in the 1.2.1 -> 1.2.2 spec. Compare https://yaml.org/spec/1.2.1/#c-l+literal(n) https://yaml.org/spec/1.2.1/#l+block-sequence(n)

[170]   c-l+literal(n)  ::= “|” c-b-block-header(m,t)
l-literal-content(n+m,t)

[183]   l+block-sequence(n) ::= ( s-indent(n+m) c-l-block-seq-entry(n+m) )+
/* For some fixed auto-detected m > 0 */

and https://yaml.org/spec/1.2.2/#rule-c-l+literal https://yaml.org/spec/1.2.2/#rule-l+block-sequence

[170] c-l+literal(n) ::=
  c-literal                # '|'
  c-b-block-header(t)
  l-literal-content(n+m,t)

[183] l+block-sequence(n) ::=
  (
    s-indent(n+1+m)
    c-l-block-seq-entry(n+1+m)
  )+

As you can see, for rule 183 the indentation parameter was changed from n+m to n+1+m, while for rule 170 it stayed the same. Probably because of the missing /* For some fixed auto-detected m > 0 */. The definition of m is pretty unclear in several versions of the spec.

RedCMD commented 5 months ago

This looks like an oversight in the 1.2.1 -> 1.2.2 spec.

that confirms my suspicion

def: |
making this
invalid
---
|
and this is 0 indent
instead of -1
eemeli commented 5 months ago

I've added a fix making the first variant properly an error, i.e.

abc:
| #this should be an error
 block scalar header requires +1 indentation

The second variant I'm going to consider a spec bug, and not fix the current implementation. I think the current implementation follows the intent of the spec, even if the specific rule (probably s-l+block-scalar?) doesn't add the +1 that it should.

The third case I'm pretty sure is valid as is, as leading tabs are fine in l-comment. Edit: Actually, not sure at all about the third one.