Ygg01 / Linguini

C# Port of fluent.rs zero-copy parser
Apache License 2.0

Attributes are not parsed if they begin with a tab character #53

Closed: eXplowar closed this issue 10 months ago

eXplowar commented 10 months ago

FTL file with attributes indented by tabs:

[Screenshot: 2024-01-19_224956]

Result of parsing (Linguini incorrectly identifies this as Junk):

[Screenshot: 2024-01-19_224910]

FTL file without tabs (no errors, attributes are parsed correctly):

[Screenshot: 2024-01-19_225124]

Result of parsing:

[Screenshot: 2024-01-19_225346]

I'm not sure this is correct, but my suggestion is to adjust the condition on line 400 to: if (_reader.TryPeekChar(out var c) && c.IsAsciiAlphabetic() || c == '\u0009')

[Screenshot: 2024-01-19_232659]

Ygg01 commented 10 months ago

> Result of parsing (Linguini incorrectly identifies this as Junk):

Small but important distinction. Linguini correctly identifies this as junk.

Look at https://github.com/projectfluent/fluent/issues/165

Tabs were removed as valid indentation for attributes.

Further proof is the Fluent spec: https://github.com/projectfluent/fluent/blob/feffc720fa5d5621a703f42731d709e216fc6730/spec/fluent.ebnf#L32

Attribute           ::= line_end blank? "." Identifier blank_inline? 

/* Whitespace */
blank_inline        ::= "\u0020"+
line_end            ::= "\u000D\u000A"
                      | "\u000A"
                      | EOF
blank               ::= (blank_inline | line_end)+

So an attribute is a rule that says: a line end, then optionally one or more of (' ' or a line end), then . then an identifier.

For a tab to be accepted, U+0009 (tab) would have to appear somewhere in the definition of blank. It doesn't.

Complying with this request would make us diverge from the Fluent spec too much, for little gain.
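To make the grammar argument concrete, here is a minimal C# sketch of the indentation check implied by the rules quoted above. It is not Linguini's code: `StartsAttribute` and `IsAsciiLetter` are hypothetical helpers, and the check is simplified to a single line that is assumed to be already preceded by a line end (it ignores the blank lines that `blank?` also permits). It relies on the Fluent spec's rule that an Identifier begins with an ASCII letter.

```csharp
using System;

// Hypothetical helper, not part of Linguini: does this line (assumed to be
// preceded by a line end) open an attribute under the quoted grammar?
static bool StartsAttribute(string line)
{
    int i = 0;
    // blank_inline is one or more U+0020; no other character counts as indentation.
    while (i < line.Length && line[i] == '\u0020') i++;
    // After the optional blank, the grammar requires "." followed by an Identifier,
    // which starts with an ASCII letter.
    return i + 1 < line.Length
        && line[i] == '.'
        && IsAsciiLetter(line[i + 1]);
}

static bool IsAsciiLetter(char c) =>
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

Console.WriteLine(StartsAttribute("    .title = Hello")); // True
Console.WriteLine(StartsAttribute("\t.title = Hello"));   // False
```

The tab-indented line fails because U+0009 is neither part of `blank_inline` nor the `.` that must follow it, which is why a conforming parser falls back to Junk.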

eXplowar commented 10 months ago

Thanks for the answer, we'll know ;)

Ygg01 commented 10 months ago

> Thanks for the answer, we'll know ;)

What you can do is fix your files to use spaces, or fork Linguini.Syntax for your own use. I'm not sure how a fork would work with NuGet, though.
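For the first option, a small pre-processing step could rewrite the files before they reach the parser. The sketch below is only an illustration under stated assumptions: `ReplaceLeadingTabs` is a hypothetical helper (not a Linguini API), and the width of four spaces per tab is an arbitrary choice. It touches only tabs in a line's leading whitespace and leaves tabs inside text untouched.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Linguini: replace tabs in each line's
// leading whitespace with spaces so the indentation matches the Fluent grammar.
static string ReplaceLeadingTabs(string source, int spacesPerTab = 4)
{
    var result = new StringBuilder(source.Length);
    bool inLeadingWhitespace = true;
    foreach (char c in source)
    {
        if (inLeadingWhitespace && c == '\t')
        {
            // Assumption: four spaces per tab; any positive width is valid indentation.
            result.Append(' ', spacesPerTab);
            continue;
        }
        if (c == '\n') inLeadingWhitespace = true;        // a new line begins
        else if (c != ' ') inLeadingWhitespace = false;   // first non-space character
        result.Append(c);
    }
    return result.ToString();
}

string cleaned = ReplaceLeadingTabs("hello = Hi\n\t.title = Title\n");
Console.WriteLine(cleaned.IndexOf('\t') < 0); // True
```

Because this rewrites indentation on every line, it is worth reviewing the output when a file contains multiline patterns.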