Closed eirikm closed 7 years ago
I can help you with that.
Regarding INDENTATION: it indeed might just be unnecessary and left in the code by mistake.
Regarding SEPARATION_BY_INDENTATION: BNF can only express context-free grammars. Elm is not only non-context-free, it also has quite an unusual approach to separating definitions inside let..in and case..of expressions. Let's have a look at an example:
f x =
    let a =
            x + 1
        b =
            x + 2
        c =
         x
          +
           3
    in a + b + c
It is perfectly fine Elm code, as the beginnings of b and c are the same distance from the left edge as the beginning of a. The tokens just before b and c are called SEPARATION_BY_INDENTATION in the plugin.
To achieve such parsing you need to write your own parsing code that is compatible with Grammar-Kit's auto-generated code. It is defined in the org.elmlang.intellijplugin.manualParsing package. You use the external keyword in the bnf file to reference such hand-written code.
Regarding the place where SEPARATION_BY_INDENTATION is defined: whenever you type any upper-case string in the bnf file, Grammar-Kit adds it to ElmTypes.
I hope it answers your questions. If you need anything more, feel free to ask.
Thanks a lot!
What tricked me was:
value_declaration ::= value_declaration_left EQ expression SEPARATION_BY_INDENTATION* { methods = [getReferencesStream getDisplayName getRole] }
this is the only place SEPARATION_BY_INDENTATION is used in a non-external context.
I figure this works because it always matches SEPARATION_BY_INDENTATION zero times, right?
Thanks in advance, Eirik
It sometimes might be matched more than zero times, e.g.:
x =
  case y of
    A -> 0
    B -> 1
    
Mind the last line with 4 leading spaces (equal to the indentation of the case branches). It could potentially introduce a third branch and therefore needs to be tokenized as a SEPARATION_BY_INDENTATION, not just a WHITE_SPACE or something.
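To make that concrete, here is a rough Python sketch of how line breaks in such a snippet could be classified. It is an assumption-laden illustration, not the plugin's actual remapping logic: the function `classify_line_breaks` and its rule (the line after one ending in `of` fixes the branch column) are hypothetical.

```python
SEPARATION = "SEPARATION_BY_INDENTATION"
WHITE_SPACE = "WHITE_SPACE"

# The last line consists of 4 spaces only, matching the branch indentation.
EXAMPLE = "x =\n  case y of\n    A -> 0\n    B -> 1\n    "


def classify_line_breaks(source: str) -> list[tuple[str, str]]:
    """Pair each line break with the token kind it gets and the code after it.

    Hypothetical rule: the line following one that ends in `of` fixes the
    branch column; every later line break followed by exactly that much
    indentation becomes a separator, even if nothing follows the spaces.
    """
    lines = source.split("\n")
    result = []
    branch_column = None
    for previous, following in zip(lines, lines[1:]):
        code = following.lstrip(" ")
        indent = len(following) - len(code)
        if branch_column is None:
            if previous.rstrip().endswith(" of"):
                branch_column = indent  # the first branch fixes the column
            kind = WHITE_SPACE
        elif indent == branch_column:
            kind = SEPARATION
        else:
            kind = WHITE_SPACE
        result.append((kind, code))
    return result


if __name__ == "__main__":
    for kind, code in classify_line_breaks(EXAMPLE):
        print(f"{kind:<26}{code!r}")
```

In this sketch the final entry is a separator followed by no code at all. No third branch exists to use it, so something after the case expression has to accept it, which is the job of the trailing `SEPARATION_BY_INDENTATION*` in value_declaration.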
@durkiewicz when parsing top-level value declarations, how does the SEPARATION_BY_INDENTATION token appear in the token stream? As far as I can tell, that token is only generated by the IndentationTokenTypeRemapper, which is only enabled while parsing let/in and case/of expressions.
@klazuka If a token has been remapped to SEPARATION_BY_INDENTATION by the IndentationTokenTypeRemapper, then it's a SEPARATION_BY_INDENTATION whichever way you check it. There's no such thing as "it's token A at this level of parsing and at the same time token B at that level".
Tokens are a concept of the lexer, which is one abstraction level below the parser.
So if a top-level declaration contains, let's say, a case..of expression, then you'll have SEPARATION_BY_INDENTATION tokens in your stream.
I should have used an example. Imagine that you have the following value declarations at the top-level of a module:
f = g x y
h = 42
How do you know where the body of f ends and where the new value declaration, h, begins? Your BNF says that value declarations are delimited by SEPARATION_BY_INDENTATION (value_declaration ::= value_declaration_left EQ expression SEPARATION_BY_INDENTATION*). But I don't see how the SEPARATION_BY_INDENTATION tokens can occur in this context since they are, as far as I can tell, only emitted (via token remapping) while parsing let/in and case/of, neither of which is involved in the example above.
I have been studying the bnf and flex files but can't understand where INDENTATION and SEPARATION_BY_INDENTATION are defined.
To me it looks like INDENTATION isn't used, but what about SEPARATION_BY_INDENTATION?
I guess it is my understanding of Grammar-Kit and JFlex that is lacking, but I would really like to be enlightened :-)