durkiewicz / elm-plugin

Elm language support plugin for IntelliJ IDEA.
MIT License

INDENTATION and SEPARATION_BY_INDENTATION #76

Closed eirikm closed 7 years ago

eirikm commented 7 years ago

I have been studying the bnf and flex files but can't work out where INDENTATION and SEPARATION_BY_INDENTATION are defined.

To me it looks like INDENTATION isn't used, but what about SEPARATION_BY_INDENTATION?

I guess it is my understanding of grammar-kit and jflex that is lacking, but I would really like to be enlightened :-)

durkiewicz commented 7 years ago

I can help you with that.

Regarding INDENTATION: it may indeed be unnecessary, left in the code by mistake.

Regarding SEPARATION_BY_INDENTATION: BNF can only express context-free grammars. Elm is not context-free, and it also has a rather unusual way of separating definitions inside let..in and case..of expressions. Let's have a look at an example:

f x =
 let                      a =
 x + 1
                          b =
                                                  x + 2
                          c =
              x
                                       +
                     3
                                           in a + b + c

This is perfectly valid Elm code, because b and c start at the same distance from the left edge as a. The tokens just before b and c are called SEPARATION_BY_INDENTATION in the plugin. To parse code like this you need to write your own parsing code that is compatible with Grammar-Kit's auto-generated code. In the plugin it lives in the org.elmlang.intellijplugin.manualParsing package. You use the external keyword in the bnf file to reference such hand-written code.
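The column rule described above can be illustrated with a small standalone sketch. It is not the plugin's actual code: the class name IndentationSketch, its methods, and the use of plain strings as token labels are assumptions made for illustration only, and real code would work on lexer tokens rather than whole lines. It labels the leading whitespace of each line as SEPARATION_BY_INDENTATION when the line is indented exactly as far as the first definition, and as WHITE_SPACE otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the idea; not the plugin's real classes.
final class IndentationSketch {

    /** Number of leading space characters in the line. */
    static int leadingSpaces(String line) {
        int count = 0;
        while (count < line.length() && line.charAt(count) == ' ') {
            count++;
        }
        return count;
    }

    /**
     * Labels the leading whitespace of each line. A line indented exactly to
     * anchorColumn (the column where the first definition starts) gets
     * SEPARATION_BY_INDENTATION; any other line gets WHITE_SPACE.
     */
    static List<String> classify(List<String> lines, int anchorColumn) {
        List<String> labels = new ArrayList<>();
        for (String line : lines) {
            labels.add(leadingSpaces(line) == anchorColumn
                    ? "SEPARATION_BY_INDENTATION"
                    : "WHITE_SPACE");
        }
        return labels;
    }
}
```

With an anchor column of 4, a line such as "    b =" is labeled SEPARATION_BY_INDENTATION, while a continuation line such as "          x + 2" is labeled WHITE_SPACE, however far left or right of the anchor it sits.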

As for where SEPARATION_BY_INDENTATION is defined: whenever you type an upper-case string in the bnf file, Grammar-Kit adds it to ElmTypes.

durkiewicz commented 7 years ago

I hope this answers your questions. If you need anything more, feel free to ask.

eirikm commented 7 years ago

Thanks a lot!

What tricked me was:

value_declaration ::= value_declaration_left EQ expression SEPARATION_BY_INDENTATION* { methods = [getReferencesStream getDisplayName getRole] }

This is the only place SEPARATION_BY_INDENTATION is used in a non-external context.

I figure this works because it always matches SEPARATION_BY_INDENTATION zero times, right?

Thanks in advance, Eirik

durkiewicz commented 7 years ago

It sometimes might be matched more than zero times, e.g.:

x =
  case y of
    A -> 0
    B -> 1

Mind the last line with its 4 leading spaces (equal to the indentation at which the case's branches begin). It could introduce a third branch, and therefore needs to be tokenized as SEPARATION_BY_INDENTATION rather than as plain WHITE_SPACE.
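To make the "zero or more" behaviour of SEPARATION_BY_INDENTATION* concrete, here is a minimal standalone sketch. It is not Grammar-Kit's generated code: the class StarMatchSketch, its method, and the token names other than EQ and SEPARATION_BY_INDENTATION are hypothetical, and tokens are modelled as plain strings.

```java
import java.util.List;

// Hypothetical model of how a trailing SEPARATION_BY_INDENTATION* consumes tokens.
final class StarMatchSketch {

    /**
     * Returns the index just past any run of SEPARATION_BY_INDENTATION tokens
     * starting at pos. If there is no such token at pos, pos is returned
     * unchanged, which corresponds to the star matching zero times.
     */
    static int skipSeparators(List<String> tokens, int pos) {
        while (pos < tokens.size()
                && tokens.get(pos).equals("SEPARATION_BY_INDENTATION")) {
            pos++;
        }
        return pos;
    }
}
```

For a declaration whose expression ends without a separator, the call returns the position it was given. If the lexer left a SEPARATION_BY_INDENTATION after the last branch, as in the case..of example, the same call advances past it, so the declaration rule still consumes the whole input.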

klazuka commented 7 years ago

@durkiewicz when parsing top-level value declarations, how does the SEPARATION_BY_INDENTATION token appear in the token stream? As far as I can tell, that token is only generated by the IndentationTokenTypeRemapper, which is only enabled while parsing let/in and case/of expressions.

durkiewicz commented 7 years ago

@klazuka If a token has been remapped to SEPARATION_BY_INDENTATION by the IndentationTokenTypeRemapper, then it is a SEPARATION_BY_INDENTATION however you look at it. There's no such thing as "it's token A at this level of parsing and at the same time token B at that level". Tokens are a lexer concept, and the lexer sits one abstraction level below the parser. So if a top-level declaration contains, let's say, a case..of expression, then you'll have SEPARATION_BY_INDENTATION tokens in your stream.

klazuka commented 7 years ago

I should have used an example. Imagine that you have the following value declarations at the top-level of a module:

f = g x y
h = 42

How do you know where the body of f ends and where the new value declaration, h, begins? Your BNF says that value declarations are delimited by SEPARATION_BY_INDENTATION (value_declaration ::= value_declaration_left EQ expression SEPARATION_BY_INDENTATION*). But I don't see how SEPARATION_BY_INDENTATION tokens can occur in this context, since as far as I can tell they are only emitted (via token remapping) while parsing let/in and case/of, neither of which is involved in the example above.