BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/
583 stars 162 forks source link

Keep track of parens in layout resolver #370

Closed andreasabel closed 3 years ago

andreasabel commented 3 years ago

Robert Krook observed that the layout handling does not mesh well with e.g. grouping things with parentheses.

ETimes.   Exp   ::= Exp "*" Exp1;
ESum.     Exp1  ::= "sum" "{" [Exp] "}";
EInt.     Exp1  ::= Integer;

layout "sum";

separator Exp ";";
coercions Exp 1;

comment "--";
comment "{-" "-}";

Example text:

(sum
  1
  2) * 3

The layout inference reconstructs this as

(sum {
  1 ;
  2) * 3 }

as it ignore the grouping by (...).

It seems that the current layout mechanism is too crude to work in such situations. E.g. the Agda parser also closes layout blocks when encountering a parse error, which would be raised at ) in the example. https://github.com/agda/agda/blob/044843c5281a7bdb9479317793a75c8c0fcfadd9/src/full/Agda/Syntax/Parser/Parser.y#L356-L365

So maybe the layouter should rather emit special tokens like INDENT NEWLINE and DEDENT which would then have to be handled by the grammar. Not sure this would work without semantic actions to manage pending layout blocks (like popBlock in the Agda parser).

andreasabel commented 3 years ago

In some situations, layout stop can be used to close blocks, like in:

EInt. Exp ::= Integer;
ELet. Exp ::= "let" "{" [Exp] "}" "in" Exp;

separator Exp ";";

layout "let";
layout stop "in";

Example text:

let 1
    2 in 3

reconstructed as

let { 1 ; 2 } in 3

But we cannot make closing parentheses layout-stop words, as it depends on the context when they should close blocks. E.g. not in:

sum
  (1)
  2

The layout resolver could also keep track of bracketing constructs, but there could be many depending on the grammar, and the bracketing is actually the task of the parser.

A hack would be to just consider the round brackets "(...)" since these are special (cf. coercions). Or all kinds of bracket pairs that occur in the grammar. And let the user declare extra bracketing constructs like begin...end. But these are all just clutches.

Rewbert commented 3 years ago

For my specific use case, I solved it by modifying the layout resolver to remember where it opens parentheses. If it sees a closing parentheses that match up with an open parentheses encountered before the current layout-block was opened, it will close the layout-block.

andreasabel commented 3 years ago

From email conversation:

Concerning the layout, there is the general question which bracketing should have priority over the bracketing inferred by the layout resolver.

I think we are trained to recognized matching parentheses (to some small nesting depth), so we expect this from a parser as well. [Caveat: We only match parentheses that are "not too far" away, and give other structuring elements, like bullet points, paragraphs, empty lines, headings etc. priority over matching parens.]

We would reject a design that uses certain parentheses for other purposes than grouping expressions. However, other symbols we accept both as parentheses and in other function, like "<" and ">". So if the layout resolver matches ">" with "<", this could be undesirable, because in some situations they are not supposed to match, in others they are (see e.g. the C++ language). This is why I think the problem can be in its generality only be solved in the parser, not by some heuristics in the layout resolver.

If one is happy with a restricted solution, one could let the user declare bracketing constructs, like in a pragma

layout bracketing "<" ">";
layout bracketing "begin" "end";
layout bracketing "foo" "bar";

and these would be tracked, unconditionally, in the layout resolver, just like explicit "{" and "}" are tracked at the moment.

Related: #355

andreasabel commented 3 years ago

A quick fix of the generated Layout.hs for the OP is the following diff, which tracks opening and closing parentheses on the layout stack.

@@ -41,6 +41,14 @@
 layoutClose = List.nub $ mapMaybe (delimClose . snd) layoutWords
 layoutSep   = List.nub $ map (delimSep . snd) layoutWords

+-- extra parentheses to keep track of
+
+parenOpen :: [TokSymbol]
+parenOpen = [ TokSymbol "(" 1 ]
+
+parenClose :: [TokSymbol]
+parenClose = [ TokSymbol ")" 2 ]
+
 -- | Replace layout syntax with explicit layout tokens.
 resolveLayout :: Bool    -- ^ Whether to use top-level layout.
               -> [Token] -> [Token]
@@ -60,11 +68,11 @@
     -- put an explicit layout block on the stack.
     -- This is done even if there was no layout word,
     -- to keep opening and closing braces.
-    | isLayoutOpen t0
+    | isLayoutOpen t0 || isParenOpen t0
       = t0 : res (Just t0) (Explicit : st) ts

     -- If we encounter a closing brace, exit the first explicit layout block.
-    | isLayoutClose t0
+    | isLayoutClose t0 || isParenClose t0
       , let (imps, rest) = span isImplicit st
       , let st' = drop 1 rest
       = if null st'
@@ -291,3 +299,11 @@
 -- | Check if a token is the layout close token.
 isLayoutClose :: Token -> Bool
 isLayoutClose = isTokenIn layoutClose
+
+-- | Check if a token is an opening parenthesis.
+isParenOpen :: Token -> Bool
+isParenOpen = isTokenIn parenOpen
+
+-- | Check if a token is a closing parenthesis.
+isParenClose :: Token -> Bool
+isParenClose = isTokenIn parenClose
andreasabel commented 3 years ago

For now, we simply assume that ()[]{} are the parentheses in the user grammar to keep track of. A more systematic solution can be delivered when demanded...