BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/

Layout keywords followed by empty block #194

andreasabel closed this issue 5 years ago

andreasabel commented 8 years ago

Consider the grammar file Tree.cf:

layout "branches" ;

Node.  Tree ::= Integer "branches" "{" [Tree] "}" ;
separator Tree ";" ;

and a sample file tree.txt:

1 branches
  10 branches
  11 branches
    111 branches
    112 branches
  12 branches

I would have expected the AST

Node 1 [ Node 10 [], Node 11 [ Node 111 [], Node 112 []], Node 12 []]

printed e.g. as

1 branches {
  10 branches {};
  11 branches {
    111 branches {};
    112 branches {} };
  12 branches {} }

but the bnfc-generated parser produces

Node 1 [Node 10 [Node 11 [Node 111 [Node 112 [Node 12 []]]]]]

printed as

1 branches {
  10 branches {
    11 branches {
      111 branches {
        112 branches {
          12 branches {
            }
          }
        }
      }
    }
  }

I think something is wrong with the handling of empty blocks following a layout keyword.

andreasabel commented 8 years ago

Any connection to #77?

andreasabel commented 8 years ago

Looking at the docs, it seems that bnfc does implement its specification:

But, if the token t following "of" is not an opening curly bracket, a bracket is inserted, and the start column of t is remembered as the position at which the elements of the layout list must begin. Semicolons are inserted at those positions. When a token is eventually encountered left of the position of t (or an end-of-file), a closing bracket is inserted at that point.

However, that specification does not seem to match intuition. What I miss is a requirement that the token following the layout keyword be indented more than the enclosing block; that would give sensible behavior (and avoid this issue).
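To make the quoted rule concrete, here is a toy model in Haskell. It is a sketch under simplifying assumptions, not BNFC's generated layout resolver. The names Tok, tokenize, resolveWith, documentedRule and treeTxt are hypothetical. Tokens are space-separated words without tabs, and the input is assumed to contain no explicit braces. Under the documented rule, the token after a layout keyword always opens a block at its own column. As a result, each "branches" in tree.txt opens a block nested inside the previous one.

```haskell
-- Hypothetical toy model of the documented layout rule, not BNFC's code.
-- Assumes space-separated tokens, no tabs and no explicit braces.

-- | A token with its 1-based line and column.
data Tok = Tok { tLine :: Int, tCol :: Int, tText :: String }
  deriving Show

tokenize :: String -> [Tok]
tokenize src =
  [ Tok n c w | (n, l) <- zip [1 ..] (lines src), (c, w) <- scan 1 l ]
  where
    -- pair each space-separated word with its 1-based start column
    scan _ [] = []
    scan c s@(ch : rest)
      | ch == ' ' = scan (c + 1) rest
      | otherwise =
          let (w, rest') = break (== ' ') s
          in (c, w) : scan (c + length w) rest'

-- | Insert "{", ";" and "}". The first argument decides, given the enclosing
-- block's column and the column of the token after a layout keyword,
-- whether that token opens a new block.
resolveWith :: (Int -> Int -> Bool) -> [String] -> [Tok] -> [String]
resolveWith opens kws = go [] False 0
  where
    -- go <open block columns, innermost first> <after a layout keyword?>
    --    <line of previous token> <remaining tokens>
    go st pending _ []
      | pending   = "{" : "}" : closeAll st  -- keyword at end of input
      | otherwise = closeAll st
    go st True prev (t : ts)
      | opens (enclosing st) (tCol t) =
          "{" : tText t : go (tCol t : st) (isKw t) (tLine t) ts
      | otherwise =                          -- empty block, then treat t normally
          "{" : "}" : go st False prev (t : ts)
    go st False prev (t : ts)
      | tLine t > prev =                     -- t is the first token on its line
          let (inner, st') = span (> tCol t) st  -- blocks right of t are closed
              semi = case st' of
                (c : _) | c == tCol t -> [";"]   -- t starts a new block item
                _                     -> []
          in map (const "}") inner ++ semi ++ tText t : go st' (isKw t) (tLine t) ts
      | otherwise = tText t : go st (isKw t) (tLine t) ts
    enclosing []      = 0
    enclosing (c : _) = c
    closeAll st = map (const "}") st
    isKw t = tText t `elem` kws

-- | The rule as documented: the token after a layout keyword always opens a block.
documentedRule :: Int -> Int -> Bool
documentedRule _ _ = True

-- | tree.txt from the report.
treeTxt :: String
treeTxt = unlines
  [ "1 branches"
  , "  10 branches"
  , "  11 branches"
  , "    111 branches"
  , "    112 branches"
  , "  12 branches"
  ]
```

In GHCi, unwords (resolveWith documentedRule ["branches"] (tokenize treeTxt)) evaluates to "1 branches { 10 branches { 11 branches { 111 branches { 112 branches { 12 branches { } } } } } }". This is the same nesting as the AST reported above.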

gdetrez commented 8 years ago

I believe the layout mechanism was originally added to implement Haskell-ish layout syntax, where the following things are acceptable:

    do
    putStrLn "Hello World"

and even

    do
  putStrLn "Hello World"

(I'm not saying one should do that, but it is valid Haskell...)

So, even if I agree that it is kind of counter-intuitive, it seems that it is the intended behavior.

andreasabel commented 8 years ago

But Haskell rejects

{-# LANGUAGE NondecreasingIndentation #-}

test = do
    do
  putStrLn "Hello, World!"

whereas bnfc's layout mechanism accepts the analogous program. Given the grammar

layout "do";

Decl. Decl ::= Ident "=" Exp;

Var.  Exp ::= Ident;
Do.   Exp ::= "do" "{" [Exp] "}" ;

separator Exp ";" ;

and the test file

test = do
    do
  test

the generated parser responds:

Parse Successful!

[Abstract Syntax]

Decl (Ident "test") (Do [Do [Var (Ident "test")]])

[Linearized tree]

test = do {
  do {
    test 
  }
  }

Maybe instead of "indented more" I should have said "not indented less", which corresponds to Haskell's NondecreasingIndentation.
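The difference between "indented more" and "not indented less" can be sketched with a toy resolver. The resolver is parameterised by the test it applies to the token after a layout keyword:

- documentedRule always opens a block.
- nondecreasing requires a column at least that of the enclosing block.
- strict requires a strictly larger column.

All names are hypothetical, and this is not BNFC's code. The model assumes space-separated tokens and no explicit braces. doTxt is the test file above. doAligned is a hypothetical variant with the inner "test" aligned under the inner "do".

```haskell
-- Hypothetical toy layout resolver comparing three rules; not BNFC's code.
-- Assumes space-separated tokens, no tabs and no explicit braces.

data Tok = Tok { tLine :: Int, tCol :: Int, tText :: String }
  deriving Show

tokenize :: String -> [Tok]
tokenize src =
  [ Tok n c w | (n, l) <- zip [1 ..] (lines src), (c, w) <- scan 1 l ]
  where
    -- pair each space-separated word with its 1-based start column
    scan _ [] = []
    scan c s@(ch : rest)
      | ch == ' ' = scan (c + 1) rest
      | otherwise =
          let (w, rest') = break (== ' ') s
          in (c, w) : scan (c + length w) rest'

-- | Insert "{", ";" and "}"; `opens enclosingCol col` decides whether the
-- token after a layout keyword starts a new block.
resolveWith :: (Int -> Int -> Bool) -> [String] -> [Tok] -> [String]
resolveWith opens kws = go [] False 0
  where
    go st pending _ []
      | pending   = "{" : "}" : closeAll st
      | otherwise = closeAll st
    go st True prev (t : ts)
      | opens (enclosing st) (tCol t) =
          "{" : tText t : go (tCol t : st) (isKw t) (tLine t) ts
      | otherwise =                          -- empty block, then treat t normally
          "{" : "}" : go st False prev (t : ts)
    go st False prev (t : ts)
      | tLine t > prev =                     -- first token on a new line
          let (inner, st') = span (> tCol t) st  -- close blocks right of t
              semi = case st' of
                (c : _) | c == tCol t -> [";"]   -- new item in the current block
                _                     -> []
          in map (const "}") inner ++ semi ++ tText t : go st' (isKw t) (tLine t) ts
      | otherwise = tText t : go st (isKw t) (tLine t) ts
    enclosing []      = 0
    enclosing (c : _) = c
    closeAll st = map (const "}") st
    isKw t = tText t `elem` kws

documentedRule, nondecreasing, strict :: Int -> Int -> Bool
documentedRule _ _ = True                    -- always open a block
nondecreasing enclosingCol col = col >= enclosingCol  -- "not indented less"
strict enclosingCol col = col > enclosingCol          -- "indented more"

-- | The test file from above.
doTxt :: String
doTxt = unlines ["test = do", "    do", "  test"]

-- | Hypothetical variant: the inner test is aligned under the inner do.
doAligned :: String
doAligned = unlines ["test = do", "    do", "    test"]

-- | Resolve layout with "do" as the layout keyword, joining tokens with spaces.
layoutDo :: (Int -> Int -> Bool) -> String -> String
layoutDo rule = unwords . resolveWith rule ["do"] . tokenize
```

On doTxt, the documented rule yields "test = do { do { test } }", as in the output above. Both other rules yield "test = do { do { } } test", where the trailing test no longer fits the grammar, so the parse fails, much as GHC rejects the Haskell version. The two rules differ only on doAligned. nondecreasing gives "test = do { do { test } }", whereas strict gives "test = do { do { } ; test }".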

andreasabel commented 5 years ago

I am fixing this so that a new layout block must be indented strictly more than its enclosing layout block. This means that the next token after the layout keyword only determines the new indentation column if it is strictly more indented than the previous indentation column. Otherwise, the next token immediately closes the new layout block and continues in the previous block (or even closes more blocks).
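The strict rule can be sketched in a toy Haskell model (hypothetical names; not BNFC's actual resolver). The model assumes space-separated tokens, no tabs and no explicit braces. Suppose the token after a layout keyword is not strictly to the right of the enclosing block's column. The sketch then emits an empty block and treats the token as an ordinary one, which may insert a semicolon or close further blocks. Applied to tree.txt from the original report, it should produce the expected structure.

```haskell
-- Hypothetical toy model of the strict layout rule; not BNFC's code.
-- Assumes space-separated tokens, no tabs and no explicit braces.

data Tok = Tok { tLine :: Int, tCol :: Int, tText :: String }
  deriving Show

tokenize :: String -> [Tok]
tokenize src =
  [ Tok n c w | (n, l) <- zip [1 ..] (lines src), (c, w) <- scan 1 l ]
  where
    -- pair each space-separated word with its 1-based start column
    scan _ [] = []
    scan c s@(ch : rest)
      | ch == ' ' = scan (c + 1) rest
      | otherwise =
          let (w, rest') = break (== ' ') s
          in (c, w) : scan (c + length w) rest'

-- | Insert "{", ";" and "}"; `opens enclosingCol col` decides whether the
-- token after a layout keyword starts a new block.
resolveWith :: (Int -> Int -> Bool) -> [String] -> [Tok] -> [String]
resolveWith opens kws = go [] False 0
  where
    go st pending _ []
      | pending   = "{" : "}" : closeAll st
      | otherwise = closeAll st
    go st True prev (t : ts)
      | opens (enclosing st) (tCol t) =
          "{" : tText t : go (tCol t : st) (isKw t) (tLine t) ts
      | otherwise =                          -- close the new block immediately,
          "{" : "}" : go st False prev (t : ts)  -- then treat t normally
    go st False prev (t : ts)
      | tLine t > prev =                     -- first token on a new line
          let (inner, st') = span (> tCol t) st  -- close blocks right of t
              semi = case st' of
                (c : _) | c == tCol t -> [";"]   -- new item in the current block
                _                     -> []
          in map (const "}") inner ++ semi ++ tText t : go st' (isKw t) (tLine t) ts
      | otherwise = tText t : go st (isKw t) (tLine t) ts
    enclosing []      = 0
    enclosing (c : _) = c
    closeAll st = map (const "}") st
    isKw t = tText t `elem` kws

-- | The fixed rule: a new block must start strictly right of the enclosing one.
strict :: Int -> Int -> Bool
strict enclosingCol col = col > enclosingCol

-- | tree.txt from the original report.
treeTxt :: String
treeTxt = unlines
  [ "1 branches"
  , "  10 branches"
  , "  11 branches"
  , "    111 branches"
  , "    112 branches"
  , "  12 branches"
  ]
```

In this model, unwords (resolveWith strict ["branches"] (tokenize treeTxt)) gives "1 branches { 10 branches { } ; 11 branches { 111 branches { } ; 112 branches { } } ; 12 branches { } }". This corresponds to Node 1 [Node 10 [], Node 11 [Node 111 [], Node 112 []], Node 12 []].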

Given this grammar,

Modl. Module ::= "module" Ident "where" "{" [Module] "}" ;
separator Module ";" ;
layout "where" ;

the example

module Top where
  module A where
  module B where
    module B1 where
      module B1A where
    module B2 where
  module C where
    module C1 where

parses as intended:

module Top where {
  module A where {
    } ;
  module B where {
    module B1 where {
      module B1A where {
        }
      } ;
    module B2 where {
      }
    } ;
  module C where {
    module C1 where {
      }
    }
  }
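As a cross-check, the strict rule can be run on the module example above in a standalone toy resolver. The names are hypothetical and this is not BNFC's code. It assumes space-separated tokens, no tabs and no explicit braces. The sequence of braces and semicolons it emits should match the printed parse.

```haskell
-- Hypothetical toy model of the strict layout rule; not BNFC's code.
-- Assumes space-separated tokens, no tabs and no explicit braces.

data Tok = Tok { tLine :: Int, tCol :: Int, tText :: String }
  deriving Show

tokenize :: String -> [Tok]
tokenize src =
  [ Tok n c w | (n, l) <- zip [1 ..] (lines src), (c, w) <- scan 1 l ]
  where
    -- pair each space-separated word with its 1-based start column
    scan _ [] = []
    scan c s@(ch : rest)
      | ch == ' ' = scan (c + 1) rest
      | otherwise =
          let (w, rest') = break (== ' ') s
          in (c, w) : scan (c + length w) rest'

-- | Insert "{", ";" and "}"; `opens enclosingCol col` decides whether the
-- token after a layout keyword starts a new block.
resolveWith :: (Int -> Int -> Bool) -> [String] -> [Tok] -> [String]
resolveWith opens kws = go [] False 0
  where
    go st pending _ []
      | pending   = "{" : "}" : closeAll st
      | otherwise = closeAll st
    go st True prev (t : ts)
      | opens (enclosing st) (tCol t) =
          "{" : tText t : go (tCol t : st) (isKw t) (tLine t) ts
      | otherwise =                          -- empty block, then treat t normally
          "{" : "}" : go st False prev (t : ts)
    go st False prev (t : ts)
      | tLine t > prev =                     -- first token on a new line
          let (inner, st') = span (> tCol t) st  -- close blocks right of t
              semi = case st' of
                (c : _) | c == tCol t -> [";"]   -- new item in the current block
                _                     -> []
          in map (const "}") inner ++ semi ++ tText t : go st' (isKw t) (tLine t) ts
      | otherwise = tText t : go st (isKw t) (tLine t) ts
    enclosing []      = 0
    enclosing (c : _) = c
    closeAll st = map (const "}") st
    isKw t = tText t `elem` kws

-- | The fixed rule: a new block must start strictly right of the enclosing one.
strict :: Int -> Int -> Bool
strict enclosingCol col = col > enclosingCol

-- | The module example from above.
moduleTxt :: String
moduleTxt = unlines
  [ "module Top where"
  , "  module A where"
  , "  module B where"
  , "    module B1 where"
  , "      module B1A where"
  , "    module B2 where"
  , "  module C where"
  , "    module C1 where"
  ]
```

Evaluating unwords (resolveWith strict ["where"] (tokenize moduleTxt)) gives "module Top where { module A where { } ; module B where { module B1 where { module B1A where { } } ; module B2 where { } } ; module C where { module C1 where { } } }". This is the same token sequence as the printed parse.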