goccmack / gocc

Parser / Scanner Generator
Other
622 stars 48 forks source link

EBNF would be great #21

Closed willfaught closed 8 years ago

willfaught commented 8 years ago

Any plans for EBNF support?

willfaught commented 8 years ago

Whoops, I missed it, my bad.

mewmew commented 8 years ago

Hi Will,

No worries, and welcome to Gocc :)

Including an answer to this question from Marius (https://github.com/goccmack/gocc/issues/13#issuecomment-181827331):

gocc3 included two major changes:

  1. It supported an EBNF for input and
  2. It supported David Pager's "Practical General Method" for generating LR(1) parsers as an option to the Knuth LR(1) algorithm.

I am of the opinion that EBNF encourages grammars which are outside LR(1). Most EBNF grammars I wrote had lots of LR(1) conflicts. I would advise against adding EBNF to gocc again.

When I have the time again I would like to add Pager's PGM again because it generates smaller tables for full LR(1) grammars -- similar in size to LALR for the same grammar. I also think that the original paper is beautiful and that it is a pity that LALR became popular instead of this technique. The PGM parse table generator was working in gocc3 but not extensively tested.

I recalled gocc3 because of errors in the translation of EBNF to BNF for parser generation, which I didn't consider worth fixing because I peferred to revert to BNF input.

Cheers /u

willfaught commented 8 years ago

@mewmew Thanks!


In the current version, I see lexer support for EBNF grouping (...), options [...], and repetition {...}, but no exception/subtraction/difference a - b (everything in a except that in b). Is there a way to do that with gocc? Apparently there's no way to do it just in terms of regular BNF like you can with those others listed above...?

I'm experimenting with the Haskell syntax, which uses exceptions.

awalterschulze commented 8 years ago

I don't think there is a way to do difference. If I am correct, it is basically equivalent to a & !b, but gocc does not have AND.

willfaught commented 8 years ago

Putting it in terms of & makes total sense, since CFLs aren't closed under intersection. Thanks.

awalterschulze commented 8 years ago

Yes CFLs are not closed under intersection, but regular languages are. I might be wrong, but I think the lexer part of the BNF is using a regular language?

willfaught commented 8 years ago

@awalterschulze I think being able to do something like _a: _b _c _b makes it not regular, right? Productions in regular grammars are left/right linear, as in A -> xB or A -> Bx.

willfaught commented 8 years ago

Looks like I was confused but partly right initially. There is sort-of-EBNF for lexers, but not for parsers. It would be convenient for parser grammars to at least have repetitions ({...}), options ([...]), and groupings ((...)), which safely close over CFLs, to keep things simple and clean. An at-least-one (+) repetition operator would be awesome too. Maybe <...>... I understand the concern about lookahead tendencies, though, so I guess that's a tradeoff.

awalterschulze commented 8 years ago

Yes trying to match brackets makes it "irregular". So I am wonder if the ! in the lexer makes it irregular, since the ! is not a pure implementation, but rather an implementation that is user friendly.

I think @goccmack tried to add an EBNF syntax and that would have included {} and [] etc, but he said that this makes it hard to write LR(1) grammars and the user tends to cause more conflicts. Also in my opinion this makes it harder to write SDT rules.