BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/

Juxtaposing quotation mark #436

Open sergey-goncharov opened 1 year ago

sergey-goncharov commented 1 year ago

This grammar Exp. Exp ::= Ident "'" ; generates a parser that correctly parses x ', but for some reason not x'. I've played with other characters, including other quotation symbols such as ’, and then both x’ and x ’ are recognized, which is imho the correct behaviour. Is there a particular reason why ' is treated specially, and is there a way to avoid it? Thanks!

andreasabel commented 1 year ago

Since Idents can contain quotation marks, x' will parse as an identifier.
You can define your own identifier tokens, e.g.:

Exp. Exp ::= Id "'";
token Id letter (letter | digit | '_')*;

This works as expected in the "imperative" backends (C, C++, Java) but not in the "functional" ones (Haskell, OCaml).

The problem is in how the functional backends implement the lexer: they always include lexing of Ident, so that keywords can be lexed as Ident and then classified as keywords later. This is to prevent explosion of the lexer automata.
Unfortunately, it leads to this bug.
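
To make the mechanism concrete, here is a toy longest-match lexer in Python. It is only a sketch under stated assumptions: the regular expressions approximate the built-in Ident and the user-defined Id token, and the rule lists WITH_IDENT and WITHOUT_IDENT are hypothetical names for this example. The real backends generate their lexers with dedicated tools, so this models the behaviour rather than the implementation.

```python
import re

# Assumed approximation of the built-in Ident:
# a letter, then letters, digits, '_' or the ASCII apostrophe.
IDENT = r"[A-Za-z][A-Za-z0-9_']*"
# The user-defined token Id from the grammar above: no apostrophe allowed.
ID = r"[A-Za-z][A-Za-z0-9_]*"
# Symbols of the toy grammar: ASCII apostrophe and the typographic one (U+2019).
SYMBOL = "['\u2019]"


def tokenize(text, rules):
    """Longest-match lexer; on a tie the earlier rule in `rules` wins."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"cannot lex at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


# Hypothetical model of a lexer that always includes Ident.
WITH_IDENT = [("Id", ID), ("Symbol", SYMBOL), ("Ident", IDENT)]
# Hypothetical model of a lexer that only knows the user-defined token.
WITHOUT_IDENT = [("Id", ID), ("Symbol", SYMBOL)]

if __name__ == "__main__":
    for source in ["x'", "x '", "x\u2019"]:
        print(repr(source), tokenize(source, WITH_IDENT), tokenize(source, WITHOUT_IDENT))
```

With Ident in the rule set, x' is consumed as a single Ident token, so the parser never sees Id followed by "'". The inputs x ' and x’ split as intended in both models, because neither a space nor ’ can continue an identifier.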


sergey-goncharov commented 1 year ago

Thanks, Andreas, for a quick reply!

Exp. Exp ::= Id "'";
token Id letter (letter | digit | '_')*;

That is roughly how I started. I just minimized that example, since the problem persisted. I am following the basic steps from the tutorial, which suggests using the Test program generated with bnfc -d -m for basic testing.

andreasabel commented 1 year ago

Yes, unfortunately "'" isn't a proper operator character atm in the standard backend (Haskell). You can use this invocation to select the C backend instead of the Haskell one.

bnfc --c -m GRAMMAR.cf

jasper-e commented 1 year ago

Yes, unfortunately "'" isn't a proper operator character atm in the standard backend (Haskell). You can use this invocation to select the C backend instead of the Haskell one.

bnfc --c -m GRAMMAR.cf

My problem is that I combine the bnfc Haskell backend with the Haskell-based uuagc... I would highly appreciate a command-line flag for the Haskell backend that omits Ident from the lexer, which I could put in my Makefile...