BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/
587 stars 165 forks source link

Haskell: errors as record types #423

Open vkpgwt opened 2 years ago

vkpgwt commented 2 years ago

Hello!

BNFC errors in the Haskell backend are plain String values like this: syntax error at line 3, column 1 before `token' . There are some inconveniences here:

The new error type might be like:

data BNFCError = BNFCError
  { bnfcLineNo :: Int
  , bnfcColumnNo :: Int
  , bnfcBadInput :: String
  , bnfcExpectedTokenNames :: [String]
  , bnfcAcceptedTokenName :: String
  }

In fact, it might be more complex, since BNFC can also generate lexer errors, and we might want to support the Text type.

Since this is a breaking change, a new command line option might be introduced, like --error-type=(string|record), defaulting to the String errors.

Would you agree with this change? I would appreciate if you give some advice. I'm using BNFC in production and we need good error messages. It will be perfect if I create a pull request that will be accepted finally, otherwise I'm going to patch BNFC locally.

andreasabel commented 2 years ago

Hello Anton @vkpgwt, thanks for your suggestion and your interest in BNFC! I think it is a very good proposal.

I suppose this is for the Haskell backend only? (The other backends also do not have structured error reporting.) You might be aware that there is a common testsuite for all backends, to ensure that the generated parsers and printers behave uniformly. (However, this does not stop individual backends to have special features.)

To keep code duplication minimal, I suggest to refactor the Haskell backend to always produce a structured error in the parser, an only convert to string errors in the top-level parser functions (unless --errors=structured is given).

I suggest --errors=string|text|structured as option name to make it less Haskell-specific (and open the avenue to have this in the other backends as well).

dpwiz commented 2 years ago

--errors=string|text|structured

I'm confused. Having string|text selector in multiple places (--text-token, maybe there are others) would give spurious combinations like Text tokens / String errors, String tokens / Text errors.

And if the token name type is driven by the existing option, this option would look like --errors=plain|structured which is basically a --structured-errors flag.