Open SiriusStarr opened 4 years ago
As far as I can tell, the only way to fix this is to make two changes:
Remove support for customizable import parsing (e.g. the Dhall.Parser.exprA
utility)
... which is a prelude to:
Switch to an LALR parser (e.g. Earley
) or parser generator (e.g. happy
)
... which would fix most of the parsing issues that we've been running into
I don't think there are any shortcuts left at this point for fixing this particular issue
This is a particularly aggravating issue when you have a long file, example I had this typo:
on line 1238, and it caused this error message:
> dhall --file ./log2.dhall
dhall:
Error: Invalid input
./log2.dhall:1217:11:
|
1217 | [ tweag
| ^
unexpected '['
expecting ',', ->, :, ], operator, or whitespace
That’s two screen heights above the actual error at the beginning of the nested list!
For the record: https://github.com/dhall-lang/dhall-haskell/pull/1740 was the first step to implement the plan in https://github.com/dhall-lang/dhall-haskell/issues/1592#issuecomment-562965613.
I recently had a run-in with this issue. I think I would have torn my hair out, if it was long enough to do so! ;)
@Gabriel439 I see that you've mentioned generating the parser with happy
above. I've done small edits on GHC's happy
code, but I don't have a good understanding of it. Would we generate the parser from dhall.abnf
, or a modified version of it?
Would we generate the parser from
dhall.abnf
, or a modified version of it?
Happy doesn't directly work with that specific format so we need to port it to happy syntax. The only problem that we might encounter by porting dhall.abnf to happy syntax is that happy works much better on a right-recursive grammar and dhall.abnf spec is mostly left-recursive (indirectly).
@sjakobi: Here's an example of what an alex
lexer and happy
parser might look like, from an older project of mine:
However, I later switched to Earley
for the reasons outlined in this commit description:
https://github.com/Gabriel439/Haskell-Morte-Library/commit/f6820953765e5d518469fbd47834215394626068
@Gabriel439 Thanks! With Earley
we'd still have a hand-written parser, right? It can't be generated from dhall.abnf
as some other implementations seem to do?
@sjakobi: Yeah, it would still be hand-written, but it would no longer require backtracking (which is the main reason for poor error messages) or tricks to avoid pathological performance. The main downside of Earley
is that it might have slower constant factors because it doesn't support fast bulk operations like megaparsec
does
Morte … I just now realized that the Planescape naming scheme is older than Dhall :)
@Profpatsch: Yep! That's correct :slightly_smiling_face:
@ocharles just shared his Earley
-based parser in https://hub.darcs.net/ocharles/dhalli/browse/lib/Dhalli/Syntax.hs with me. Might be a good resource when we finally make the switch! :)
BTW, Happy let's you export multiple parsers these days, and it can report expected tokens (output may need some tidying, but it is doable).
By chance I've come across headed-megaparsec
, which seems to have the purpose of providing better error messages (see https://hackage.haskell.org/package/headed-megaparsec-0.2.0.2/docs/HeadedMegaparsec.html#t:HeadedParsec).
I'm wondering whether it might be useful for resolving this issue without abandoning the current parser code entirely. Currently I don't properly understand what it does and how it works though.
It seems the parser is backtracking too much (I’m bitten by this a lot when writing any “serious” dhall expressions).
Can the earley parser handle that correctly? If so, what’s hindering us from using @ocharles version?
It's probably possible to switch to an Earley
parser. It just requires a lot of work. We can't use @ocharles's version exactly as written because there are too many subtle differences, but we could consult it for writing the equivalent Dhall parser.
If a parse error occurs within parentheses in function application, instead of an error at the point of the parse error, you instead get
unexpected '('
at the open parenthesis. For example:vs
This does not itself seem particularly problematic, until the expression within the parentheses is complex and the parse error difficult to see at a glance. For example:
You can stare at the above for quite a long time before you notice the real problem, namely that
\l
is not a valid escape sequence, and it should be"\\latexcommand{${text}}"
. (Don't ask me how I know this...)