Closed ramonsnir closed 3 years ago
Error reporting is an area that needs to be improved. You could try using a lexer, e.g. logos, to produce a token stream, then use a pom-based parser to create the high-level data structure.
Thanks, @J-F-Liu. I have a sophisticated hand-written lexer. My errors are at the parsing level. I will try to fork pom to add some better errors. If the outcome is useful, I'll open a PR so you can consider it.
I love this package, I'm using it (abusing it? :wink:) to parse structured PDF documents: using lopdf and a custom output to extract a struct token stream of characters and strokes, passing the token stream to a pom-based parser to convert to a data structure.
I find that I often get errors of the form
Err(Mismatch { message: "expect end of input, found: Terminal { typ: Char(CharTerminal('(', CharTerminalKind(SecondaryTitle, Bold))), page_num: 123 }", position: 29254 })
Basically it means that one of the items of the top-level item().repeat(0..) is not matched by the item() parser. But the error message doesn't say anything about why it failed. Usually, by doing the parsing by hand on a piece of paper, I am able to find the bug in my parser, but not always.

Do you have any tips on how to debug pom parsers efficiently? PEG.js has something like this which looks nice, but really any sort of tip or trick would be useful. My parsing is growing quite complex, and even though I try to keep it organized, it can be tough to debug.
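
To illustrate the kind of error I am after, here is a minimal sketch of "furthest failure" tracking, a common way to get more useful errors out of backtracking parsers. It is plain Rust and does not use pom's API. The Tok type, the Trace struct, and the item/document grammar are all hypothetical stand-ins for my real tokens and parsers.

```rust
// Hypothetical token type standing in for the real lexer output.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Title,
    Text,
    Stroke,
}

// Remembers the furthest input position at which any sub-parser failed,
// together with what was expected there.
#[derive(Debug, Default)]
struct Trace {
    furthest: usize,
    expected: Vec<&'static str>,
}

impl Trace {
    fn fail(&mut self, pos: usize, what: &'static str) {
        if pos > self.furthest {
            // A deeper failure replaces everything recorded so far.
            self.furthest = pos;
            self.expected.clear();
        }
        if pos == self.furthest && !self.expected.contains(&what) {
            self.expected.push(what);
        }
    }
}

// Match one token at `pos`; on success return the next position.
fn tok(input: &[Tok], pos: usize, want: Tok, what: &'static str, t: &mut Trace) -> Option<usize> {
    if input.get(pos) == Some(&want) {
        Some(pos + 1)
    } else {
        t.fail(pos, what);
        None
    }
}

// item := Title Text Stroke
fn item(input: &[Tok], pos: usize, t: &mut Trace) -> Option<usize> {
    let p = tok(input, pos, Tok::Title, "title", t)?;
    let p = tok(input, p, Tok::Text, "text", t)?;
    tok(input, p, Tok::Stroke, "stroke", t)
}

// document := item* end-of-input
// Returns the number of items parsed, or an error that also reports the
// furthest failure seen inside `item`.
fn document(input: &[Tok]) -> Result<usize, String> {
    let mut t = Trace::default();
    let mut pos = 0;
    let mut count = 0;
    while let Some(next) = item(input, pos, &mut t) {
        pos = next;
        count += 1;
    }
    if pos == input.len() {
        Ok(count)
    } else {
        Err(format!(
            "expected end of input at {}; furthest failure at {}: expected {}",
            pos,
            t.furthest,
            t.expected.join(" or ")
        ))
    }
}
```

With the input [Title, Text, Stroke, Title, Stroke], document returns the error "expected end of input at 3; furthest failure at 4: expected text". The first half is what I get today; the second half points at the token where the second item actually went wrong, which is the part I am missing.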