ramonsnir commented 3 years ago

I love this package. I'm using it (abusing it? :wink:) to parse structured PDF documents: I use lopdf with a custom output to extract a struct token stream of characters and strokes, then pass that token stream to a pom-based parser that converts it into a data structure.

I often get errors of the form `Err(Mismatch { message: "expect end of input, found: Terminal { typ: Char(CharTerminal('(', CharTerminalKind(SecondaryTitle, Bold))), page_num: 123 }", position: 29254 })`. Basically, it means that one of the items under the top-level `item().repeat(0..)` was not matched by the `item()` parser, so the repetition stopped early and the end-of-input check failed instead. But the error message doesn't say anything about why the item failed. Usually, by doing the parsing by hand on a piece of paper, I am able to find the bug in my parser, but not always.
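To make the failure mode above concrete, here is a minimal std-only sketch. It does not use pom's API: `Failure`, `Trace`, `expect`, `item`, and `document` are hypothetical names invented for illustration, and the toy grammar (`item := '(' digit ')'`) is an assumption. It shows why a repeat-then-end parser reports only "expect end of input", and how recording the furthest failure seen so far can recover the more useful inner error.

```rust
// Hypothetical stand-ins for a pom-style parser; none of this is pom's API.
#[derive(Debug, Clone, PartialEq)]
struct Failure {
    position: usize,
    message: String,
}

/// Remembers the failure that got furthest into the input.
#[derive(Debug, Default)]
struct Trace {
    furthest: Option<Failure>,
}

impl Trace {
    fn record(&mut self, failure: &Failure) {
        let deeper = match &self.furthest {
            Some(best) => failure.position > best.position,
            None => true,
        };
        if deeper {
            self.furthest = Some(failure.clone());
        }
    }
}

/// Match one character satisfying `pred`, recording any failure in `trace`.
fn expect(
    input: &[char],
    pos: usize,
    what: &str,
    pred: fn(char) -> bool,
    trace: &mut Trace,
) -> Result<usize, Failure> {
    match input.get(pos) {
        Some(&c) if pred(c) => Ok(pos + 1),
        found => {
            let failure = Failure {
                position: pos,
                message: format!("expect {}, found: {:?}", what, found),
            };
            trace.record(&failure);
            Err(failure)
        }
    }
}

/// Toy grammar: item := '(' digit ')'
fn item(input: &[char], start: usize, trace: &mut Trace) -> Result<usize, Failure> {
    let pos = expect(input, start, "'('", |c| c == '(', trace)?;
    let pos = expect(input, pos, "digit", |c| c.is_ascii_digit(), trace)?;
    expect(input, pos, "')'", |c| c == ')', trace)
}

/// Equivalent in spirit to `item().repeat(0..)` followed by end of input.
/// Returns the number of items parsed.
fn document(input: &[char], trace: &mut Trace) -> Result<usize, Failure> {
    let mut pos = 0;
    let mut count = 0;
    // The repetition stops at the first failing item and discards its error.
    while let Ok(next) = item(input, pos, trace) {
        pos = next;
        count += 1;
    }
    if pos == input.len() {
        Ok(count)
    } else {
        Err(Failure {
            position: pos,
            message: format!("expect end of input, found: {:?}", input[pos]),
        })
    }
}
```

Calling `document` on the characters of `"(1)(2)(x)"` returns an error at position 6 saying `expect end of input, found: '('`, which mirrors the unhelpful message above. The `Trace` passed alongside holds the failure at position 7, `expect digit, found: Some('x')`, which points at the actual bug. On success the trace may still hold a recorded failure (the repetition always ends on a failed attempt), so it is only worth reading when the top-level parse fails.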

Do you have any tips on how to debug pom parsers efficiently? PEG.js has something like this which looks nice, but really any sort of tip or trick would be useful. My parsing is growing quite complex and even though I try to keep it organized, it can be tough to debug it.

J-F-Liu commented 3 years ago

Error reporting is an area that needs to be improved. You may try using a lexer, e.g. logos, to produce a token stream, then use a pom-based parser to create the high-level data structure.

ramonsnir commented 3 years ago

Thanks, @J-F-Liu. I already have a sophisticated hand-written lexer. My errors are at the parsing level. I will try to fork pom to add some better errors. If the outcome is useful, I'll open a PR so you can consider it.

ramonsnir commented 3 years ago

46 is what I came up with at the end

J-F-Liu / pom

Tips on debugging #45
