erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

ParseError thrown from TokenGrammar can't be printed #171

Closed: boringcactus closed this issue 2 years ago

boringcactus commented 3 years ago

I'm using Parsimonious to implement the reference compiler for a language I'm working on. In the absence of maximally Unicode-expressive regular expression support (which would be fixed by #162, or eventually by abandoning regular expressions for something better scoped), I'd like to use a custom tokenizer with the TokenGrammar class. However, the __str__ method on ParseError assumes its text attribute is a string rather than a token sequence: https://github.com/erikrose/parsimonious/blob/master/parsimonious/exceptions.py#L35

I'm not quite sure how this should be fixed, but it does make debugging and error reporting more difficult.
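One possible direction, sketched under assumptions: if ParseError.__str__ computes line and column with string-only calls such as text.count('\n', 0, pos), a list of tokens will fail there, because list.count accepts only one argument. The helper below branches on the input type and reports a token index plus a short window of token types instead of a line and column. The Token class and format_parse_error are hypothetical illustrations, not Parsimonious API.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Token:
    """Hypothetical token type; Parsimonious's own Token class may differ."""
    type: str
    text: str = ""


def format_parse_error(text: Union[str, Sequence[Token]], pos: int, rule_name: str) -> str:
    """Build a ParseError-style message for either a string or a token list."""
    if isinstance(text, str):
        # Character input: report line and column, as for an ordinary grammar.
        line = text.count("\n", 0, pos) + 1
        # rfind returns -1 when there is no newline, giving column pos + 1.
        column = pos - text.rfind("\n", 0, pos)
        snippet = text[pos:pos + 20]
        return (f"Rule '{rule_name}' didn't match at '{snippet}' "
                f"(line {line}, column {column}).")
    # Token input: line/column are meaningless, so report the token index
    # and the types of up to five upcoming tokens instead.
    window = " ".join(tok.type for tok in text[pos:pos + 5])
    return f"Rule '{rule_name}' didn't match at token {pos} ({window})."
```

For example, format_parse_error([Token("IDENT"), Token("LPAREN"), Token("RPAREN")], 1, "call") yields a message naming token 1 and the upcoming types LPAREN RPAREN, rather than raising while trying to count newlines.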

habnabit commented 3 years ago

I see that, at least currently, you're using a custom tokenizer with Parsimonious as a sequence matcher on top of it. How is that approach working out? https://git.sr.ht/~boringcactus/crowbar-reference-compiler/tree/main/item/crowbar_reference_compiler/parser.py

I too want better tokenization than Parsimonious seems capable of. What you have there seems reasonable for your needs, but I was hoping for a smaller solution for a smaller project.

boringcactus commented 3 years ago

@habnabit That approach worked fine for that project, but I had to be extra sure my regexes were getting tested in the right order.
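To illustrate why ordering matters in a regex-based tokenizer, here is a minimal sketch (not the crowbar compiler's actual code) using a single alternation of named groups, the pattern from the Python re documentation's tokenizer example. Python's re tries alternatives left to right and takes the first that matches, not the longest, so keywords must come before identifiers and "==" before "=". The token names and the Tok type are hypothetical; the resulting .type values are the kind of thing one might then feed to a token-level matcher such as TokenGrammar, which is assumed here rather than shown.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Tok(NamedTuple):
    """Hypothetical token: a type name plus the matched source text."""
    type: str
    text: str


# Order matters: the first alternative that matches wins.
# KEYWORD precedes IDENT (else "if" lexes as an identifier), and
# EQEQ precedes EQ (else "==" lexes as two EQ tokens).
TOKEN_SPEC: List[Tuple[str, str]] = [
    ("WS", r"\s+"),
    ("KEYWORD", r"(?:if|else|return)\b"),
    ("IDENT", r"[^\W\d]\w*"),  # Unicode letter or underscore, then word chars
    ("NUMBER", r"\d+"),
    ("EQEQ", r"=="),
    ("EQ", r"="),
    ("PUNCT", r"[(){};]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Tok]:
    """Yield tokens left to right, skipping whitespace."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            yield Tok(m.lastgroup, m.group())
        pos = m.end()
```

The trailing \b on KEYWORD keeps "iffy" from lexing as the keyword "if" followed by "fy". Swapping the KEYWORD and IDENT entries would silently turn every keyword into an IDENT, which is exactly the kind of ordering bug the comment above describes having to guard against.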