Closed notramo closed 2 years ago
Hmmm. I've never really written an indent sensitive parser, since it sounds like a pain, so I'm not sure what techniques are used.
Had a look around, and other similar parser libraries (PEG and parser combinators are closely related) have had similar discussions on this. Have a look: https://github.com/pegjs/pegjs/issues/217
It doesn't seem like mecha
would be the best fit for this
What about using a converter function to push/pop from a state and emit the respective tokens? Is it reliable, or the order of function execution could mess the state machine up?
Another idea is to parse all space at the start of the line to an PrefixSpace token, which includes ^\s*
, and do post processing on the tokens to swap the PrefixSpace tokens out to Indent or Dedent.
:shrug:
One thing to keep in mind, is that mecha basically does backtracking. If one branch fails to parse, the next one is tried. You would need functions that make sure the state machine is restored to some sensible state before mecha tries to parse a different "branch" of syntax.
Edit: To be clear, it is not really mecha per say, that "backtracks", but that is how the mecha.oneOf
function does what it does. It tries all parses, one after another, until one succeeds.
I think parsing it to an Indent token, then post processing it is sufficient.
Thank you!
I want to parse a format where indents are used instead of brackets, like in Nim or Python. What's the best approach to this task using this library?
Considering later stages of the parsing (AST generation), probably it would be the easiest approach to have an Indent and a Dedent token type. If one line has 3 levels of indentation, then the next only has 2 levels, then it would generate a Dedent token before the first token on the next line. This will probably need some state machine which tracks the level of the last indentation. Is this library capable of doing this?