I'm not sure that this would be the right approach to this issue. Just to abort lexing, you can probably build a lexer like this:
import { createToken, Lexer } from "chevrotain"

const AnyOtherToken = createToken({
  name: "AnyOtherToken",
  // Custom pattern that aborts lexing on any character the other tokens missed
  pattern: () => { throw new Error("unexpected character") },
  // Custom patterns cannot be inspected, so line_breaks must be set explicitly
  line_breaks: false
})
const Digit = createToken({ name: "Digit", pattern: /[0-9]/ })
const Whitespace = createToken({
  name: "Whitespace",
  pattern: /\s+/,
  group: Lexer.SKIPPED
})
const customPatternLexer = new Lexer([Whitespace, Digit, AnyOtherToken])
This will throw an error during lexing if it encounters any character which doesn't match either the Whitespace or Digit token.
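For example, something like this (an untested sketch with a made-up input) should abort on the stray "x", since the error thrown inside the custom pattern propagates out of tokenize:

// Any character other than digits/whitespace triggers AnyOtherToken and aborts
try {
  const result = customPatternLexer.tokenize("12 34 x 56")
  console.log(result.tokens.map((t) => t.image))
} catch (e) {
  console.error("lexing aborted:", e)
}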
Hey, thanks for the reply! My thought is that the solution proposed by the PR attached to this issue would be preferable.
It's not clear to me why having a throwing token would be the better approach, but I'd love to be enlightened!
Alright, sounds reasonable. I'll be looking into it.
Hello @jonestristand
This feature request sounds logical and possible. Can you help me understand your use case? Are you dealing with many (large?) inputs, most of which are invalid, where the current behavior causes time to be wasted on inputs that have already been identified as irrelevant?
@bd82 I've included an example use case in the attached PR (#1839) - but yes, I have very large files (accounting ledgers of several thousand lines) and would prefer not to recover if the input isn't strictly valid.
Hi @jonestristand
@msujew approved and merged your PR.
I will release a new version later this week or over the weekend...
Cheers. Shahar.
re-opening this until a new version is released
Thanks guys, appreciate your considering this change!
released in 10.2.0
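For anyone finding this later, here is a minimal sketch of what disabling recovery looks like. I'm going from memory of the merged PR, so treat the exact recoveryEnabled option name and the stopping behavior as assumptions and double-check the 10.2.0 release notes:

import { createToken, Lexer } from "chevrotain"

const Digit = createToken({ name: "Digit", pattern: /[0-9]/ })
const Whitespace = createToken({
  name: "Whitespace",
  pattern: /\s+/,
  group: Lexer.SKIPPED
})

// Assumed config flag from the merged PR; recovery remains enabled by default
const strictLexer = new Lexer([Whitespace, Digit], { recoveryEnabled: false })

// With recovery disabled, lexing should stop at the first character that
// matches no token instead of skipping ahead and resyncing.
const { tokens, errors } = strictLexer.tokenize("12 x 34")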
It would be useful to be able to disable recovery for the lexer. Currently, when no token matches, it skips input characters until it finds an offset that matches a token again, reports an error, and continues tokenizing. In some applications it would be desirable to stop lexing entirely if no suitable token is found at the current offset.
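To illustrate the default (recovering) behavior described above, here is a quick sketch reusing the Whitespace and Digit tokens from earlier:

// Default behavior: the stray "x" is skipped, a lexing error is recorded,
// and tokenizing continues with the remainder of the input.
const lenientLexer = new Lexer([Whitespace, Digit])
const { tokens, errors } = lenientLexer.tokenize("12 x 34")
// tokens -> Digit "1", Digit "2", Digit "3", Digit "4"
// errors -> one ILexingError describing the unexpected character at offset 3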