Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0

Unexpected character error does not show correct column #1969

Closed: baodart closed this issue 10 months ago

baodart commented 11 months ago

Here's a reproduction example:

import { createToken, Lexer } from 'chevrotain'

const WHITESPACE = createToken({ name: 'WHITESPACE', pattern: /\s+/, group: Lexer.SKIPPED })
const NUMBER = createToken({ name: 'NUMBER', pattern: /(0|[1-9]\d*)(\.\d+)?/ })

const lexer = new Lexer([WHITESPACE, NUMBER], { ensureOptimizations: true })

const { tokens, errors } = lexer.tokenize('-"5"')

console.log('tokens', tokens)
console.log('errors', errors)

which logs: [screenshot of the console output]

As the lexer only understands whitespace and number literals, only 5 is captured as a token here. There are 2 'unexpected character' errors:

bd82 commented 10 months ago

Thanks for reporting this @baodart and providing a minimal reproduction 👍

There is another issue here: the column of the valid token "5" is reported as 1 instead of 3. I suspect the problem is in the logic that keeps track of column numbers during lexer error recovery.
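
To make the expected behaviour concrete, here is a minimal, self-contained sketch of column tracking during error recovery. It is not Chevrotain's implementation: `tokenizeWithRecovery`, `matchAt`, and the result field names are hypothetical, only loosely modelled on the reproduction above, and the sketch assumes single-line input with 1-based columns. The point it illustrates is that the column counter has to advance over skipped characters as well as over matched ones.

```javascript
// Hypothetical mini-lexer for illustration only (not Chevrotain code).
// Assumes single-line input and 1-based columns.
const NUMBER = /(0|[1-9]\d*)(\.\d+)?/y // sticky: only matches at lastIndex
const WHITESPACE = /\s+/y

function tokenizeWithRecovery(text) {
  const tokens = []
  const errors = []
  let offset = 0
  let column = 1

  // Try a sticky pattern at the current offset; return the matched text or null.
  const matchAt = (pattern) => {
    pattern.lastIndex = offset
    const match = pattern.exec(text)
    return match === null ? null : match[0]
  }

  while (offset < text.length) {
    const whitespace = matchAt(WHITESPACE)
    if (whitespace !== null) {
      // Skipped input still moves the column forward.
      offset += whitespace.length
      column += whitespace.length
      continue
    }

    const number = matchAt(NUMBER)
    if (number !== null) {
      tokens.push({ image: number, startOffset: offset, startColumn: column })
      offset += number.length
      column += number.length
      continue
    }

    // Recovery: skip characters until a pattern matches again or the input
    // ends, and report the whole skipped run as one error. The column is
    // advanced together with the offset for every skipped character.
    const errorOffset = offset
    const errorColumn = column
    do {
      offset += 1
      column += 1
    } while (
      offset < text.length &&
      matchAt(WHITESPACE) === null &&
      matchAt(NUMBER) === null
    )
    errors.push({ offset: errorOffset, column: errorColumn, length: offset - errorOffset })
  }

  return { tokens, errors }
}

const { tokens, errors } = tokenizeWithRecovery('-"5"')
console.log('tokens', tokens)
console.log('errors', errors)
```

In this sketch the token 5 gets start column 3, and the two errors are reported at columns 1 and 4. If the column counter were only advanced on successful matches, the token would instead be reported at column 1, which is the symptom described above.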

bd82 commented 10 months ago

The fix was released in version 11.0.3, available on npm.