Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0

Lexer throws uncaught error when longer_alt token is not in mode #1825

Closed TheFireBlast closed 1 year ago

TheFireBlast commented 1 year ago
import { Lexer, createToken } from "chevrotain";

const Whitespace = createToken({ name: "Whitespace", pattern: /\s+/, group: Lexer.SKIPPED });
const Identifier = createToken({ name: "Identifier", pattern: /[a-zA-Z]\w+/ });
const Item = createToken({ name: "Item", pattern: /\w+/ });
const Push = createToken({ name: "Push", pattern: /push/, longer_alt: Identifier, push_mode: "body" });
const Pop = createToken({ name: "Pop", pattern: /pop/, longer_alt: Identifier, pop_mode: true });

const lexer = new Lexer({
    modes: {
        default: [Whitespace, Push, Identifier],
        body: [Whitespace, Pop, Item],
    },
    defaultMode: "default",
});

console.log(lexer.tokenize(`push pushfoo bar pop`));

The example above throws the following error:

                            var longerAltPattern = longerAltConfig.pattern;
                                                                   ^

TypeError: Cannot read properties of undefined (reading 'pattern')
    at Lexer.tokenizeInternal (...\node_modules\chevrotain\src\scan\lexer_public.ts:576:56)
    at Lexer.tokenize (...\node_modules\chevrotain\src\scan\lexer_public.ts:392:17)
    ...

https://github.com/Chevrotain/chevrotain/blob/376d9fe6065fb1e0cf78d02a25f625bf03f3d7cb/packages/chevrotain/src/scan/lexer_public.ts#L576

This happens because Pop's longer_alt is Identifier (instead of Item), and Identifier isn't part of the body mode. Instead of failing internally, the lexer should report a more descriptive error, e.g.:

Lexer Mode: ->body<- is missing the Token Type ->Identifier<- required by Token Type ->Pop<- as a longer alternative
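
To make the suggested check concrete, here is a rough sketch of what such a validation could look like. It is not Chevrotain's actual implementation: `TokenLike`, `ModesLike` and `findMissingLongerAlts` are hypothetical names, and the `LONGER_ALT` field is only assumed to mirror the property that `createToken` stores on a token type. The sketch uses plain objects so it runs without the library.

```typescript
// Hypothetical structural stand-in for a token type; only the fields this
// check needs. LONGER_ALT is assumed to mirror what createToken stores.
interface TokenLike {
  name: string;
  LONGER_ALT?: TokenLike | TokenLike[];
}

// Hypothetical shape of a multi-mode definition: mode name -> token types.
type ModesLike = Record<string, TokenLike[]>;

// Returns one message for every longer_alt that is not part of the mode
// in which the referencing token type is used.
function findMissingLongerAlts(modes: ModesLike): string[] {
  const errors: string[] = [];
  for (const [modeName, tokens] of Object.entries(modes)) {
    const inMode = new Set<TokenLike>(tokens);
    for (const token of tokens) {
      const alt = token.LONGER_ALT;
      // Normalize "none", "one" and "many" longer alternatives to an array.
      const alts = alt === undefined ? [] : Array.isArray(alt) ? alt : [alt];
      for (const candidate of alts) {
        if (!inMode.has(candidate)) {
          errors.push(
            `Lexer Mode: ->${modeName}<- is missing the Token Type ` +
              `->${candidate.name}<- required by Token Type ` +
              `->${token.name}<- as a longer alternative`
          );
        }
      }
    }
  }
  return errors;
}

// Plain-object equivalents of the token types from the example above.
const Identifier: TokenLike = { name: "Identifier" };
const Item: TokenLike = { name: "Item" };
const Push: TokenLike = { name: "Push", LONGER_ALT: Identifier };
const Pop: TokenLike = { name: "Pop", LONGER_ALT: Identifier };

const problems = findMissingLongerAlts({
  default: [Push, Identifier],
  body: [Pop, Item],
});
console.log(problems);
```

With the modes from the example, only Pop in the body mode is reported; Push is fine because Identifier is part of the default mode. Running a check like this at lexer construction time, rather than during tokenize, would surface the mistake before any input is processed.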



Also (not really important or related to this issue), it looks like many error messages contain a <- that isn't followed by a space:

https://github.com/Chevrotain/chevrotain/blob/376d9fe6065fb1e0cf78d02a25f625bf03f3d7cb/packages/chevrotain/src/scan/lexer.ts#L722-L723

And sometimes a Token Type is referred to as just Token:

https://github.com/Chevrotain/chevrotain/blob/376d9fe6065fb1e0cf78d02a25f625bf03f3d7cb/packages/chevrotain/src/scan/lexer.ts#L764

msujew commented 1 year ago

Funnily enough, I just noticed this issue as well last week, although the original contribution is already relatively old. I'm working on a fix for this right now.