Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0

EOF token has no location information #2004

Closed: hackwaly closed this issue 7 months ago

hackwaly commented 7 months ago
[image]
msujew commented 7 months ago

This is by design. EOF doesn't match anything in the input string, so it has no location.

hackwaly commented 7 months ago

> This is by design. EOF doesn't match anything in the input string, so it has no location.

But how do I report an error to the IDE? I need a location to report a diagnostic at EOF.

"EOF doesn't match anything" is wrong. It only matches at the end-of-file position. It does have a location.

msujew commented 7 months ago

@hackwaly By "EOF doesn't match anything" I meant that it doesn't match any particular string pattern. See also the token's image being defined as "" (an empty string). If it doesn't match any string, there is no location for it.

Aside from that, EOF is just the last position in the input string. Isn't that enough for error reporting?
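As a rough illustration of "EOF is just the last position in the input string", the sketch below derives an offset, line, and column for the end of input from the text alone, without touching any token. The names `EofLocation` and `eofLocation` are hypothetical, and the 1-based line and column numbering is an assumption; adjust it to whatever the IDE protocol in use expects.

```typescript
// Hypothetical helper: location of the end of input, computed from the text only.
// Lines and columns are 1-based here (an assumption, not a library guarantee).
interface EofLocation {
  offset: number; // index just past the last character of the input
  line: number;
  column: number;
}

function eofLocation(text: string): EofLocation {
  let line = 1;
  let lineStart = 0; // offset at which the current line begins

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "\n") {
      line++;
      lineStart = i + 1;
    } else if (ch === "\r") {
      // Treat "\r\n" as a single line break.
      if (text[i + 1] === "\n") {
        i++;
      }
      line++;
      lineStart = i + 1;
    }
  }

  return {
    offset: text.length,
    line,
    column: text.length - lineStart + 1,
  };
}
```

Because this walks the input itself, trailing whitespace or comments that the lexer skipped are still counted.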

hackwaly commented 7 months ago

> Isn't that enough for error reporting?

Chevrotain tracks line information for me. I have no way to get the line number of EOF other than from the IToken.

msujew commented 7 months ago

The end of the input can always be found at the end of the token before EOF, i.e. the last token produced by the lexer. Note that the lexer doesn't actually "produce" EOF, so there's also no technical way of adding the location information there, since the token is unaware of the input.

hackwaly commented 7 months ago

The end position of the last token produced by the lexer isn't always equal to the position of EOF, because there may be skipped whitespace tokens after it.

Is there a way to let the lexer emit a zero-width user-defined MyEOF token?

msujew commented 7 months ago

> Is there a way to let the lexer emit a zero-width user-defined MyEOF token?

No, because if the lexer doesn't advance, it will produce tokens in an endless loop. A token needs to have content.

> Because there may be skipped whitespace tokens.

You can assign a group to the whitespace token other than Lexer.SKIPPED and look for the group in the lexer result. It won't influence the parsing process, but you gain access to the lexed tokens that way.
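The group suggestion could be used roughly as follows. This is only a sketch: it assumes the whitespace token was created with a custom `group` name (here the hypothetical "whitespace") so that those tokens show up under `groups` in the lexer result, and it uses local stand-in types (`TokenLike`, `LexResultLike`) rather than importing Chevrotain's own. It also assumes `endOffset` is inclusive, i.e. the offset of the token's last character.

```typescript
// Local stand-ins for the parts of a token and lexer result used below.
interface TokenLike {
  image: string;
  startOffset: number;
  endOffset?: number; // assumed inclusive: offset of the last character
}

interface LexResultLike {
  tokens: TokenLike[];
  groups: { [groupName: string]: TokenLike[] };
}

// Returns whichever token appears last in the input: the last regular token
// or the last token collected in the given group (e.g. trailing whitespace).
function lastLexedToken(
  result: LexResultLike,
  groupName: string
): TokenLike | undefined {
  const lastMain = result.tokens[result.tokens.length - 1];
  const grouped = result.groups[groupName] ?? [];
  const lastGrouped = grouped[grouped.length - 1];

  if (lastMain === undefined) return lastGrouped;
  if (lastGrouped === undefined) return lastMain;
  return lastGrouped.startOffset > lastMain.startOffset ? lastGrouped : lastMain;
}

// Offset just past the last lexed character, usable as the EOF offset.
function endOfInputOffset(result: LexResultLike, groupName: string): number {
  const last = lastLexedToken(result, groupName);
  if (last === undefined) return 0; // nothing was lexed
  const end = last.endOffset ?? last.startOffset + last.image.length - 1;
  return end + 1;
}

// Hand-written sample data for the input "a  " (one identifier, two trailing spaces).
const sample: LexResultLike = {
  tokens: [{ image: "a", startOffset: 0, endOffset: 0 }],
  groups: { whitespace: [{ image: "  ", startOffset: 1, endOffset: 2 }] },
};
const sampleEofOffset = endOfInputOffset(sample, "whitespace");
```

Note that this only covers characters that ended up in some token. Input skipped because of lexer errors, or tokens still assigned to the skipped group, would not be seen, so comparing the result against the input length is a reasonable sanity check.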