ccrma / chuck

ChucK Music Programming Language
http://chuck.stanford.edu/
GNU General Public License v2.0

Syntax error location misidentified #128

Open forrcaho opened 5 years ago

forrcaho commented 5 years ago

Here is a simple, syntactically incorrect chuck program:

1 => int a;x
int b;

Attempting to run this produces the output: [syntax_error2.ck]:line(2).char(5): syntax error

This is on the character "b" in the code.

The ideal output would be line(1).char(12), which is on the x, the spurious character causing the error.

It makes sense that the parser hasn't yet identified the error when it gets to the x, and I suspect that if the error were reported on the i of int, it would be clear enough.

However, the parser proceeds past the keyword 'int' and identifies the error at the start of the following variable name.

This would be really confusing to a user in a visual editor, looking at the red squiggle under the 'b' and trying to figure out what the error is.

lathertonj commented 5 years ago

It looks like the lexer is responsible for advancing EM_error's notion of where we are in the file (EM_tokPos), via the adjust() function, which is called before each token found by the lexer is returned. And the way the parser is written, it probably needs to consume all three of those tokens (x, int, b) before it can tell that no valid AST can be built from them. I personally don't know / remember enough about rewriting parsers to make it detect this on just the first or second token.
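To make that mechanism concrete, here is a toy sketch in Python. It is not ChucK's lexer or grammar: `ToyLexer`, `tok_pos`, and `parse` are hypothetical names, and the grammar (`[NUMBER =>] TYPE NAME ;`, with type names treated as plain identifiers) is an assumption chosen only to mimic the behaviour described above. The lexer records the position of each token as it hands it out, and the parser reports whatever position was recorded last when it gives up.

```python
import re

TOKEN_RE = re.compile(r"=>|[A-Za-z_]\w*|\d+|;")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


class ToyLexer:
    """Hands out tokens and remembers where the most recent one started."""

    def __init__(self, source):
        self.tokens = []
        for line_no, line in enumerate(source.splitlines(), start=1):
            for m in TOKEN_RE.finditer(line):
                # 1-based line and character numbers, as in the error message
                self.tokens.append((m.group(), line_no, m.start() + 1))
        self.index = 0
        self.tok_pos = None  # (line, char) of the most recent token

    def next_token(self):
        if self.index >= len(self.tokens):
            return None
        text, line_no, col = self.tokens[self.index]
        self.index += 1
        self.tok_pos = (line_no, col)  # plays the role described for adjust()
        return text


def is_ident(tok):
    return tok is not None and IDENT_RE.fullmatch(tok) is not None


def parse(source):
    """Return None on success, else the (line, char) where parsing gave up."""
    lexer = ToyLexer(source)
    tok = lexer.next_token()
    while tok is not None:
        if tok.isdigit():  # optional "NUMBER =>" prefix
            if lexer.next_token() != "=>":
                return lexer.tok_pos
            tok = lexer.next_token()
        if not is_ident(tok):  # type name (assumed to be a plain identifier)
            return lexer.tok_pos
        if not is_ident(lexer.next_token()):  # variable name
            return lexer.tok_pos
        if lexer.next_token() != ";":
            return lexer.tok_pos
        tok = lexer.next_token()
    return None


print(parse("1 => int a;x\nint b;\n"))
```

In this toy grammar, `x int` is still a plausible start of a declaration, so the parser only fails once it sees `b` where it expected `;`, and the last recorded position is that of `b`.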

One possible solution: in addition to keeping track of the most recent token lexed, we could also keep track of the most recent semicolon token lexed, and surface that information in the syntax error. Then your VS Code plugin could red-underline the entire region from the character after the previous semicolon to the end of the token that completes the phrase that can't be parsed.
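A rough sketch of that idea in Python, under stated assumptions: `error_region` and `after_semi` are hypothetical names, and the toy grammar (`[NUMBER =>] TYPE NAME ;`, with type names treated as plain identifiers) stands in for the real parser. The only point is to show the extra bookkeeping: remember where the last accepted semicolon ended, and report a span rather than a single position.

```python
import re

TOKEN_RE = re.compile(r"=>|[A-Za-z_]\w*|\d+|;")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def error_region(source):
    """Return None if the toy grammar accepts `source`, else (start, end)
    offsets into `source`: `start` is just past the most recent accepted ';'
    and `end` is one past the token where parsing gave up."""
    after_semi = 0   # offset just past the most recent ';' consumed
    state = "start"  # what the toy parser expects next
    for m in TOKEN_RE.finditer(source):
        text, end = m.group(), m.end()
        if state == "start" and text.isdigit():
            state = "arrow"
        elif state == "arrow" and text == "=>":
            state = "type"
        elif state in ("start", "type") and IDENT_RE.fullmatch(text):
            state = "name"
        elif state == "name" and IDENT_RE.fullmatch(text):
            state = "semi"
        elif state == "semi" and text == ";":
            state = "start"
            after_semi = end  # remember where this statement ended
        else:
            return (after_semi, end)
    if state != "start":
        # input ended in the middle of a statement
        return (after_semi, len(source))
    return None


src = "1 => int a;x\nint b;\n"
region = error_region(src)
print(region, repr(src[region[0]:region[1]]))
```

For the program from the report, the span covers `x`, the newline, and `int b`, so an editor underlining it would include the stray `x` even though the parser only gave up at `b`.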

But I also feel like I have seen other programming languages not quite know where the real error is when I have written very malformed code, so maybe this is OK as-is, or maybe leaving it as-is would be better than the hack I suggested above.

For example, the C# editor I have in VS Code highlights the second int below, not y, so it is one token better than ChucK, but still not what you're asking for.

int x;y
int z;

Thoughts?