An error deep in the code of an input class shows an error message starting at the class

parrt commented 10 years ago

Using the grammar at tag https://github.com/antlr/grammars-v4/blob/distant_error_msg/swift/Swift.g4 we get a horrible error message that basically says "I don't recognize the class" just because the parser doesn't know what to do with a floating-point number, 2.0, deep in a method. Input exhibiting the problem is: https://github.com/antlr/grammars-v4/blob/master/swift/examples/Snippet.swift

I get the same error message in the interpreter and compiled code via grun, which is encouraging. ;) Somehow we need to enhance our check that looks for successfully predicted rules, rather than rolling all the way back to the class token.

Even:

var b = i < 2.0

highlights the problem in that it doesn't recognize anything but clearly parses lots of rules before it gets to the floating-point number. Here's a sample run that shows that the tokens look okay:

~/antlr/code/grammars-v4/swift $ grun Swift top_level -tokens
var b = i < 2.0
[@0,0:2='var',<46>,1:0]
[@1,3:3=' ',<96>,channel=1,1:3]
[@2,4:4='b',<87>,1:4]
[@3,5:5=' ',<96>,channel=1,1:5]
[@4,6:6='=',<29>,1:6]
[@5,7:7=' ',<96>,channel=1,1:7]
[@6,8:8='i',<87>,1:8]
[@7,9:9=' ',<96>,channel=1,1:9]
[@8,10:10='<',<7>,1:10]
[@9,11:11=' ',<96>,channel=1,1:11]
[@10,12:14='2.0',<93>,1:12]
[@11,15:15='\n',<96>,channel=1,1:15]
[@12,16:15='<EOF>',<-1>,2:16]
line 1:12 no viable alternative at input 'var b = i < 2.0'
~/antlr/code/grammars-v4/swift $ grep "93$" *.tokens
Swift.tokens:Floating_point_literal=93

sharwell commented 10 years ago

I don't understand what the issue here is.

parrt commented 10 years ago

Sorry. The error message is unhelpful. It points at line 7:29 but quotes everything from the class token onward:

line 7:29 no viable alternative at input 'class T {\n\tfunc foo() {\n\t\tfor var i:CGFloat = 0; i < 2.0'

sharwell commented 10 years ago

What do you propose to do about this?

One possible strategy is to update adaptivePredict to return the alternative that let it parse the most input symbols before reaching the syntax error. This case tends to result in several additional "frames" being added to the parser rule stack before finally throwing an exception, typically within a call to match. I actually tried this out once in the past, and my observation is that it resulted in exceptional error localization but terrible error recovery. The parse tree from the point of the first error to the end of the file was frequently unusable.

The current strategy results in poor error localization for long lookahead sequences, but generally offers improved error recovery.

Fixing this issue would likely involve combining the results of both of the above, using the first strategy up until the syntax error was reached, followed by using the results of the second strategy to recover.
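
To make the first strategy concrete, here is a minimal sketch in Python. It is a toy model, not ANTLR's adaptivePredict: each alternative is reduced to a flat list of expected token types, and the token-type names (ID, INT, FLOAT) and both helper functions are hypothetical. It only shows how picking the alternative that got furthest moves the reported error to the offending token.

```python
from typing import Dict, List, Tuple


def matched_prefix(expected: List[str], tokens: List[str]) -> int:
    """Count how many leading tokens agree with the expected sequence."""
    count = 0
    for want, got in zip(expected, tokens):
        if want != got:
            break
        count += 1
    return count


def deepest_alternative(alts: Dict[int, List[str]],
                        tokens: List[str]) -> Tuple[int, int]:
    """Return (alternative, index of the first token it could not match).

    Ties go to the lowest-numbered alternative. This is a toy stand-in
    for "the alternative that parsed the most input symbols".
    """
    best_alt, best_depth = 0, -1
    for alt in sorted(alts):
        depth = matched_prefix(alts[alt], tokens)
        if depth > best_depth:
            best_alt, best_depth = alt, depth
    return best_alt, best_depth


# Hypothetical token types for the input: var b = i < 2.0
tokens = ["var", "ID", "=", "ID", "<", "FLOAT"]
alts = {
    1: ["var", "ID", "=", "ID", "<", "INT"],  # declaration-like alternative
    2: ["ID", "=", "ID"],                     # assignment-like alternative
}
print(deepest_alternative(alts, tokens))  # (1, 5)
```

In this model the error lands on token index 5 (the floating-point literal) instead of index 0. The sketch says nothing about recovery, which is the part that behaved badly in the experiment described above.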

parrt commented 10 years ago

What I noticed was that the error recovery consumed literally the entire input as it did not know what to do with the sequence starting with class. At least in this case, recovery was terrible and reporting was terrible. ;)

I like the idea of returning the alternative that allowed it to parse the deepest. Currently we give up and try to recover. Perhaps we do something that tracks how many complete rules it matches before it gets the error. Surely it's the case that simple k-token-like lookahead should try to recover, but if it matches a complete declaration or method within a class definition, it should not try to recover. Perhaps it's a simple depth k threshold that we look for to determine whether we should recover or keep going. We wouldn't want to keep going just because a simple rule like typeName matched a few times.

This problem comes up when we try to distinguish alternatives with very deep lookahead requirements, so maybe the solution involves depth or how much was correctly recognized.
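
One way to picture the threshold idea is the sketch below. Everything in it is an assumption made for illustration: the function name, the default k, and the set of "trivial" rules are invented, and nothing like this exists in the ANTLR runtime as far as this thread shows.

```python
from typing import List

# Hypothetical: rules considered too small to count as real progress.
TRIVIAL_RULES = {"typeName"}


def should_keep_going(tokens_matched: int,
                      completed_rules: List[str],
                      k: int = 3) -> bool:
    """Decide whether to commit to the alternative instead of recovering.

    Commit only when prediction consumed more than k tokens AND completed
    at least one non-trivial rule, such as a declaration or a method.
    """
    significant = [r for r in completed_rules if r not in TRIVIAL_RULES]
    return tokens_matched > k and len(significant) > 0
```

With these assumptions, should_keep_going(10, ["typeName", "typeName"]) is False while should_keep_going(10, ["declaration"]) is True, which matches the intent that repeated typeName matches alone should not suppress recovery.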

parrt commented 10 years ago

I have another thought here. I think the problem is when you have a closure loop, like in the Swift grammar:

top_level : (statement | expression)* EOF ;

It cannot match either a statement or expression because the class declaration statement has an error somewhere deep in the class. However, it also cannot match what follows, the end of file. Perhaps the simple rule is:

If the closure loop cannot match the body of the loop or what follows, but it matches at least the first token somehow in the body, enter the body instead of immediately reporting an error and recovering.

When we enter the loop and must decide between alternatives, it will most likely do something natural because, for example, 'class' does not start an expression but it does start a statement.
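
The proposed rule can be sketched as a small decision function. This is a simplified model, not parser code: the FIRST and FOLLOW sets are written by hand rather than computed from Swift.g4, the token names are hypothetical, and body_predicts stands in for "full prediction of the loop body succeeded".

```python
from typing import Dict, List, Set


def loop_decision(next_token: str,
                  first_of_body: Set[str],
                  follow_of_loop: Set[str],
                  body_predicts: bool) -> str:
    """Decide what a (body)* loop does at its entry/exit decision."""
    if body_predicts:
        return "enter"   # normal case: the body matches
    if next_token in follow_of_loop:
        return "exit"    # normal case: what follows the loop matches
    if next_token in first_of_body:
        return "enter"   # proposed rule: let the body report the error
    return "error"       # nothing fits: report and recover here


def viable_alternatives(next_token: str,
                        first_sets: Dict[str, Set[str]]) -> List[str]:
    """List the body alternatives whose FIRST set contains the token."""
    return [name for name, first in first_sets.items() if next_token in first]


# Hand-written, hypothetical FIRST sets for (statement | expression)* EOF
first_sets = {
    "statement": {"class", "var", "func"},
    "expression": {"ID", "INT", "FLOAT", "("},
}
body_first = set().union(*first_sets.values())

print(loop_decision("class", body_first, {"EOF"}, body_predicts=False))  # enter
print(viable_alternatives("class", first_sets))  # ['statement']
```

Under these assumptions the loop enters the body on 'class' even though the class cannot be matched completely, and only the statement alternative is viable. The deep-lookahead wrinkle discussed next is not handled by this sketch.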

Wrinkle. If prediction must look deeply to distinguish between alternatives such as (decl ';' | def body)*, then the current mechanism would throw an exception if there was an error that prevented it from matching decl or def completely. I'll have to think more about this variant of the problem.

sharwell commented 10 years ago

An interesting heuristic is, of the alternatives that matched the most before failing, choose the one with the shortest context stack.

parrt commented 10 years ago

We would maybe want to count how many rules they successfully parsed instead of the depth. Not sure what the depth would convey. One might want the longest stack, as it would indicate that it was in the process of matching a more specific subphrase.
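
The competing tie-breakers from the last two comments can be compared side by side. The sketch below is illustrative only: the Candidate record and its fields are hypothetical bookkeeping, not something ANTLR's prediction machinery exposes in this form.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    alt: int              # alternative number
    tokens_matched: int   # input symbols consumed before failing
    stack_depth: int      # size of the context stack at the failure point
    rules_completed: int  # rules fully parsed before failing


def pick(cands: List[Candidate],
         tie_break: Callable[[Candidate], int]) -> Candidate:
    """Keep the candidates that matched the most tokens, then pick the one
    with the smallest tie_break key (first wins on equal keys)."""
    furthest = max(c.tokens_matched for c in cands)
    finalists = [c for c in cands if c.tokens_matched == furthest]
    return min(finalists, key=tie_break)


def shortest_stack(c: Candidate) -> int:
    return c.stack_depth


def longest_stack(c: Candidate) -> int:
    return -c.stack_depth


def most_rules(c: Candidate) -> int:
    return -c.rules_completed


cands = [
    Candidate(alt=1, tokens_matched=12, stack_depth=3, rules_completed=4),
    Candidate(alt=2, tokens_matched=12, stack_depth=6, rules_completed=7),
    Candidate(alt=3, tokens_matched=5, stack_depth=1, rules_completed=9),
]
print(pick(cands, shortest_stack).alt)  # 1
print(pick(cands, longest_stack).alt)   # 2
print(pick(cands, most_rules).alt)      # 2
```

Alternative 3 never wins here because it matched fewer tokens, whichever tie-breaker is used. Which tie-breaker gives better messages on real grammars is exactly the open question in this thread.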

antlr / antlr4

An error deep in the code of an input class shows an error message starting at the class #608