antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

Kotlin grammars #3965

Open kaby76 opened 5 months ago

kaby76 commented 5 months ago

I am trying to investigate https://github.com/antlr/antlr4-lab/issues/83.

kaby76 commented 4 months ago

In addition, NL and semi are sprinkled cavalierly throughout the grammar, which causes ambiguity and quite poor performance. There is no consistent, thought-out rule for how and where they should be used. For example, consider how propertyDeclaration is parsed.

Input:

var a = 1
var a = 2
var a = 3
var a = 4
var a = 5
var a = 6
var a = 7

This input forces a large lookahead k because the parser needs full context to decide where each NL belongs: in propertyDeclaration or in topLevelObject. The "spec" grammar implementation gets this wrong as well.

There is a further faux pas right after the misplaced "NL" in that production. ["(getter? (NL semi? setter)? | setter? (NL* semi? getter)?)"](https://github.com/Kotlin/kotlin-spec/blob/4b29a8b42e08237f45c0c3c185eaae4bba3751f6/grammar/src/main/antlr/KotlinParser.g4#L186C28-L186C87) is an alternation in which both alternatives can derive empty. A grammar should never offer the choice of empty vs. empty!
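The empty-vs-empty problem can be checked mechanically. The following is a minimal Python sketch, not part of the grammar or of any ANTLR tooling. It assumes a toy encoding of the quoted subrule, and the rule names `accessors`, `setterTail` and `getterTail` are hypothetical, introduced only to flatten the nested groups. It computes which rules can derive empty and reports any rule with more than one such alternative.

```python
# Toy model of the quoted subrule, NOT the real Kotlin grammar.
# An alternative is a list of (symbol, suffix) pairs; suffix is "", "?", "*" or "+".
# Symbols without an entry in the grammar dict are treated as tokens (never empty).

def alt_is_nullable(alt, nullable):
    """An alternative derives empty if every element is optional or a nullable rule."""
    return all(suffix in ("?", "*") or symbol in nullable for symbol, suffix in alt)


def nullable_rules(grammar):
    """Fixed-point computation of the set of rules that can derive empty."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for rule, alts in grammar.items():
            if rule not in nullable and any(alt_is_nullable(a, nullable) for a in alts):
                nullable.add(rule)
                changed = True
    return nullable


def empty_vs_empty(grammar):
    """Return rules in which two or more alternatives can derive empty."""
    nullable = nullable_rules(grammar)
    report = {}
    for rule, alts in grammar.items():
        count = sum(alt_is_nullable(a, nullable) for a in alts)
        if count > 1:
            report[rule] = count
    return report


# Hypothetical rule names: "accessors", "setterTail", "getterTail".
as_quoted = {
    "accessors": [
        [("getter", "?"), ("setterTail", "?")],
        [("setter", "?"), ("getterTail", "?")],
    ],
    "setterTail": [[("NL", ""), ("semi", "?"), ("setter", "")]],
    "getterTail": [[("NL", "*"), ("semi", "?"), ("getter", "")]],
}

# One possible repair (an assumption, not the fix applied in this issue):
# make the leading accessor mandatory in each alternative and mark the
# whole subrule optional at its call site instead.
repaired = dict(as_quoted)
repaired["accessors"] = [
    [("getter", ""), ("setterTail", "?")],
    [("setter", ""), ("getterTail", "?")],
]

print(empty_vs_empty(as_quoted))  # {'accessors': 2}
print(empty_vs_empty(repaired))   # {}
```

With the leading `getter`/`setter` mandatory, the only way to match nothing is to skip the subrule entirely, so the parser no longer has to choose between two empty alternatives.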

$ (trperf y > out; cat out | head -1 > out2; cat out | tail -n +2 | sort -k6 -n -r | head > out3; cat out2 out3 | column -t)
Time to parse: 00:00:00.1550288
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
157       propertyDeclaration     14           0.331425  203      50     7         7            0       25
305       postfixUnaryExpression  7            0.064142  21       3      0         0            0       4
146       propertyDeclaration     7            0.029213  21       3      0         0            0       2
300       asExpression            7            0.056031  14       2      0         0            0       3
289       elvisExpression         7            0.055394  14       2      0         0            0       3
278       conjunction             7            0.058866  14       2      0         0            0       3
275       disjunction             7            0.057148  14       2      0         0            0       3
156       propertyDeclaration     7            0.071258  14       2      0         0            0       3
350       primaryExpression       7            0.000744  7        1      0         0            0       1
301       prefixUnaryExpression   7            0.002013  7        1      0         0            0       1
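For anyone who wants to post-process this report outside the shell, here is a rough Python equivalent of the `sort -k6 -n -r | head` step. It relies only on the whitespace-separated column layout shown above, not on any trperf API; the function name `top_decisions` and the trimmed `sample` are illustrative assumptions.

```python
def top_decisions(report, column="Max-k", limit=10):
    """Return up to `limit` rows sorted by `column`, largest first."""
    lines = [ln.split() for ln in report.splitlines() if ln.strip()]
    # Locate the header row; anything before it (e.g. the timing line) is skipped.
    start = next(i for i, fields in enumerate(lines) if fields[0] == "Decision")
    header = lines[start]
    # Keep only rows that have exactly one field per header column.
    rows = [dict(zip(header, f)) for f in lines[start + 1:] if len(f) == len(header)]
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:limit]


# Three rows copied from the report above, deliberately out of order.
sample = """\
Time to parse: 00:00:00.1550288
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
305       postfixUnaryExpression  7            0.064142  21       3      0         0            0       4
157       propertyDeclaration     14           0.331425  203      50     7         7            0       25
350       primaryExpression       7            0.000744  7        1      0         0            0       1
"""

worst = top_decisions(sample, limit=1)[0]
print(worst["Decision"], worst["Rule"], worst["Max-k"])  # 157 propertyDeclaration 50
```

Passing `column="Ambiguities"` or `column="Time"` ranks the same rows by a different metric.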

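If you correct the NLs in propertyDeclaration and getter/setter, the Max-k values are somewhat resolved: the largest Max-k drops from 50 to 3, although one propertyDeclaration decision still reports 7 fallbacks and 7 ambiguities.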

$ diff KotlinParser.g4 ..
178,179c178
<     ) (NL* typeConstraints)? (NL* ('=' NL* expression | propertyDelegate))?
< (
---
>     ) (NL* typeConstraints)? (NL* ('=' NL* expression | propertyDelegate))? (NL+ ';')? NL* (
203,204c202,203
<     : NL? modifiers? 'get'
<     | NL? modifiers? 'get' NL* '(' NL* ')' (NL* ':' NL* type_)? NL* functionBody
---
>     : modifiers? 'get'
>     | modifiers? 'get' NL* '(' NL* ')' (NL* ':' NL* type_)? NL* functionBody
208,209c207,208
<     : NL? modifiers? 'set'
<     | NL? modifiers? 'set' NL* '(' (annotation | parameterModifier)* setterParameter ')' (
---
>     : modifiers? 'set'
>     | modifiers? 'set' NL* '(' (annotation | parameterModifier)* setterParameter ')' (
02/10-07:58:19 ~/issues/g4-3959/kotlin/kotlin-formal/Generated-CSharp
$ (trperf y > out; cat out | head -1 > out2; cat out | tail -n +2 | sort -k6 -n -r | head > out3; cat out2 out3 | column -t)
Time to parse: 00:00:00.1211182
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
306       postfixUnaryExpression  7            0.051186  21       3      0         0            0       4
146       propertyDeclaration     7            0.030375  21       3      0         0            0       2
301       asExpression            7            0.044945  14       2      0         0            0       3
290       elvisExpression         7            0.041122  14       2      0         0            0       3
279       conjunction             7            0.048455  14       2      0         0            0       3
276       disjunction             7            0.038296  14       2      0         0            0       3
163       propertyDeclaration     7            0.191566  21       2      7         7            0       15
158       propertyDeclaration     7            0.056232  14       2      0         0            0       3
155       propertyDeclaration     7            0.039291  14       2      0         0            0       3
502       semis                   7            0.004448  7        1      0         0            0       2
02/10-07:59:24 ~/issues/g4-3959/kotlin/kotlin-formal/Generated-CSharp