Open HanzoDev1375 opened 4 days ago
Please post the entire input (or inputs) as text, or attach the input in a .txt file (or files). It's best to not post pictures. Thanks.
SettingAppActivity.java.txt see @kaby76
The input is only 898 lines, takes ~90s to parse, and the parse succeeds. Yes, this is terrible performance, but unfortunately it is expected.
This grammar is a direct implementation of the Java Language Spec 20 grammar in Chapter 19. It is very ambiguous.
Here is the ambiguity uncovered for the input. The tools used are part of the Trash Toolkit.
$ dotnet trperf -c afdr /c/Users/Kenne/Downloads/SettingAppActivity.java.txt | grep -v '^0' | sort -k1 -n
Time to parse: 00:01:29.1786105
1 1 10 classOrInterfaceType
2 2 20 classType
5 5 341 relationalExpression
8 8 84 unannClassOrInterfaceType
21 22 6 referenceType
97 104 82 unannReferenceType
165 199 36 packageName
174 174 272 primaryNoNewArray
223 223 315 methodInvocation
Output:
- Column 1 is the number of ambiguities counted.
- Column 2 is the number of fallbacks counted.
- Column 3 is the NFA state number for the decision.
- Column 4 is the rule that the NFA state appears in.
The good news, if you can say anything good about this, is that the ratio of ambiguities to fallbacks is more or less one-to-one. This means that most of the problem is with ambiguity and not DFA transition conflicts, which are generally harder to fix.
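To check the one-to-one claim against the numbers above, here is a minimal Python sketch that totals the two count columns. It assumes the four-column layout described in this comment; the helper names (`parse_rows`, `summarize`) are made up for illustration and are not part of the Trash Toolkit.

```python
# Rows copied from the trperf output posted in this thread.
# Columns: ambiguities, fallbacks, decision state number, rule name.
TRPERF_OUTPUT = """\
Time to parse: 00:01:29.1786105
1 1 10 classOrInterfaceType
2 2 20 classType
5 5 341 relationalExpression
8 8 84 unannClassOrInterfaceType
21 22 6 referenceType
97 104 82 unannReferenceType
165 199 36 packageName
174 174 272 primaryNoNewArray
223 223 315 methodInvocation
"""


def parse_rows(text):
    """Return (ambiguities, fallbacks, state, rule) tuples, skipping other lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has exactly three integer columns followed by a rule name.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[:3]):
            continue
        ambig, fallback, state = (int(p) for p in parts[:3])
        rows.append((ambig, fallback, state, parts[3]))
    return rows


def summarize(rows):
    """Return total ambiguities, total fallbacks, and their ratio."""
    total_ambig = sum(r[0] for r in rows)
    total_fallback = sum(r[1] for r in rows)
    return total_ambig, total_fallback, total_ambig / total_fallback


rows = parse_rows(TRPERF_OUTPUT)
total_ambig, total_fallback, ratio = summarize(rows)
worst = max(rows, key=lambda r: r[0])  # the rule with the most ambiguities

print(f"ambiguities={total_ambig} fallbacks={total_fallback} ratio={ratio:.2f}")
print(f"worst rule: {worst[3]} (decision state {worst[2]})")
```

Sorting by the first column, as the `sort -k1 -n` in the command does, puts the same worst rule on the last line.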
methodInvocation seems to be the worst. Here is an example that exhibits the problem.
$ cat /c/Users/Kenne/Downloads/SettingAppActivity.java3.txt
public class SettingAppActivity extends BaseCompat {
    @Override
    protected void onCreate(Bundle _savedInstanceState) {
        super.onCreate(_savedInstanceState);
    }
}
11/18-07:22:02 ~/issues/g4-current/java/java20/Generated-CSharp
Decision state 315 is the problem.
Decision 315, the first state after entry, has a choice among a half dozen alts. (I can't tell which alts because trparse --ambig is crashing. kaby76/Trash#507)
So this is a problem of grammar?
Yes. It's a grammar problem, not an "Antlr problem".
Many of the grammars in this repo can be slow because of ambiguity. This usually happens because someone derives the grammar from another grammar, then tries to use that with Antlr. Much of the time, the grammar requires a symbol table to disambiguate.
For better or worse, Antlr will accept an ambiguous grammar and generate a parser for it. But just as one can write atrocious code in Java, C#, JavaScript, etc., one can write an atrocious grammar for Antlr. People then say "Antlr is terrible," but it's usually the grammar that is the problem.
The solution is to eliminate ambiguity in the grammar.
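For anyone unsure what "ambiguous" means here, the following Python sketch counts how many distinct parse trees a toy rule allows for the same input. The rule `expr : expr '+' expr | ID` is a textbook example chosen for illustration; it is not taken from the java20 grammar, and the counting is not how Antlr works internally. The function name `count_parses` is hypothetical.

```python
from functools import lru_cache


def count_parses(tokens, left_assoc=False):
    """Count parse trees for a token list.

    left_assoc=False uses the ambiguous rule:   expr : expr '+' expr | ID
    left_assoc=True  uses the unambiguous rule: expr : expr '+' ID   | ID
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from expr.
        if hi - lo == 1:
            return 0 if tokens[lo] == "+" else 1
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] != "+":
                continue
            # In the unambiguous rule the right operand must be a single ID.
            if left_assoc and mid != hi - 2:
                continue
            total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


tokens = "a + b + c + d".split()
print("ambiguous rule:  ", count_parses(tokens))
print("unambiguous rule:", count_parses(tokens, left_assoc=True))
```

The rewritten rule accepts the same inputs but leaves only one way to parse each of them, which is the kind of change "eliminate ambiguity" refers to.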
Is there a solution for treatment?
Yes, the grammar should be fixed. I am working my way through the grammars and cleaning up ambiguity and fallbacks. The order is postgresql, mysql/Oracle, then java/java20, so I can likely get to this grammar a couple of weeks from now.
Oh, thanks sir 🥰
@kaby76 I have a question: is it possible to make a code formatter with a lexer and parser?
As you can see in the photo, the Java 20 parser cannot analyze well
my code