antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

[Dart2] Using "operator" in the field name results in a syntax error #2597

Open yososs opened 2 years ago

yososs commented 2 years ago

According to the Dart2 specification, 'operator' can be used as a field name. The same problem probably occurs with the other marked keywords.

Keyword Specifications for Dart2

Avoid using these words as identifiers. However, if necessary, the keywords marked with superscripts can be identifiers:

  • Words with the superscript 1 are contextual keywords, which have meaning only in specific places. They’re valid identifiers everywhere.
  • Words with the superscript 2 are built-in identifiers. These keywords are valid identifiers in most places, but they can’t be used as class or type names, or as import prefixes.
  • Words with the superscript 3 are limited reserved words related to asynchrony support. You can’t use await or yield as an identifier in any function body marked with async, async*, or sync*.

All other words in the table are reserved words, which can’t be identifiers.
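
As a rough illustration of the categories quoted above, the sketch below classifies a few sample words and decides whether each may be used as a member name such as `expression.operator`. The class name `DartKeywordKinds`, its methods, and the abbreviated word sets are assumptions made for this example; they are not part of any Dart or ANTLR API.

```java
import java.util.Set;

/** Hypothetical helper that mirrors the keyword categories quoted above. */
final class DartKeywordKinds {
    enum Kind { CONTEXTUAL, BUILT_IN, ASYNC_LIMITED, RESERVED, NOT_A_KEYWORD }

    // Sample words only; these sets are deliberately incomplete.
    static final Set<String> CONTEXTUAL = Set.of("show", "hide", "on", "async", "sync");
    static final Set<String> BUILT_IN = Set.of("operator", "abstract", "get", "set", "static");
    static final Set<String> ASYNC_LIMITED = Set.of("await", "yield");
    static final Set<String> RESERVED = Set.of("class", "if", "return", "while");

    /** Returns the category of a word, or NOT_A_KEYWORD for ordinary identifiers. */
    public static Kind kindOf(String word) {
        if (CONTEXTUAL.contains(word)) return Kind.CONTEXTUAL;
        if (BUILT_IN.contains(word)) return Kind.BUILT_IN;
        if (ASYNC_LIMITED.contains(word)) return Kind.ASYNC_LIMITED;
        if (RESERVED.contains(word)) return Kind.RESERVED;
        return Kind.NOT_A_KEYWORD;
    }

    /**
     * Whether the word may name a member, as in expression.operator.
     * Reserved words never qualify; await and yield qualify only outside
     * bodies marked async, async*, or sync*.
     */
    public static boolean allowedAsMemberName(String word, boolean insideAsyncBody) {
        Kind kind = kindOf(word);
        if (kind == Kind.RESERVED) return false;
        if (kind == Kind.ASYNC_LIMITED) return !insideAsyncBody;
        // Contextual keywords, built-in identifiers, and ordinary identifiers.
        return true;
    }
}
```

Under these assumptions, `allowedAsMemberName("operator", false)` is true, which is the case the grammar in this issue rejects. The additional rule that built-in identifiers cannot be class names, type names, or import prefixes is not modeled here.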

Code to reproduce

class A{
  bool isPlusOrMinus(Expression expression) {
    if (expression.operator == '+') return true;
    if (expression.operator == '-') return true;
    return false;
  }
}

bkiers commented 2 years ago

Then probably this:

unconditionalAssignableSelector
  : '[' expression ']'
  | '.' identifier
  ;

should become:

unconditionalAssignableSelector
  : '[' expression ']'
  | '.' identifier
  | '.' 'operator'
  ;

kaby76 commented 2 years ago

This grammar is old. The newest grammar, maintained by Erik Ernst, is here, and appears to have fixed this issue. I recently ported the grammar here to "target-agnostic format" in response to an antlr-discussions question. I will update the grammar today.

yososs commented 2 years ago

Thanks for sharing. I will check the operation tomorrow.

yososs commented 2 years ago

I ran the following unit tests. The grammar works well overall, but a few problems remain.

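import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.junit.Assert;
import org.junit.Test;

// DartLexer, DartParser, and DartParser.LibraryDefinitionContext are generated by ANTLR
// from the grammar; import them from wherever your build places them.
public class DartParserTest {

    // see: https://dart.dev/guides/language/language-tour#keywords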
    @Test
    public void testKeywords0() {
        String[] keywords0 = { "assert", "break", "case", "catch", "class", "const", "continue", "default", "do", "else",
                "enum", "extends", "false", "final", "finally", "for", "if", "in", "is", 
                "new", 
                "null", "rethrow",
                "return", "super", "switch", "this", "throw", "true", "try", "var", "void", "while", "with" };

        for (String k : keywords0) {
            String content = "class A{\n"
                    + "  bool isPlusOrMinus(Expression expression) {\n"
                    + "    if (expression."+k+" == '+') return true;\n"
                    + "    if (expression."+k+" == '-') return true;\n"
                    + "    return false;\n"
                    + "  }\n"
                    + "}\n";
//          System.out.println(content);

            final CodePointCharStream cstream = CharStreams.fromString(content);
            final DartLexer lexer = new DartLexer(cstream);
            final CommonTokenStream stream = new CommonTokenStream(lexer);
            stream.fill();
            DartParser parser = new DartParser(stream);
            boolean[] syntaxErr = new boolean[1];
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> arg0, Object arg1, int arg2, int arg3, String arg4,
                        RecognitionException arg5) {
                    syntaxErr[0] = true;
                }
            });
            LibraryDefinitionContext root = parser.libraryDefinition();
            Assert.assertTrue("error in "+k, syntaxErr[0]);
        }
    }

    @Test
    public void testKeywords1() {
        String[] keywords1 = {"show", "async", "sync", "on", "hide"};

        for (String k : keywords1) {
            String content = "class A{\n"
                    + "  bool isPlusOrMinus(Expression expression) {\n"
                    + "    if (expression."+k+" == '+') return true;\n"
                    + "    if (expression."+k+" == '-') return true;\n"
                    + "    return false;\n"
                    + "  }\n"
                    + "}\n";
//          System.out.println(content);

            final CodePointCharStream cstream = CharStreams.fromString(content);
            final DartLexer lexer = new DartLexer(cstream);
            final CommonTokenStream stream = new CommonTokenStream(lexer);
            stream.fill();
            DartParser parser = new DartParser(stream);
            boolean[] syntaxErr = new boolean[1];
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> arg0, Object arg1, int arg2, int arg3, String arg4,
                        RecognitionException arg5) {
                    syntaxErr[0] = true;
                }
            });
            LibraryDefinitionContext root = parser.libraryDefinition();
            Assert.assertFalse("error in "+k, syntaxErr[0]);
        }
    }

    @Test
    public void testKeywords2() {
        String[] keywords2 = { "abstract", "as", "covariant", "deferred", "dynamic", "export", "extension", "external",
                "factory", "Function", "get", "implements", "import", "interface", "late", "library", "mixin",
                "operator", "part", "required", "set", "static", "typedef" };

        for (String k : keywords2) {
            String content = "class A{\n"
                    + "  bool isPlusOrMinus(Expression expression) {\n"
                    + "    if (expression."+k+" == '+') return true;\n"
                    + "    if (expression."+k+" == '-') return true;\n"
                    + "    return false;\n"
                    + "  }\n"
                    + "}\n";
//          System.out.println(content);

            final CodePointCharStream cstream = CharStreams.fromString(content);
            final DartLexer lexer = new DartLexer(cstream);
            final CommonTokenStream stream = new CommonTokenStream(lexer);
            stream.fill();
            DartParser parser = new DartParser(stream);
            boolean[] syntaxErr = new boolean[1];
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> arg0, Object arg1, int arg2, int arg3, String arg4,
                        RecognitionException arg5) {
                    syntaxErr[0] = true;
                }
            });
            LibraryDefinitionContext root = parser.libraryDefinition();
            Assert.assertFalse("error in "+k, syntaxErr[0]);
        }
    }

    @Test
    public void testKeywords3_async() {
        String[] keywords3 = {"await", "yield"};

        for (String k : keywords3) {
            String content = "class A{\n"
                    + "  bool isPlusOrMinus(Expression expression) async {\n"
                    + "    if (expression."+k+" == '+') return true;\n"
                    + "    if (expression."+k+" == '-') return true;\n"
                    + "    return false;\n"
                    + "  }\n"
                    + "}\n";
//          System.out.println(content);

            final CodePointCharStream cstream = CharStreams.fromString(content);
            final DartLexer lexer = new DartLexer(cstream);
            final CommonTokenStream stream = new CommonTokenStream(lexer);
            stream.fill();
            DartParser parser = new DartParser(stream);
            boolean[] syntaxErr = new boolean[1];
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> arg0, Object arg1, int arg2, int arg3, String arg4,
                        RecognitionException arg5) {
                    syntaxErr[0] = true;
                }
            });
            LibraryDefinitionContext root = parser.libraryDefinition();
            Assert.assertTrue("error in "+k, syntaxErr[0]);
        }
    }

    @Test
    public void testKeywords3() {
        String[] keywords3 = {"await", "yield"};

        for (String k : keywords3) {
            String content = "class A{\n"
                    + "  bool isPlusOrMinus(Expression expression) {\n"
                    + "    if (expression."+k+" == '+') return true;\n"
                    + "    if (expression."+k+" == '-') return true;\n"
                    + "    return false;\n"
                    + "  }\n"
                    + "}\n";
//          System.out.println(content);

            final CodePointCharStream cstream = CharStreams.fromString(content);
            final DartLexer lexer = new DartLexer(cstream);
            final CommonTokenStream stream = new CommonTokenStream(lexer);
            stream.fill();
            DartParser parser = new DartParser(stream);
            boolean[] syntaxErr = new boolean[1];
            parser.addErrorListener(new BaseErrorListener() {
                @Override
                public void syntaxError(Recognizer<?, ?> arg0, Object arg1, int arg2, int arg3, String arg4,
                        RecognitionException arg5) {
                    syntaxErr[0] = true;
                }
            });
            LibraryDefinitionContext root = parser.libraryDefinition();
            Assert.assertFalse("error in "+k, syntaxErr[0]);
        }
    }
}
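
The five tests above differ only in the keyword list, whether the method body is marked async, and the expected outcome. A minimal sketch of factoring out the repeated Dart snippet follows; `DartSnippets` and `memberAccessSnippet` are hypothetical names and are not part of the original test class.

```java
/** Hypothetical helper that builds the Dart source used by each test above. */
final class DartSnippets {
    /**
     * Returns the test class source with `word` used as a member name.
     * When asyncBody is true, the method body is marked async, as in
     * testKeywords3_async.
     */
    public static String memberAccessSnippet(String word, boolean asyncBody) {
        String marker = asyncBody ? " async" : "";
        return "class A{\n"
                + "  bool isPlusOrMinus(Expression expression)" + marker + " {\n"
                + "    if (expression." + word + " == '+') return true;\n"
                + "    if (expression." + word + " == '-') return true;\n"
                + "    return false;\n"
                + "  }\n"
                + "}\n";
    }
}
```

Each test loop could then call `DartSnippets.memberAccessSnippet(k, false)` (or pass `true` in testKeywords3_async) and hand the result to `CharStreams.fromString` exactly as before.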

kaby76 commented 2 years ago

A little bit of status...

I've been updating the scraper and have a grammar that works only "so so"--but better than the other available grammars. See https://github.com/kaby76/ScrapeDartSpec/blob/master/scraped.g4

I've written a small thread describing how this compares with the current "dart2/" grammar and the "reference grammar" that was written by the Dart Language Team. https://twitter.com/KenDomino/status/1533053623554428929

There is still a lot of work to do.

yososs commented 2 years ago

Do you have a comparison to the antlr4 grammar found in the dart sdk?

kaby76 commented 2 years ago

Yes. I ran sdk sources through the Dart grammar written by the Dart Language Team. The results are here. It didn't do as well as the scraped grammar.

yososs commented 2 years ago

The comparison results look good. I will use it as well.

yososs commented 2 years ago

I ran the same test using scraped.g4.

The test for testKeywords0 now passes, but the test for testKeywords3 fails.

Dart's syntax seems to be complicated by its special rules for keywords.

kaby76 commented 2 years ago

The Spec does not define rules for dynamic types. https://github.com/dart-lang/language/issues/2276. After adding in 'dynamic' as a type, the grammar accepts 78% of the Dart sdk. Much much better.

kaby76 commented 2 years ago

https://github.com/dart-lang/language/issues/2279

Now 94% of the sdk passing.

kaby76 commented 2 years ago

Another problem with the Spec, https://github.com/dart-lang/language/issues/2282, occurs with "abstract" modifiers on fields. I have a workaround, but it's a terrible hack (the old rule was this; it is now this). The grammar in the Spec doesn't even correspond directly to the hand-written parser in the Dart compiler.

Now 95% of the sdk passing.

kaby76 commented 2 years ago

Status: I have a new grammar that passes 369 out of 372 Dart source files in the sdk. I think I'll stop here. I plan on using this as a bootstrap grammar to parse the Dart compiler and scrape the grammar directly from the sources. Although the quality of the grammar that the Dart team provides is very good, the fact that it's two years behind the source code means that it will always be out of date. It's a similar situation for other languages. Scraping the source of the compiler is the only real solution.

kaby76 commented 2 years ago

Status

The good news: I have a new Dart2 grammar that parses 100% of the Dart2 SDK.

The grammar requires two semantic predicates in the lexer. Since I want this to work across targets, I've been working to write the grammar in "target agnostic format".

However, the split parser for C# is not working. I have done many dozens of these conversions to "target agnostic format", for all but one of the targets, so I am confident that I am doing it correctly. While the lexer tokens are the same, the parser behaves differently between the split and combined grammars.

Therefore, it is likely that I've stumbled on a bug in the parser runtime for C#. I am looking into the problem.

kaby76 commented 2 years ago

The error occurs in both C# and Java for a split grammar, but not for the combined grammar for either target. This is bad. It means there is a problem across targets for split grammars--unless the combined grammar code was supposed to produce a parse error.

kaby76 commented 2 years ago

The problem was with string literals. I defined rules that should not have been there. https://github.com/antlr/grammars-v4/pull/2654 fixes #2597.