antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

Remove excess keywords and parse rules from PL/SQL grammar #1606

Open: KvanTTT opened this issue 4 years ago

KvanTTT commented 4 years ago

Remove excess keywords

Remove keywords from the lexer that are not present in the official doc (https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/reservewords.htm), and replace artificial keywords with context keywords.

For instance, consider the lexer rule AUTONOMOUS_TRANSACTION. Remove it from the lexer:

AUTONOMOUS_TRANSACTION:       'AUTONOMOUS_TRANSACTION';

Replace this construction:

pragma_declaration
    : AUTONOMOUS_TRANSACTION
    ... ;

with the following:

pragma_declaration
    : {n("AUTONOMOUS_TRANSACTION")}? regular_id
    ... ;

where n is defined as follows in PlSqlParserBase.cs:

// Returns true if the text of the next token (lookahead 1) equals str, ignoring case.
protected bool n(string str)
{
    return _input.Lt(1).Text.Equals(str, StringComparison.OrdinalIgnoreCase);
}
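To see what the predicate checks without generating a parser, here is a minimal C# sketch. It assumes a hypothetical ContextKeywordSketch class that replaces the ANTLR token stream with a plain list of token texts; only N() reproduces the check from n(), and the rest is illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the parser's token stream: token texts plus a cursor.
// The real base class reads _input.Lt(1).Text instead of _texts[_pos].
public sealed class ContextKeywordSketch
{
    private readonly IReadOnlyList<string> _texts;
    private int _pos;

    public ContextKeywordSketch(IReadOnlyList<string> texts) => _texts = texts;

    // Same test as n(): does the next token's text equal str, ignoring case?
    public bool N(string str) =>
        _pos < _texts.Count &&
        _texts[_pos].Equals(str, StringComparison.OrdinalIgnoreCase);

    // Advance past the current token.
    public void Consume() => _pos++;

    // Mirrors {n("AUTONOMOUS_TRANSACTION")}? regular_id: the word arrives as an
    // ordinary identifier token; only the predicate gives it keyword meaning.
    public bool TryMatchContextKeyword(string keyword)
    {
        if (!N(keyword)) return false;
        Consume();
        return true;
    }
}
```

Because the comparison ignores case, `autonomous_transaction` written in lowercase in the source still satisfies the predicate, just as it would have matched the removed case-insensitive lexer rule.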

Also, if you want to distinguish context keywords from identifiers, you can use rule element labels:

pragma_declaration
    : {n("AUTONOMOUS_TRANSACTION")}? AUTONOMOUS_TRANSACTION=regular_id
    ... ;

I know this is verbose, but a feature for comparing token values is not implemented yet.

We've implemented the same approach for JavaScript.

Remove excess parse rules

Get rid of excess parse rules. For instance, in the following case:

partition_name_old
    : partition_name
    ;

partition_name
    : regular_id
    ;

The rules partition_name_old and partition_name can be removed, and every reference to them replaced with regular_id.
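Finding such chains by hand is tedious in a grammar this large. A minimal sketch, assuming the alias rules (rules whose only alternative is a single reference to another rule) have already been extracted into a dictionary; AliasRules.Resolve is a hypothetical helper that follows each chain to its final target and rejects cycles.

```csharp
using System;
using System.Collections.Generic;

public static class AliasRules
{
    // aliases maps an alias rule to the single rule it references,
    // e.g. partition_name_old -> partition_name, partition_name -> regular_id.
    // Returns, for every alias, the rule at the end of its chain.
    public static Dictionary<string, string> Resolve(IReadOnlyDictionary<string, string> aliases)
    {
        var result = new Dictionary<string, string>();
        foreach (var name in aliases.Keys)
        {
            var seen = new HashSet<string> { name };
            string target = aliases[name];
            // Keep following while the current target is itself an alias.
            while (aliases.TryGetValue(target, out var next))
            {
                if (!seen.Add(target))
                    throw new InvalidOperationException($"Alias cycle through '{target}'");
                target = next;
            }
            result[name] = target;
        }
        return result;
    }
}
```

For the example above, both partition_name_old and partition_name resolve to regular_id. Removing the rules also removes their nodes from the parse tree, so any listener or visitor code that relies on those contexts would need updating.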

Such optimizations (especially the first one) should significantly decrease the size of the generated parser and may improve performance. Out of curiosity, compare the size before and after and report the results here.

/cc @codeFather2

KvanTTT commented 4 years ago

Also, take a look at this text in the official doc:

Reserved words can never be used as identifiers. Keywords can be used as identifiers, but this is not recommended.

It means that "Reserved Words" can never be matched as regular_id, whereas "Keywords" can.
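The distinction can be written as a small classification step, for example when deciding which lexer tokens to keep. A sketch, assuming the reserved-word and keyword lists are copied from the two tables in the official doc; PlSqlWordClassifier and WordKind are hypothetical names, and the class deliberately ships with no word lists of its own.

```csharp
using System;
using System.Collections.Generic;

public enum WordKind { Reserved, Keyword, Identifier }

public sealed class PlSqlWordClassifier
{
    private readonly HashSet<string> _reserved;
    private readonly HashSet<string> _keywords;

    // Both lists are supplied by the caller (e.g. from the official doc tables).
    // Lookups ignore case, matching PL/SQL's case-insensitive words.
    public PlSqlWordClassifier(IEnumerable<string> reserved, IEnumerable<string> keywords)
    {
        _reserved = new HashSet<string>(reserved, StringComparer.OrdinalIgnoreCase);
        _keywords = new HashSet<string>(keywords, StringComparer.OrdinalIgnoreCase);
    }

    // Reserved wins if a word appears in both lists.
    public WordKind Classify(string word) =>
        _reserved.Contains(word) ? WordKind.Reserved
        : _keywords.Contains(word) ? WordKind.Keyword
        : WordKind.Identifier;

    // Reserved words can never be identifiers; keywords and plain names can.
    public bool CanBeRegularId(string word) => Classify(word) != WordKind.Reserved;
}
```

Under this split, words classified as Keyword are candidates for context keywords matched through regular_id, and only Reserved words must stay out of regular_id.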

Even so, many keywords remain. I suggest not introducing a token for a keyword that is used only once.
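To find such single-use tokens, one option is to count token references in the parser grammar. The sketch below is a rough text-based heuristic, not a real ANTLR grammar parser: it strips comments, single-quoted literals and non-nested {...} actions, then counts names that start with an uppercase letter and contain only uppercase letters, digits and underscores. It will also count labels such as AUTONOMOUS_TRANSACTION=regular_id. TokenUsage is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenUsage
{
    private static readonly Regex BlockComment = new Regex(@"/\*.*?\*/", RegexOptions.Singleline);
    private static readonly Regex LineComment = new Regex(@"//[^\n]*");
    private static readonly Regex Literal = new Regex(@"'(?:\\.|[^'\\])*'");
    private static readonly Regex Action = new Regex(@"\{[^{}]*\}");
    // ALL_CAPS names are treated as token references; parser rules are lowercase.
    private static readonly Regex TokenRef = new Regex(@"\b[A-Z][A-Z0-9_]*\b");

    // Counts how often each token name appears in the parser grammar text.
    public static Dictionary<string, int> Count(string parserGrammarText)
    {
        string text = BlockComment.Replace(parserGrammarText, " ");
        text = LineComment.Replace(text, " ");
        text = Literal.Replace(text, " ");
        text = Action.Replace(text, " ");

        var counts = new Dictionary<string, int>();
        foreach (Match m in TokenRef.Matches(text))
        {
            counts.TryGetValue(m.Value, out int n);
            counts[m.Value] = n + 1;
        }
        return counts;
    }

    // Token names referenced exactly once, sorted for stable output.
    public static List<string> UsedOnce(string parserGrammarText) =>
        Count(parserGrammarText)
            .Where(kv => kv.Value == 1)
            .Select(kv => kv.Key)
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
}
```

The resulting list is only a starting point; each candidate still needs a manual check before its lexer rule is dropped in favour of a context keyword.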