Open kaby76 opened 3 days ago
Symbol classes should be disjoint. I took the gram.y grammar and extracted the symbol classes for each of the relevant productions. They are attached here: yacc_bare_label_keyword.txt yacc_col_name_keyword.txt yacc_reserved_keyword.txt yacc_type_func_name_keyword.txt yacc_unreserved_keyword.txt
I have checked the disjointness of these sets by first running sort -c ... on each file (comm needs sorted input), and then comm -1 -2 ... ... across all permutations of the files listed. No two sets share a symbol, so the yacc grammar is correct.
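The same pairwise check can be sketched in Python, in case it is handy for re-running against the Antlr lists. This is only an illustration: the function name find_overlaps and the tiny symbol sets below are made up for the example and are not the contents of the attached files.

```python
from itertools import combinations


def find_overlaps(classes):
    """Return {(name_a, name_b): shared_symbols} for each pair of classes that overlap."""
    overlaps = {}
    # Sorting by class name gives a stable order for the pair keys.
    for (name_a, syms_a), (name_b, syms_b) in combinations(sorted(classes.items()), 2):
        shared = set(syms_a) & set(syms_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical stand-ins for the symbol lists; membership is illustrative only.
yacc_like = {
    "reserved_keyword": {"SELECT", "FROM"},
    "unreserved_keyword": {"FIRST_P", "ABORT_P"},
    "col_name_keyword": {"BETWEEN", "BIGINT"},
}

# Same classes plus an extra, overlapping one (again, made-up membership).
antlr_like = dict(yacc_like, plsql_unreserved_keyword={"FIRST_P", "BIGINT", "ELSIF"})

print(find_overlaps(yacc_like))   # empty dict: every pair is disjoint
print(find_overlaps(antlr_like))  # reports the pairs that share symbols
```

An empty result corresponds to comm -1 -2 printing nothing for every pair of files.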
Over in the Antlr grammar, the symbol sets are: sym_col_name_keyword.txt sym_plsql_unreserved_keyword.txt sym_reserved_keyword.txt sym_type_func_name_keyword.txt sym_unreserved_keyword.txt
These sets are also disjoint, except for plsql_unreserved_keyword, which overlaps with several of the other sets.
You should not combine non-disjoint symbol sets in Antlr. It will cause ambiguity, because a symbol that belongs to two sets can be matched through either one. This can be rectified in several ways, but it is best to go with the yacc version, because yacc requires disjoint sets and Antlr does not.
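To make the ambiguity concrete, here is a minimal sketch that counts how many alternatives of a rule can match a single token when the alternatives are symbol sets. The rule shape, the helper name matching_alternatives, and the set contents are all hypothetical and are not copied from either grammar.

```python
def matching_alternatives(token, alternatives):
    """Return the names of the rule alternatives whose symbol set contains the token."""
    return [name for name, symbols in alternatives if token in symbols]


# Hypothetical rule: label : unreserved_keyword | col_name_keyword | plsql_unreserved_keyword ;
# Set contents are made up for illustration.
alternatives = [
    ("unreserved_keyword", {"FIRST_P", "ABORT_P"}),
    ("col_name_keyword", {"BETWEEN", "BIGINT"}),
    ("plsql_unreserved_keyword", {"FIRST_P", "BIGINT", "ELSIF"}),
]

for token in ("FIRST_P", "BETWEEN", "ELSIF"):
    matches = matching_alternatives(token, alternatives)
    status = "ambiguous" if len(matches) > 1 else "unambiguous"
    print(token, matches, status)
```

Each extra match is one more possible parse tree for the same token. With disjoint sets, every token matches at most one alternative.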
The PostgreSQL grammar appears to have a mishmash of PlSQL embedded in it. This is wrong. If you want to combine the two grammars, it should be done in another way, and certainly not as part of the official PostgreSQL grammar.
I am removing PlSQL productions.
Consider the input string
SELECT 'trailing' AS first;
in comments.sql. This is ambiguous because first has three different possible parse trees.
This is caused by the rule https://github.com/antlr/grammars-v4/blob/199a5121ece05d2f2e7eca330d0738220499e80c/sql/postgresql/PostgreSQLParser.g4#L4233-L4240
There is quite a bit of overlap across each of the alts.
Over in the original gram.y, the rule is https://github.com/postgres/postgres/blob/027124a872d7b5dfddc69590af42f626b1727dba/src/backend/parser/gram.y#L17560-L17565