antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.24k stars 3.72k forks source link

[postgresql] expressions with '=' operator ambiguous. #4332

Open kaby76 opened 4 days ago

kaby76 commented 4 days ago

Consider input select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00'; from examples/timestamptz.sql. The parse is ambiguous.

11/18-06:10:41 ~/issues/g4-current/sql/postgresql/Generated-CSharp
$ trparse --ambig -i "select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';" | trtree -a
CSharp 0 string success 0.1539677
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "select") (target_list_ (target_list (target_el (STAR "*")))) (from_clause (FROM "from") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "tmptz")))))))) (where_clause (WHERE "where") (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "f1"))))))) (AT "at") (TIME "time") (ZONE "zone") (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (sconst (anysconst (StringConstant "'utc'"))))))))))))))) (EQUAL "=") (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (sconst (anysconst (StringConstant "'2017-01-18 00:00'")))))))))))))))))))))))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "select") (target_list_ (target_list (target_el (STAR "*")))) (from_clause (FROM "from") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "tmptz")))))))) (where_clause (WHERE "where") (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "f1"))))))) (AT "at") (TIME "time") (ZONE "zone") (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (sconst (anysconst (StringConstant "'utc'")))))))))))))))))))))))))))))))))) (EQUAL "=") (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (sconst (anysconst (StringConstant "'2017-01-18 00:00'"))))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))

The problem is in a_expr_compare: https://github.com/antlr/grammars-v4/blob/97bd28855acf37204623a71eb9d5ce3f9f62f80a/sql/postgresql/PostgreSQLParser.g4#L3618-L3623

kaby76 commented 3 days ago

There are several problems here with the rules that implement operator precedence. It was not implemented correctly for a_expr_compare and a_expr_at_time_zone.https://github.com/antlr/grammars-v4/blob/97bd28855acf37204623a71eb9d5ce3f9f62f80a/sql/postgresql/PostgreSQLParser.g4#L3669-L3671

For some reason, the person who implemented this seemed to think you can put any nonterminal after the operator. This is not true.