antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

[Snowflake g4] Can we distinguish column references from numbers/constants? #3820

Open gaoshihang opened 11 months ago

gaoshihang commented 11 months ago

Currently, in the Snowflake grammar (SnowflakeParser.g4), column references and numbers/constants all parse as expr (primitive_expression) nodes. This makes it hard to traverse the tree and pick out the column references we need.

[image: Snowflake parse tree]

For example, in Spark, a column reference is a node called columnReference, and a number is a node called constantDefault.

[image: Spark parse tree]

kaby76 commented 11 months ago

Could you write out the entire input that is parsed for the Snowflake query you are describing in these trees?

Your pictures are cropped, so I can't reverse-engineer the query. The second picture contains nonterminals such as "booleanExpression" and "valueExpression", which have nothing to do with SnowflakeParser.g4, and there is no "Spark" grammar anywhere in this repo. Please use permalinks to whatever you are referring to.

In any case, I think what you want is to add labels to the alternatives of certain rules (the labeledAlt syntax, alternative # Label).

gaoshihang commented 11 months ago

Yes. I can give an example to describe it, but I'm not sure my understanding is right.

Suppose there is a simple query in Snowflake: select coalesce(A:B, -1) from table_example

With https://github.com/antlr/grammars-v4/blob/master/sql/snowflake/SnowflakeParser.g4 the tree looks like this:

[image: parse tree from SnowflakeParser.g4]

Now, if I want to get the column reference (A:B) in this query, I can't use the enterExpr method, because '-1' is also an expr node.

But if the query is in Spark SQL: select coalesce(A.B, -1) from table_example

With https://github.com/apache/spark/blob/master/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 the tree looks like this:

[image: parse tree from Spark's SqlBaseParser.g4]

So I can conveniently get the column reference (A.B) using the enterDereference method.

So I'm thinking that if we could distinguish column references from numbers/constants, it would be more convenient to get the information we want.

kaby76 commented 11 months ago

I think you are trying to find the "expr[COLON]" nodes in the tree. For example,

$ trparse -i 'select coalesce(A:B, -1) from table_example' | trxgrep ' //function_call/expr_list/expr[COLON]' | trtree
CSharp 0 string success 0.3512093

( expr
  ( expr
    ( primitive_expression
      ( id_
        ( ID
          (  text:'A' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
  ) ) ) ) )
  ( COLON
    (  text:':' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
  ) )
  ( expr
    ( primitive_expression
      ( id_
        ( ID
          (  text:'B' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
) ) ) ) ) )

11/15-20:32:42 ~/issues/current-grammars/sql/snowflake/Generated-CSharp
$ trparse -i 'select coalesce(A:B, -1) from table_example' | trxgrep ' //function_call/expr_list/expr[COLON]' | trtext
CSharp 0 string success 0.3496113
A:B
11/15-20:33:04 ~/issues/current-grammars/sql/snowflake/Generated-CSharp
$

Yes, you could use a visitor to implement the XPath query, but it would involve a lot of navigating around the tree, checking for the existence of "COLON" as a child of "expr", and verifying that the "expr" is a descendant of a function_call. (This is why I never use Antlr visitors and listeners. They are far too primitive to get anything useful done.)
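A minimal sketch of that hand-written navigation, in Python, run on a hand-built and simplified stand-in for the parse tree. The Node class, the helper names, and the tree shape (especially the "-1" branch and the function_call children) are assumptions for illustration; this is not the ANTLR runtime API or the exact SnowflakeParser.g4 output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hand-built stand-in for an ANTLR parse-tree node (not the runtime API)."""
    kind: str                                  # rule name ("expr") or token type ("COLON")
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None                 # set only on token (leaf) nodes
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self):
        # Link each child back to this node so ancestors can be checked later.
        for child in self.children:
            child.parent = self


def tok(kind, text):
    return Node(kind, text=text)


def ident(name):
    # expr -> primitive_expression -> id_ -> ID, as in the trparse output above
    return Node("expr", [Node("primitive_expression", [Node("id_", [tok("ID", name)])])])


def leaf_text(node):
    """Concatenate token text in document order (whitespace is not modeled)."""
    if node.text is not None:
        return node.text
    return "".join(leaf_text(c) for c in node.children)


def all_exprs(node):
    """What a bare enterExpr callback sees: every expr node, in pre-order."""
    found = [node] if node.kind == "expr" else []
    for child in node.children:
        found.extend(all_exprs(child))
    return found


def colon_exprs_in_calls(node):
    """Hand-written equivalent of //function_call/expr_list/expr[COLON]."""
    return [
        e for e in all_exprs(node)
        if any(c.kind == "COLON" for c in e.children)               # expr[COLON]
        and e.parent is not None and e.parent.kind == "expr_list"   # expr_list/expr
        and e.parent.parent is not None
        and e.parent.parent.kind == "function_call"                 # function_call/expr_list
    ]


# Simplified tree for: select coalesce(A:B, -1) from table_example
# (FROM clause omitted; the shape of the "-1" branch is a guess)
call = Node("function_call", [
    tok("ID", "coalesce"),
    tok("LR_BRACKET", "("),
    Node("expr_list", [
        Node("expr", [ident("A"), tok("COLON", ":"), ident("B")]),
        tok("COMMA", ","),
        Node("expr", [tok("MINUS", "-"),
                      Node("expr", [Node("primitive_expression", [tok("NUM", "1")])])]),
    ]),
    tok("RR_BRACKET", ")"),
])
root = Node("select_statement", [tok("SELECT", "select"), call])

print([leaf_text(e) for e in all_exprs(root)])             # ['A:B', 'A', 'B', '-1', '1']
print([leaf_text(e) for e in colon_exprs_in_calls(root)])  # ['A:B']
```

A bare enterExpr callback sees all five expr nodes; the three extra conditions in colon_exprs_in_calls are the bookkeeping that the one-line XPath query hides.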

Alternatively, you can add labels to all the alternatives of "expr".
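A minimal sketch of what labeling buys a listener, assuming hypothetical labels ColonExpr, NegatedExpr and PrimitiveExpr on expr (none of these exist in SnowflakeParser.g4, and the grammar fragment below is not the real rule). For each label, ANTLR generates a context class and an enter/exit listener pair, so a listener can react to just one alternative; this is presumably how Spark's enterDereference callback arises. The classes here are plain-Python stand-ins for that generated code, not the ANTLR runtime.

```python
# Hypothetical labeled version of a few expr alternatives (NOT the real rule):
#
#   expr
#       : expr COLON expr        # ColonExpr
#       | MINUS expr             # NegatedExpr
#       | primitive_expression   # PrimitiveExpr
#       ;
#
# ANTLR requires that either every alternative of a rule is labeled or none is.
# For each label it generates a context class (e.g. ColonExprContext) and a
# listener callback pair (enterColonExpr/exitColonExpr).


class ExprContext:
    def __init__(self, text):
        self.text = text

    def dispatch_enter(self, listener):
        raise NotImplementedError


class ColonExprContext(ExprContext):
    def dispatch_enter(self, listener):
        listener.enterColonExpr(self)


class NegatedExprContext(ExprContext):
    def dispatch_enter(self, listener):
        listener.enterNegatedExpr(self)


class PrimitiveExprContext(ExprContext):
    def dispatch_enter(self, listener):
        listener.enterPrimitiveExpr(self)


class ExprBaseListener:
    """Like a generated base listener: every callback defaults to a no-op."""
    def enterColonExpr(self, ctx): pass
    def enterNegatedExpr(self, ctx): pass
    def enterPrimitiveExpr(self, ctx): pass


class ColumnRefCollector(ExprBaseListener):
    """Overrides only the callback for the A:B alternative."""
    def __init__(self):
        self.refs = []

    def enterColonExpr(self, ctx):
        self.refs.append(ctx.text)


# expr nodes of coalesce(A:B, -1) in pre-order, with their hypothetical labels
nodes = [
    ColonExprContext("A:B"),
    PrimitiveExprContext("A"),
    PrimitiveExprContext("B"),
    NegatedExprContext("-1"),
    PrimitiveExprContext("1"),
]

listener = ColumnRefCollector()
for ctx in nodes:  # stands in for ParseTreeWalker visiting each node
    ctx.dispatch_enter(listener)

print(listener.refs)  # ['A:B']
```

The design trade-off is that the grammar, rather than the tree-walking code, now carries the distinction: the "-1" and "1" nodes land in callbacks the collector never overrides.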

gaoshihang commented 11 months ago

Thanks @kaby76, I will try this. By the way, are there any tools you recommend besides Antlr?

kaby76 commented 11 months ago


There are other parser generators, but parser generators are a dime a dozen. There is nothing like Antlr4, the grammars-v4 repo, the Trash toolkit, the VSCode extension, and the other tools that support Antlr4 parsers.

gaoshihang commented 11 months ago

Got it. I think I'll try the Trash toolkit and learn how to use it. Thanks!