Open gaoshihang opened 11 months ago
Could you write the entire input line that is parsed for the Snowflake query you are describing in these trees?
Your pictures are chopped. I can't reverse engineer the query. The second picture contains non-terminals such as "booleanExpression" and "valueExpression". This has nothing to do with SnowflakeParser.g4, and there is no "Spark" grammar anywhere in this repo. Please use "permalinks" to what you are referring to.
In any case, what I think you want is to add a labeledAlt syntax to certain rules.
Yes, I can give an example to describe it, though I'm not sure my understanding is right.
Suppose there is a simple query in Snowflake:
select coalesce(A:B, -1) from table_example
With https://github.com/antlr/grammars-v4/blob/master/sql/snowflake/SnowflakeParser.g4 the tree looks like this:
Now, if I want to get the column reference (A:B) in this query, I can't use the enterExpr method, because '-1' is also an expr node.
But if the query is in SparkSQL:
select coalesce(A.B, -1) from table_example
With https://github.com/apache/spark/blob/master/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 the tree looks like this:
So I can conveniently get the column reference (A.B) using the enterDereference method.
So I'm thinking that if we could distinguish column references from numbers/constants in the grammar, it would be more convenient to get the information we want.
I think you are trying to find the "expr[COLON]" nodes in the tree. For example,
$ trparse -i 'select coalesce(A:B, -1) from table_example' | trxgrep ' //function_call/expr_list/expr[COLON]' | trtree
CSharp 0 string success 0.3512093
( expr
( expr
( primitive_expression
( id_
( ID
( text:'A' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
) ) ) ) )
( COLON
( text:':' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
) )
( expr
( primitive_expression
( id_
( ID
( text:'B' tt:0 chnl:DEFAULT_TOKEN_CHANNEL
) ) ) ) ) )
11/15-20:32:42 ~/issues/current-grammars/sql/snowflake/Generated-CSharp
$ trparse -i 'select coalesce(A:B, -1) from table_example' | trxgrep ' //function_call/expr_list/expr[COLON]' | trtext
CSharp 0 string success 0.3496113
A:B
11/15-20:33:04 ~/issues/current-grammars/sql/snowflake/Generated-CSharp
$
Yes, you could use a visitor to implement the XPath query, but it would involve a lot of navigating around the tree, checking for the existence of "COLON" as a child of "expr", and verifying that the "expr" is a descendant of a function_call. (This is why I never use Antlr visitors and listeners. They are far too primitive to get anything useful done.)
You can add labels for all the alts for "expr" as an alternative.
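To give a feel for the manual navigation that a visitor or listener would need without labels, here is a minimal Python sketch. It does not use the Antlr runtime: `Node`, `walk`, `column_refs`, and `ident` are hypothetical stand-ins. Only the A:B subtree follows the trtree output above; the `literal`/`NUMBER` nodes for -1 and the overall shape of `function_call` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not the Antlr runtime API)."""
    name: str                                   # rule name or token type
    text: str = ""                              # token text, leaves only
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Wire up parent links so we can navigate upwards later.
        for child in self.children:
            child.parent = self

    def get_text(self) -> str:
        if not self.children:
            return self.text
        return "".join(c.get_text() for c in self.children)


def walk(node: Node) -> Iterator[Node]:
    """Pre-order traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def column_refs(root: Node) -> List[Node]:
    """Manual equivalent of //function_call/expr_list/expr[COLON]."""
    hits = []
    for node in walk(root):
        if node.name != "expr":
            continue
        parent = node.parent
        if parent is None or parent.name != "expr_list":
            continue
        grand = parent.parent
        if grand is None or grand.name != "function_call":
            continue
        if any(c.name == "COLON" for c in node.children):
            hits.append(node)
    return hits


def ident(text: str) -> Node:
    """expr -> primitive_expression -> id_ -> ID, as in the trtree output."""
    return Node("expr", children=[
        Node("primitive_expression", children=[
            Node("id_", children=[Node("ID", text)])])])


# coalesce(A:B, -1); the node names for -1 are assumed, not taken from the grammar.
tree = Node("function_call", children=[
    Node("expr_list", children=[
        Node("expr", children=[ident("A"), Node("COLON", ":"), ident("B")]),
        Node("COMMA", ","),
        Node("expr", children=[
            Node("primitive_expression", children=[
                Node("literal", children=[Node("NUMBER", "-1")])])]),
    ]),
])

print([n.get_text() for n in column_refs(tree)])  # prints ['A:B']
```

The three parent checks and the child scan are what the single XPath expression replaces. With labeled alternatives, the generated listener would have a separate enter method per alternative, so most of this filtering would not be needed.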
Thanks @kaby76 , I will try this. By the way, are there any tools you recommend besides Antlr?
There are other parser generators, but parser generators are a dime a dozen. There is nothing like Antlr4, the grammars-v4 repo, the Trash toolkit, the VSCode extension, and the other tools that support Antlr4 parsers.
Got it. I think I can try the Trash toolkit and learn how to use it. Thanks!
Currently, in the Snowflake g4 file, column references and numbers/constants are all expr (primitive_expression) nodes, so it's hard to traverse the tree and get the column reference we need.
For example, in Spark, a column reference is a node called columnReference, and a number is a node called constantDefault.