apache / shardingsphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Apache License 2.0

Improve the parsing of methods in MySQL #31567

Closed TherChenYang closed 4 months ago

TherChenYang commented 4 months ago

Background

Hi community. The ShardingSphere SQL parser engine parses SQL into an AST (Abstract Syntax Tree) and then visits the AST to build a SQLStatement (Java object).

Currently, we are planning to enhance the support for MySQL SQL parsing in ShardingSphere.

More details: https://shardingsphere.apache.org/document/current/en/reference/sharding/parse/

Issue Background Explanation

The original parsing work may have overlooked the parsing of method parameters. ShardingSphere needs to capture the table names and field names that appear in method parameters. If method parameters are parsed incorrectly, the subsequent binding and rewriting tasks will run into problems.

For the verification work, we need to complete the following items.

  1. Find the example SQL for this method on the official website: Built-In Function and Operator Reference.
  2. Verify that the SQL itself can be parsed by the parser.
  3. Check that the parsed SQLStatement correctly captures the parameters in the method.

Task

UCASE()
UNCOMPRESS()

Overall Procedure

If you intend to participate in fixing this issue, please feel free to leave a comment below the issue. Community members will assign the issue accordingly.

For example, you can leave a comment like this: "Hi, please assign this issue to me. Thank you!"

Once you have claimed the issue, please review the syntax of the SQL on the official website of the corresponding database. Execute the SQL on the respective database to ensure the correctness of the SQL syntax.

You can check the corresponding source of each SQL case on the official database website by clicking on the link provided below each case.

Next, execute the problematic SQL cases mentioned above in the database (you can quickly start the corresponding database using the Docker image for that database, and then connect to it using a client you are familiar with), to ensure that the SQL syntax itself is correct.

Fixing ANTLR Grammar Parsing Issue

Once you have confirmed the correctness of the SQL syntax, you can validate and fix the grammar parsing issue in ShardingSphere.

If you are using IntelliJ IDEA, you will need to install the ANTLR plugin before proceeding.

If ANTLR reports a parsing error, try to fix the .g4 file by comparing it with the official database syntax until ANTLR can parse the SQL correctly.

When there is no error message in the ANTLR Preview window, it means that ANTLR can correctly parse the SQL.

Visitor problem fix

After ANTLR parses the SQL into an abstract syntax tree, ShardingSphere traverses the tree through a Visitor and extracts the required information. If you need to extract Segments, first run the following command under the shardingsphere-parser module to compile the entire parser module:

 mvn -T 2C clean install -DskipTests

Then override the corresponding visit method in SQLStatementVisitor as needed to extract the corresponding Segment.
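To illustrate what "capturing the parameters in the method" means, here is a minimal, self-contained sketch. It does not use ShardingSphere's real classes: `Expr`, `Column`, `Literal`, `FunctionCall`, and `columnsIn` are hypothetical names invented for this example, and the real visitor works on the ANTLR-generated parse tree rather than on hand-built nodes. The sketch only shows the idea of walking function parameters recursively so that column references nested inside them are not lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FunctionParameterSketch {

    /** Hypothetical expression node; NOT a ShardingSphere type. */
    interface Expr {
        void collectColumns(List<String> out);
    }

    /** A column reference such as t.Fname; owner may be null. */
    static final class Column implements Expr {
        final String owner;
        final String name;

        Column(String owner, String name) {
            this.owner = owner;
            this.name = name;
        }

        @Override
        public void collectColumns(List<String> out) {
            out.add(owner == null ? name : owner + "." + name);
        }
    }

    /** A literal value; it contributes no columns. */
    static final class Literal implements Expr {
        final String text;

        Literal(String text) {
            this.text = text;
        }

        @Override
        public void collectColumns(List<String> out) {
            // Nothing to collect for a literal.
        }
    }

    /** A function call whose parameters are themselves expressions. */
    static final class FunctionCall implements Expr {
        final String functionName;
        final List<Expr> parameters;

        FunctionCall(String functionName, List<Expr> parameters) {
            this.functionName = functionName;
            this.parameters = parameters;
        }

        @Override
        public void collectColumns(List<String> out) {
            // Recurse into every parameter, regardless of the function name.
            for (Expr parameter : parameters) {
                parameter.collectColumns(out);
            }
        }
    }

    /** Returns every column reference found in expr, in source order. */
    static List<String> columnsIn(Expr expr) {
        List<String> result = new ArrayList<>();
        expr.collectColumns(result);
        return result;
    }

    public static void main(String[] args) {
        // UPPER(t.Fname)
        Expr upper = new FunctionCall("UPPER",
                Arrays.<Expr>asList(new Column("t", "Fname")));
        // UCASE(UNCOMPRESS(payload)), with a hypothetical column name
        Expr nested = new FunctionCall("UCASE", Arrays.<Expr>asList(
                new FunctionCall("UNCOMPRESS",
                        Arrays.<Expr>asList(new Column(null, "payload")))));
        System.out.println(columnsIn(upper));  // [t.Fname]
        System.out.println(columnsIn(nested)); // [payload]
    }
}
```

If a parameter's column were dropped at this stage, later steps such as binding and rewriting would have nothing to work with, which is the failure mode this issue asks contributors to rule out.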

Add assertion test file

After the above SQL parsing problem is fixed, add the corresponding test. The steps are as follows:

  1. Add the corresponding sql-case in the sql/supported directory.
  2. Add case assertions in the case directory of the shardingsphere-test-it-parser module.
  3. Run org.apache.shardingsphere.test.it.sql.parser.internal.InternalSQLParserIT. After the SQL parser IT runs successfully, you can submit a PR.
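When writing case assertions, the start-index and stop-index attributes have to match the SQL text. Judging by the example assertion posted further down in this thread, they appear to be 0-based, inclusive character offsets. The small helper below is only a sketch under that assumption: `SegmentIndexSketch` and `indexRange` are hypothetical names, not part of ShardingSphere, and the helper finds only the first occurrence of a fragment.

```java
public class SegmentIndexSketch {

    /**
     * Returns {startIndex, stopIndex} of the first occurrence of fragment
     * in sql, using 0-based inclusive offsets (an assumption inferred from
     * the example assertion in this thread).
     */
    static int[] indexRange(String sql, String fragment) {
        int start = sql.indexOf(fragment);
        if (start < 0) {
            throw new IllegalArgumentException("fragment not found: " + fragment);
        }
        return new int[] {start, start + fragment.length() - 1};
    }

    public static void main(String[] args) {
        String sql = "SELECT UPPER(t.Fname) FROM Ftest t";
        int[] function = indexRange(sql, "UPPER(t.Fname)");
        int[] column = indexRange(sql, "t.Fname");
        int[] table = indexRange(sql, "Ftest t");
        System.out.println(function[0] + "-" + function[1]); // 7-20
        System.out.println(column[0] + "-" + column[1]);     // 13-19
        System.out.println(table[0] + "-" + table[1]);       // 27-33
    }
}
```

These values agree with the function, column, and simple-table offsets in the UPPER(t.Fname) assertion shared later in the discussion.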

Relevant Skills

  1. Be proficient in Java
  2. Have a basic understanding of ANTLR g4 files
  3. Be familiar with MySQL SQL syntax
Kiritsgu commented 4 months ago

@TherChenYang Excuse me, could you please provide more specific details? I'm having trouble understanding the exact issue that needs to be fixed. I've tested several examples of these tasks, and they all can be parsed by the parser. I even tried replacing parameters like '@str' with 'test.column1', and that worked as well. I would greatly appreciate it if you could clarify my concerns.

[image]
TherChenYang commented 4 months ago

@Kiritsgu

Kiritsgu commented 4 months ago

@TherChenYang Thank you! Your explanation has been very helpful. I'll look into this and see if I can resolve the issue.

Kiritsgu commented 4 months ago

@TherChenYang Hello, I have reviewed some of the functions mentioned above, such as UPPER(), UCASE(), and UNCOMPRESS(), and found that their corresponding SQLStatements can accurately capture the parameters within the methods, as illustrated below. [image]

I also wrote a simple test case for the SQL query "SELECT UPPER(t.Fname) FROM Ftest t", which passed successfully. If I have misunderstood anything, please feel free to correct me.

    <select sql-case-id="select_upper_function_with_alias">
        <projections start-index="7" stop-index="20">
            <expression-projection text="UPPER(t.Fname)" start-index="7" stop-index="20">
                <expr>
                    <function start-index="7" stop-index="20" text="UPPER(t.Fname)" function-name="UPPER">
                        <parameter>
                            <column start-index="13" stop-index="19" name="Fname">
                                <owner name="t" start-index="13" stop-index="13" />
                            </column>
                        </parameter>
                    </function>
                </expr>
            </expression-projection>
        </projections>
        <from>
            <simple-table name="Ftest" alias="t" start-index="27" stop-index="33" />
        </from>
    </select>

Given that the UPPER() function works well, I assume that most other functions should also operate correctly since they all belong to the RegularFunction category. If any changes are needed, they would likely involve a general fix rather than targeting a specific function. That being said, it is puzzling why there are so many similar issues.

As for the next steps, could you please guide me? Should I start by identifying functions whose corresponding SQLStatements cannot capture the parameters accurately?

TherChenYang commented 4 months ago

@Kiritsgu Thank you very much for your verification. For most functions, the parsing engine can indeed parse them correctly. However, we do not know in advance which specific functions it fails on. If we rely only on feedback from community users, we can only fix problems after they have already been hit. Therefore, what we can do is check all functions in advance to see whether there are any parsing errors. If these functions are parsed correctly, then I can close this issue.

Kiritsgu commented 4 months ago

@TherChenYang Alright, please assign this issue to me. I will thoroughly examine all the functions mentioned. Should I also submit the test cases along with my findings?

TherChenYang commented 4 months ago

@Kiritsgu Thank you very much. I will assign it to you. If there are any changes to the corresponding g4 file or visitor code, then you need to submit the corresponding test cases.

Kiritsgu commented 4 months ago

@TherChenYang I have reviewed all the functions mentioned in this issue and confirmed that each one meets the requirement of accurately mapping table field parameters in methods to parameters in the Statement. Therefore, I believe we can consider this issue resolved and close it.

TherChenYang commented 4 months ago

@Kiritsgu Thanks for your verification. I will close this issue.

KonarzewskiP commented 3 months ago

Hi @Kiritsgu

I have a problem with ANTLR Preview plugin and hope you could help me here.

I built my project as suggested with the command: mvn -T 2C clean install -DskipTests

However, when I try to check the g4 grammar with your SQL (SELECT LOWER(test.column), LOWER(CONVERT(@str USING utf8mb4))), I get the following error: [image]

Here is the path to the DMLStatement.g4 file: parser/sql/dialect/mysql/src/main/antlr4/imports/mysql/DMLStatement.g4

Could you please give me some suggestions on how to make my ANTLR plugin work?

Kiritsgu commented 3 months ago

@KonarzewskiP This is because you didn't select the specific rule you want to test. Please try the following:

[image]
KonarzewskiP commented 3 months ago

@Kiritsgu

Amazing!

Thank you very much for your help and have a good day!