apache / shardingsphere

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.
Apache License 2.0

Support parsing SQL Server SELECT DISTINCT sql #29156

Closed: FlyingZC closed this issue 8 months ago

FlyingZC commented 10 months ago

Background

Hi community. This issue is for #29149.

The ShardingSphere SQL parser engine parses SQL into an AST (Abstract Syntax Tree) and then visits the AST to produce a SQLStatement (a Java object). We are currently planning to enhance ShardingSphere's support for parsing SQL Server SQL.

More details: https://shardingsphere.apache.org/document/current/en/reference/sharding/parse/

Task

This issue is to support parsing more SQL Server SQL statements, as follows:

SELECT DISTINCT user.FirstName, user.LastName
INTO ms_user
FROM user INNER JOIN (
    SELECT * FROM ClickStream WHERE cs.url = 'www.microsoft.com'
    ) AS ms
ON user.user_ip = ms.user_ip


SELECT 
    req.session_id
    , req.total_elapsed_time AS duration_ms
    , req.cpu_time AS cpu_time_ms
    , req.total_elapsed_time - req.cpu_time AS wait_time
    , req.logical_reads
    , SUBSTRING (REPLACE (REPLACE (SUBSTRING (ST.text, (req.statement_start_offset/2) + 1, 
       ((CASE statement_end_offset
           WHEN -1
           THEN DATALENGTH(ST.text)  
           ELSE req.statement_end_offset
         END - req.statement_start_offset)/2) + 1) , CHAR(10), ' '), CHAR(13), ' '), 
      1, 512)  AS statement_text  
FROM sys.dm_exec_requests AS req
    CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS ST
ORDER BY total_elapsed_time DESC


SELECT t.text,
     (qs.total_elapsed_time/1000) / qs.execution_count AS avg_elapsed_time,
     (qs.total_worker_time/1000) / qs.execution_count AS avg_cpu_time,
     ((qs.total_elapsed_time/1000) / qs.execution_count ) - ((qs.total_worker_time/1000) / qs.execution_count) AS avg_wait_time,
     qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
     qs.total_logical_writes / qs.execution_count AS avg_writes,
     (qs.total_elapsed_time/1000) AS cumulative_elapsed_time_all_executions
FROM sys.dm_exec_query_stats qs
     CROSS apply sys.Dm_exec_sql_text (sql_handle) t
WHERE t.text like '<Your Query>%'
-- Replace <Your Query> with your query or the beginning part of your query. The special chars like '[','_','%','^' in the query should be escaped.
ORDER BY (qs.total_elapsed_time / qs.execution_count) DESC


SELECT t.text,
         qs.total_elapsed_time / qs.execution_count
         AS avg_elapsed_time,
         qs.total_worker_time / qs.execution_count
         AS avg_cpu_time,
         (qs.total_elapsed_time - qs.total_worker_time) / qs.execution_count
         AS avg_wait_time,
         qs.total_logical_reads / qs.execution_count
         AS avg_logical_reads,
         qs.total_logical_writes / qs.execution_count
         AS avg_writes,
         qs.total_elapsed_time
         AS cumulative_elapsed_time
FROM sys.dm_exec_query_stats qs
         CROSS apply sys.Dm_exec_sql_text (sql_handle) t
WHERE (qs.total_elapsed_time - qs.total_worker_time) / qs.total_elapsed_time
         > 0.2
ORDER BY qs.total_elapsed_time / qs.execution_count DESC


SELECT 'DECLARE @serverName NVARCHAR(512) = N''' + value + ''''
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'DnsRecordName'

UNION

SELECT 'DECLARE @node NVARCHAR(512) = N''' + NodeName + '.' + Cluster + ''''
FROM (
    SELECT SUBSTRING(replica_address, 0, CHARINDEX('\', replica_address)) AS NodeName,
        RIGHT(service_name, CHARINDEX('/', REVERSE(service_name)) - 1) AppName,
        JoinCol = 1
    FROM sys.dm_hadr_fabric_partitions fp
    INNER JOIN sys.dm_hadr_fabric_replicas fr
        ON fp.partition_id = fr.partition_id
    INNER JOIN sys.dm_hadr_fabric_nodes fn
        ON fr.node_name = fn.node_name
    WHERE service_name LIKE '%ManagedServer%'
        AND replica_role = 2
) t1
LEFT JOIN (
    SELECT value AS Cluster,
        JoinCol = 1
    FROM sys.dm_hadr_fabric_config_parameters
    WHERE parameter_name = 'ClusterName'
    ) t2
    ON (t1.JoinCol = t2.JoinCol)
INNER JOIN (
    SELECT [value] AS AppName
    FROM sys.dm_hadr_fabric_config_parameters
    WHERE section_name = 'SQL'
        AND parameter_name = 'InstanceName'
    ) t3
    ON (t1.AppName = t3.AppName)

UNION

SELECT 'DECLARE @port NVARCHAR(512) = N''' + value + ''''
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'HadrPort'


Process

  1. First confirm that the statement is valid SQL Server syntax; if it is not, please leave a comment under this issue and skip it;
  2. Compare the SQL definitions in the Official SQL Doc and the ShardingSphere SQL Doc;
  3. If the ShardingSphere SQL Doc differs, please correct it by referring to the Official SQL Doc;
  4. Run mvn install on the current_file_module;
  5. Check whether there are any exceptions. If so, please fix them (especially in xxxVisitor.class);
  6. Add the corresponding new SQL case in SQL Cases and the expected parsed result in Expected Statement XML;
  7. Run SQLParserParameterizedTest to make sure there are no exceptions.
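
For step 6, the multi-line statements above have to be turned into single-line, XML-safe attribute values before they can go into a SQL case file. The helper below is only a rough sketch of that preparation and is not part of ShardingSphere. The class name SqlCaseEntry and its methods are hypothetical, and the sql-case element with its id, value and db-types attributes is assumed from the existing test resources, so please check it against the current SQL Cases files before relying on it.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not a ShardingSphere class.
class SqlCaseEntry {

    // Collapse a multi-line statement into one line: trim every line,
    // drop blank lines and full-line "--" comments, join with one space.
    static String flatten(String sql) {
        return Arrays.stream(sql.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("--"))
                .collect(Collectors.joining(" "));
    }

    // Escape the characters that are not allowed inside a double-quoted XML attribute.
    // The ampersand must be replaced first so later entities are not escaped twice.
    static String escapeXmlAttribute(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build one entry in the assumed sql-case format (attribute names are an assumption).
    static String toSqlCase(String id, String sql) {
        return "<sql-case id=\"" + id + "\" value=\""
                + escapeXmlAttribute(flatten(sql)) + "\" db-types=\"SQLServer\" />";
    }

    public static void main(String[] args) {
        String sql = "SELECT t.text\n"
                + "FROM sys.dm_exec_query_stats qs\n"
                + "     CROSS apply sys.Dm_exec_sql_text (sql_handle) t\n"
                + "WHERE t.text like '<Your Query>%'\n"
                + "-- full-line comments are dropped\n"
                + "ORDER BY (qs.total_elapsed_time / qs.execution_count) DESC";
        // The case id below is made up for illustration.
        System.out.println(toSqlCase("select_with_cross_apply_order_by", sql));
    }
}
```

Full-line comments are dropped because joining lines would otherwise let a "--" comment swallow the rest of the statement, as would happen with the ORDER BY in the third query above. Trailing comments on a code line are not handled and need to be removed by hand.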

Relevant Skills

  1. Proficiency in Java
  2. A basic understanding of ANTLR g4 files
  3. Familiarity with SQL Server SQL

github-actions[bot] commented 9 months ago

There hasn't been any activity on this issue recently, and in order to prioritize active issues, it will be marked as stale.

TherChenYang commented 8 months ago

@FlyingZC This issue can be assigned to me, thank you very much.

strongduanmu commented 8 months ago

@TherChenYang Assigned. Thank you so much for your continued contributions.

TherChenYang commented 8 months ago

Thank you very much, PR has been submitted.