Aggregate Queries Reconciliation (#740). This release implements Aggregate Queries Reconciliation, addressing issue #503. A new property, aggregates, has been added to the base class of the query builder module, and a new generate_final_reconcile_aggregate_output function generates the final reconcile output for aggregate queries. A new SQL file creates the aggregate_details table to store details about aggregate reconciles, and a new column, operation_name, has been added to the main table in the installation reconciliation query. New classes and methods handle aggregate queries and their reconciliation, and new SQL tables and columns store and manage the aggregate rules used during reconciliation. Unit tests cover aggregate queries reconciliation, including the reconciliation of aggregate data when records are missing.
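As a rough illustration of what reconciling aggregates involves, the sketch below compares per-group aggregate values from a source and a target and reports mismatches and missing groups. The function name reconcile_aggregates and the dictionary-based inputs are assumptions made for this example; it is not the project's generate_final_reconcile_aggregate_output function.

```python
from typing import Dict, Hashable


def reconcile_aggregates(source: Dict[Hashable, float],
                         target: Dict[Hashable, float]) -> dict:
    """Compare aggregate values keyed by group (hypothetical helper)."""
    shared = source.keys() & target.keys()
    return {
        # Groups present on both sides whose aggregate values differ.
        "mismatch": {k: (source[k], target[k]) for k in shared
                     if source[k] != target[k]},
        # Groups that exist only on one side (the missing-records case).
        "missing_in_target": sorted(source.keys() - target.keys()),
        "missing_in_source": sorted(target.keys() - source.keys()),
    }


report = reconcile_aggregates(
    {"east": 100.0, "west": 250.0, "north": 40.0},
    {"east": 100.0, "west": 245.0, "south": 10.0},
)
print(report)
```

A real reconciliation would run the aggregate queries against both systems first; here the results are supplied directly to keep the example self-contained.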
Generate GROUP BY / PIVOT (#747). The LogicalPlanGenerator class in the remorph library now generates GROUP BY and PIVOT clauses for SQL queries. A new private method, "aggregate", handles two types of aggregates: GroupBy and Pivot. For GroupBy, it generates a GROUP BY clause with the specified grouping expressions. For Pivot, it generates a PIVOT clause, compatible with Spark SQL, that uses the specified column as the pivot column and the specified values as the pivot values. If the aggregate type is unsupported, a TranspileException is thrown. New test cases for the LogicalPlanGenerator class in the com.databricks.labs.remorph.generators.sql package cover the transpilation of Aggregate expressions with GROUP BY and PIVOT clauses.
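The dispatch described above can be sketched as follows. This is an illustrative Python sketch, not the library's Scala code: the GroupBy and Pivot data classes, their field names, the aggregate_expression field, and the TranspileError class (standing in for TranspileException) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class GroupBy:
    grouping_expressions: List[str]


@dataclass
class Pivot:
    pivot_column: str
    pivot_values: List[str]
    aggregate_expression: str  # assumed field, e.g. "SUM(amount)"


class TranspileError(Exception):
    """Stand-in for the TranspileException mentioned in the entry."""


def generate_aggregate(child_sql: str, aggregate: Union[GroupBy, Pivot]) -> str:
    """Append a GROUP BY or PIVOT clause to already generated child SQL."""
    if isinstance(aggregate, GroupBy):
        exprs = ", ".join(aggregate.grouping_expressions)
        return f"{child_sql} GROUP BY {exprs}"
    if isinstance(aggregate, Pivot):
        values = ", ".join(aggregate.pivot_values)
        return (f"{child_sql} PIVOT ({aggregate.aggregate_expression} "
                f"FOR {aggregate.pivot_column} IN ({values}))")
    # Any other aggregate type is rejected rather than silently ignored.
    raise TranspileError(f"Unsupported aggregate type: {type(aggregate).__name__}")


print(generate_aggregate("SELECT a, SUM(b) FROM t", GroupBy(["a"])))
print(generate_aggregate("SELECT * FROM t", Pivot("q", ["'Q1'", "'Q2'"], "SUM(amount)")))
```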
Implement error strategy for Snowflake parsing and use error strategy for all parser instances (#760). This release adds an error strategy for Snowflake parsing that translates raw token names and parser rules into more user-friendly SQL error messages. The strategy is applied to all parser instances, so error handling is uniform. The DBL_DOLLAR rule in the SnowflakeLexer grammar has also been refined to handle escaped dollar signs correctly. Together, these changes make error messages more accurate and readable for SQL authors. The TSQL parsing error strategy has been updated to match the new Snowflake implementation, so error handling is consistent across dialects.
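The idea of translating raw token names into readable text can be illustrated with a small lookup table. The token names, the friendly descriptions, and the describe_expected function below are hypothetical and chosen only for this sketch; the actual grammar tokens and message wording may differ.

```python
from typing import Dict, List

# Hypothetical mapping from raw lexer token names to user-facing text.
FRIENDLY_NAMES: Dict[str, str] = {
    "L_PAREN": "'('",
    "R_PAREN": "')'",
    "ID": "an identifier",
    "STRING": "a string literal",
}


def describe_expected(raw_tokens: List[str]) -> str:
    """Build the 'expecting ...' part of an error message."""
    # Unknown token names fall back to the raw name instead of failing.
    names = [FRIENDLY_NAMES.get(token, token) for token in raw_tokens]
    if len(names) == 1:
        return f"expecting {names[0]}"
    return "expecting one of: " + ", ".join(names)


print(describe_expected(["ID"]))
print(describe_expected(["L_PAREN", "STRING", "COMMA"]))
```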
Incremental improvement to error messages - article selection (#711). This release makes an incremental improvement to the error messages generated during T-SQL code parsing. A new private method, articleFor, determines whether to use "a" or "an" in the generated messages based on the first letter of the following word. The generateMessage method now uses articleFor when constructing the initial error message and the subsequent messages produced when there are multiple expected tokens. As a result, the articles "a" and "an" are used consistently, which makes the error messages easier to read for software engineers working with T-SQL code.
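A minimal sketch of article selection by first letter, written in Python for illustration rather than copied from the Scala source. The names article_for and generate_message mirror the methods named above, but their signatures and the message wording are assumptions.

```python
from typing import List

VOWELS = {"a", "e", "i", "o", "u"}


def article_for(word: str) -> str:
    """Return "an" if the word starts with a vowel letter, otherwise "a"."""
    return "an" if word[:1].lower() in VOWELS else "a"


def generate_message(expected: List[str]) -> str:
    """Join the expected items, each prefixed with the matching article."""
    parts = [f"{article_for(item)} {item}" for item in expected]
    return "expecting " + " or ".join(parts)


print(generate_message(["identifier", "string"]))
```

A rule based only on the first letter is simple but approximate: words such as "hour" or "unique" would get the wrong article, which is acceptable when the vocabulary is a fixed set of token descriptions.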
TSQL: Adds tests and support for SELECT OPTION(...) generation (#755). This release adds code generation for the TSQL SELECT ... OPTION(...) clause. Query hints supplied with a SELECT statement are transpiled as comments in the output code, which makes it easier to assess query performance after transpilation. The generated comments cover MAXRECURSION, string options, boolean options, and auto options. Because the hints are emitted only as comments, they do not affect the execution of the transpiled query. New and updated test cases in the TSqlAstBuilderSpec test class cover the new functionality. The changes are limited to the ExpressionGenerator class and its associated methods and to the TSqlRelationBuilder class.
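To make the "hints as comments" idea concrete, the sketch below prepends a comment block listing the original options to the generated SELECT. The function name, the comment layout, and the option names EXAMPLE_FLAG and EXAMPLE_LABEL are assumptions for this example and do not reflect the exact output of the transpiler.

```python
from typing import Dict, Union

OptionValue = Union[bool, int, str]


def render_value(value: OptionValue) -> str:
    # bool must be checked before other types because bool is a subclass of int.
    if isinstance(value, bool):
        return "ON" if value else "OFF"
    if isinstance(value, str):
        return f"'{value}'"
    return str(value)


def render_options_as_comment(select_sql: str, options: Dict[str, OptionValue]) -> str:
    """Emit the original OPTION hints as a comment ahead of the query."""
    if not options:
        return select_sql
    lines = ["/* Query hints from the original OPTION clause:"]
    lines += [f"   {name} = {render_value(value)}" for name, value in options.items()]
    lines.append("*/")
    lines.append(select_sql)
    return "\n".join(lines)


print(render_options_as_comment(
    "SELECT * FROM t",
    {"MAXRECURSION": 10, "EXAMPLE_FLAG": True, "EXAMPLE_LABEL": "nightly"},
))
```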
TSQL: IR implementation of MERGE (#719). This release adds a complete IR (Intermediate Representation) implementation of the TSQL MERGE statement, bringing it in line with Spark SQL. The LogicalPlanGenerator class now includes a generateMerge method, which generates the SQL code for the MERGE statement from a MergeIntoTable object containing the target and source tables, the merge condition, and the merge actions. MergeIntoTable is a new case class that represents the logical plan of the MERGE INTO command and extends the Modification trait. The LogicalPlanGenerator class also includes a new generateWithOptions method, which generates SQL code for the WITH OPTIONS clause from a WithOptions object containing the input and options as children. The TSqlRelationBuilder class has been updated to handle parsing of the MERGE statement, introducing new methods and updating existing ones, such as visitMerge. The TSqlToDatabricksTranspiler class now supports the TSQL MERGE statement, and the ExpressionGenerator class has new tests for options, columns, and arithmetic expressions. A new optimization rule, TrapInsertDefaultsAction, handles the behavior of the DEFAULT keyword in INSERT statements. The commit also includes test cases for the MergeIntoTable logical operator and for the T-SQL merge statement in the TSqlAstBuilderSpec.
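The shape of a merge plan and its generated SQL can be sketched as follows. This is a simplified Python illustration, not the Scala case class: the field names are assumptions, and merge actions are represented as already rendered SQL strings rather than IR nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MergeIntoTable:
    target: str
    source: str
    condition: str
    # Actions are pre-rendered SQL fragments in this sketch (assumption).
    matched_actions: List[str] = field(default_factory=list)
    not_matched_actions: List[str] = field(default_factory=list)


def generate_merge(merge: MergeIntoTable) -> str:
    """Render a MERGE INTO statement from the plan object."""
    parts = [
        f"MERGE INTO {merge.target}",
        f"USING {merge.source}",
        f"ON {merge.condition}",
    ]
    parts += [f"WHEN MATCHED THEN {action}" for action in merge.matched_actions]
    parts += [f"WHEN NOT MATCHED THEN {action}" for action in merge.not_matched_actions]
    return "\n".join(parts)


plan = MergeIntoTable(
    target="t",
    source="s",
    condition="t.id = s.id",
    matched_actions=["UPDATE SET t.v = s.v"],
    not_matched_actions=["INSERT (id, v) VALUES (s.id, s.v)"],
)
print(generate_merge(plan))
```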