added optional parameters @update_existing and @max_rows_per_batch

EitanBlumin commented 1 year ago

The addition of the optional parameter @max_rows_per_batch fixes issues #19 and #11. It splits the MERGE command into multiple batches, each batch merges a maximum number of rows as specified.

Additionally, this PR adds the optional parameter @update_existing which toggles the use of the UPDATE clause in the MERGE command. This is different from the @update_only_if_changed parameter which only toggles whether there would be a check for changed data before updating a row. But the UPDATE clause is still added. The parameter @update_existing, when set to 0, will remove the UPDATE clause entirely.

 @update_existing bit = 1, -- When 1, performs an UPDATE operation on existing rows.
 @max_rows_per_batch int = NULL -- When not NULL, splits the MERGE command into multiple batches, each batch merges X rows as specified

dnlnln commented 1 year ago

This is a really neat solution, I'm just getting around to absorbing it properly.

I am facing one issue though: whenever I generate a merge, I'm getting an extra leading comma on each row of values, e.g. with this command:

EXEC [AdventureWorks].[dbo].[sp_generate_merge] 'ContactType', @schema = 'Person', @max_rows_per_batch = 4, @delete_if_not_matched = 0, @nologo = 1, @include_rowsaffected = 0

I get this:

USE [AdventureWorksFlyway]
GO

SET IDENTITY_INSERT [Person].[ContactType] ON

MERGE INTO [Person].[ContactType] AS [Target]
USING (VALUES
   (1,N'Accounting Manager','2008-04-30T00:00:00')
, ,(2,N'Assistant Sales Agent','2008-04-30T00:00:00')
, ,(3,N'Assistant Sales Representative','2008-04-30T00:00:00')
, ,(4,N'Coordinator Foreign Markets','2008-04-30T00:00:00')
) AS [Source] ([ContactTypeID],[Name],[ModifiedDate])
ON ([Target].[ContactTypeID] = [Source].[ContactTypeID])
WHEN MATCHED AND (
    NULLIF([Source].[Name], [Target].[Name]) IS NOT NULL OR NULLIF([Target].[Name], [Source].[Name]) IS NOT NULL OR 
    NULLIF([Source].[ModifiedDate], [Target].[ModifiedDate]) IS NOT NULL OR NULLIF([Target].[ModifiedDate], [Source].[ModifiedDate]) IS NOT NULL) THEN
 UPDATE SET
  [Target].[Name] = [Source].[Name], 
  [Target].[ModifiedDate] = [Source].[ModifiedDate]
WHEN NOT MATCHED BY TARGET THEN
 INSERT([ContactTypeID],[Name],[ModifiedDate])
 VALUES([Source].[ContactTypeID],[Source].[Name],[Source].[ModifiedDate]);

MERGE INTO [Person].[ContactType] AS [Target]
USING (VALUES
  ,(5,N'Export Administrator','2008-04-30T00:00:00')
, ,(6,N'International Marketing Manager','2008-04-30T00:00:00')
, ,(7,N'Marketing Assistant','2008-04-30T00:00:00')
, ,(8,N'Marketing Manager','2008-04-30T00:00:00')
) AS [Source] ([ContactTypeID],[Name],[ModifiedDate])
ON ([Target].[ContactTypeID] = [Source].[ContactTypeID])
WHEN MATCHED AND (
    NULLIF([Source].[Name], [Target].[Name]) IS NOT NULL OR NULLIF([Target].[Name], [Source].[Name]) IS NOT NULL OR 
    NULLIF([Source].[ModifiedDate], [Target].[ModifiedDate]) IS NOT NULL OR NULLIF([Target].[ModifiedDate], [Source].[ModifiedDate]) IS NOT NULL) THEN
 UPDATE SET
  [Target].[Name] = [Source].[Name], 
  [Target].[ModifiedDate] = [Source].[ModifiedDate]
WHEN NOT MATCHED BY TARGET THEN
 INSERT([ContactTypeID],[Name],[ModifiedDate])
 VALUES([Source].[ContactTypeID],[Source].[Name],[Source].[ModifiedDate]);

-- 

SET IDENTITY_INSERT [Person].[ContactType] OFF

Any ideas what might be causing that for me?

EitanBlumin commented 1 year ago

Thank you for the catch, @dnlnln . I found the cause, it was due to this code section:

        SET @CurrentValuesList += CAST((SELECT @b + CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) = 1 THEN ' ' ELSE ',' END + val
                                    FROM @tab
                                    WHERE ID BETWEEN @ValuesListIDFrom AND @ValuesListIDTo
                                    ORDER BY ID FOR XML PATH('')) AS XML).value('.', 'NVARCHAR(MAX)');

The mistake is in the CASE WHEN expression that concatenates a comma at the start of the row, but the values in @tab already have such commas so it's not necessary.

However, it's not actually a mistake because if I only rely on the commas in @tab then I'll have incorrectly placed commas when splitting the values into chunks.

Therefore, the correct fix, intead, would be to remove the commas inserted into @tab at this code section:

SET @Actual_Values = 
 'SELECT ' + 
 CASE WHEN @top IS NULL OR @top < 0 THEN '' ELSE ' TOP ' + LTRIM(STR(@top)) + ' ' END + 
 '''' + 
 ' '' + CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) = 1 THEN '' '' ELSE '','' END + ''(''+ ' + @Actual_Values + '+'')''' + ' ' + 
 COALESCE(@from,' FROM ' + @Source_Table_Qualified + ' (NOLOCK) ORDER BY ' + @PK_column_list)

I committed the fix. It should work okay now.

I apologize for the trouble :)

dnlnln commented 1 year ago

Thanks for that @EitanBlumin, looking pretty good now. A couple of additional thoughts:

To show off the new feature, it would be nice to add an example calling syntax to the header
Given that the new param doesn't have any effect when @include_values=0, it might be a good idea to add a validation for this invalid combination, similar to the other validation you added for the delete clause Finally just a heads up that there is currently a merge conflict in this PR due to the merge of #93 . Would you be able to merge master into this branch to resolve? Thanks

EitanBlumin commented 1 year ago

Thanks, @dnlnln .

I implemented your suggestions.

dnlnln commented 1 year ago

Thanks @EitanBlumin. Just fyi I've made a few minor tweaks in master just to make the output formatting consistent.

dnlnln / generate-sql-merge

added optional parameters @update_existing and @max_rows_per_batch #94