Closed · gb0o closed this issue 4 years ago
That's a huge performance downgrade...
Do you have a sample repro so I can reproduce this performance issue? I will look into it ASAP.
maybe it is command caching? https://www.bricelam.net/2017/07/20/sqlite-bulk-insert.html
Command caching and Prepare are implemented in the latest version. From my last tests, it seems the problem is coming from the insert trigger...
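For reference, the bulk-insert techniques from that blog post (a single transaction plus a prepared, parameterized statement that is bound repeatedly) apply to any SQLite client. A minimal sketch using Python's built-in sqlite3 module, with illustrative table and column names borrowed from the repro below:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, FirstName TEXT)")

rows = [(str(i), f"name{i}") for i in range(10_000)]

# One transaction + one reusable parameterized statement:
# executemany compiles the INSERT once and rebinds parameters per row,
# the same idea as command caching / Prepare() in Microsoft.Data.Sqlite.
with conn:
    conn.executemany(
        "INSERT INTO Customer (CustomerID, FirstName) VALUES (?, ?)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(count)  # 10000
```

Without the surrounding transaction, SQLite commits (and fsyncs) once per row, which is usually the dominant cost in naive bulk inserts.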
Here is my setup:
USE [master]
GO
-- Server database
if exists (select * from sys.databases where name = 'HeavyTables')
begin
    ALTER DATABASE HeavyTables SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE HeavyTables;
end
GO
CREATE DATABASE HeavyTables;
GO
USE HeavyTables;
CREATE TABLE [dbo].[Customer](
[CustomerID] UniqueIdentifier NOT NULL,
[FirstName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NOT NULL,
[Suffix] [nvarchar](10) NULL,
[CompanyName] [nvarchar](128) NULL,
[SalesPerson] [nvarchar](256) NULL,
[EmailAddress] [nvarchar](50) NULL,
[Phone] [nvarchar](25) NULL
CONSTRAINT [PK_Customer_CustomerID] PRIMARY KEY CLUSTERED
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Customer] ADD CONSTRAINT [Customer_CustomerId] DEFAULT (newid()) FOR [CustomerID]
insert into dbo.Customer (FirstName, LastName)
select top(15000) SUBSTRING(o1.name, 0, 50), SUBSTRING(o2.name, 0, 50)
from sys.all_objects as o1
cross join sys.all_objects as o2
cross join sys.all_objects as o3
The code I'm using:
private static async Task SynchronizeHeavyTableAsync()
{
// Create 2 Sql Sync providers
var serverProvider = new SqlSyncProvider(DBHelper.GetDatabaseConnectionString("HeavyTables"));
var clientProvider = new SqliteSyncProvider("heavyOne.db");
var setup = new SyncSetup(new string[] { "Customer" });
var options = new SyncOptions();
options.BatchSize = 1000;
// Creating an agent that will handle all the process
var agent = new SyncAgent(clientProvider, serverProvider, options, setup);
var remoteProgress = new SynchronousProgress<ProgressArgs>(s =>
{
Console.ForegroundColor = ConsoleColor.Blue;
Console.WriteLine($"{s.Context.SyncStage}:\t{s.Message}");
Console.ResetColor();
});
agent.AddRemoteProgress(remoteProgress);
// Using the Progress pattern to handle progession during the synchronization
var progress = new SynchronousProgress<ProgressArgs>(s =>
{
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"{s.Context.SyncStage}:\t{s.Message}");
Console.ResetColor();
});
do
{
// Console.Clear();
Console.WriteLine("Sync Start");
try
{
var r = await agent.SynchronizeAsync(progress);
// Write results
Console.WriteLine(r);
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
//Console.WriteLine("Sync Ended. Press a key to start again, or Escape to end");
} while (Console.ReadKey().Key != ConsoleKey.Escape);
Console.WriteLine("End");
}
Here is the result with the triggers enabled:
And without triggers on the table:
(and it took me 10 sec to delete the triggers :) )
The problem here is that disabling triggers is not an option, since SQLite has no feature to disable them...
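To make that concrete: SQLite has no ALTER TRIGGER ... DISABLE, so the only workaround is to drop the trigger before a bulk load and recreate it afterwards. A small sketch (the trigger and log table names here are illustrative, not DMS's actual tracking triggers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, FirstName TEXT);
CREATE TABLE Customer_log (CustomerID TEXT);
CREATE TRIGGER Customer_after_insert AFTER INSERT ON Customer
BEGIN
    INSERT INTO Customer_log VALUES (NEW.CustomerID);
END;
""")

# No DISABLE TRIGGER in SQLite: drop it, bulk load, then recreate it.
conn.execute("DROP TRIGGER Customer_after_insert")
conn.execute("INSERT INTO Customer VALUES ('1', 'a')")  # no log row written
logged = conn.execute("SELECT COUNT(*) FROM Customer_log").fetchone()[0]
print(logged)  # 0
```

The downside for a sync framework is that any change made while the trigger is dropped goes untracked, which is why dropping triggers is not a general fix here.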
So far, I'm wondering how it was working with version 0.3
...
Still investigating, but if you have any ideas, please let me know
I have a sample repro which I can share soon. It shows pretty much the same results you have just demoed with your heavy sync example. It auto-generates rows on the server with int and GUID PKs. Like you, I also deleted the triggers and noticed the same speed-up you found. I implemented drop-trigger commands in the OnProvisionedArgs interceptor.
Could the problem be in SqliteObjectNames.CreateUpdateCommandText? Previously this SQL text was a simple insert; now it doubles as both update and insert and also has two left joins.
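For readers unfamiliar with the pattern: a statement that "doubles as both update and insert" is an upsert. The sketch below is not DMS's generated SQL (which reportedly also carried a WITH clause and two left joins); it only illustrates the upsert semantics using SQLite's native ON CONFLICT form (available since SQLite 3.24):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, FirstName TEXT)")

# One statement serving as both insert and update: it inserts when the
# key is new and updates the existing row when the key already exists.
upsert = """
INSERT INTO Customer (CustomerID, FirstName) VALUES (?, ?)
ON CONFLICT(CustomerID) DO UPDATE SET FirstName = excluded.FirstName
"""
conn.execute(upsert, ("42", "Alice"))  # inserts a new row
conn.execute(upsert, ("42", "Bob"))    # updates the same row
name = conn.execute(
    "SELECT FirstName FROM Customer WHERE CustomerID = '42'"
).fetchone()[0]
print(name)  # Bob
```

The relevant point for this issue is that the *shape* of the upsert statement matters: extra joins and CTEs in the statement body are re-evaluated per row, and any attached trigger fires on top of that.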
Indeed
I've tried to replace the Upsert command we have in 0.5.6 with the old Insert from previous version 0.3.0, and the insert is pretty fast, even with the triggers.
So far:
- 0.5.6 with triggers is slow
- 0.5.6 without triggers is fast
- 0.3.0 with triggers is fast
Hum... that's a challenge...
From what I found, it seems that a WITH (...) clause in a statement, coupled with a trigger, is what slows down performance.
I've implemented another solution for the Upsert command, without using the (SO MUCH APPRECIATED AND USEFUL) WITH() statement...
I've created a new branch Sqlite-Performance, and the statement building has been replaced in the file SqliteObjectNames
The tests are currently in progress. Waiting for the results, but I guess they should be good. Until then @gb0o, can you test the source code from the Sqlite-Performance branch and give feedback about the new performance?
Perf with the new statement:
@Mimetis I have tested your branch and the results look much better. The time to sync rows with int and GUID PKs is now roughly the same, and there is no sign of a progressive delay. Good work spotting the SQLite WITH clause as the likely culprit.
Nice!
I've merged the PR to the master branch and labelled the new version 0.5.7
Not sure I will deploy the packages on NuGet until September.
So for now, you should use the source code.
Thanks @gb0o for spotting this weird perf issue !!!
Hi again,
Thought I should round out this ticket with another chart. These Excel charts really are painful to make, but I wanted to see the improvement.
This time I compared version 0.3.0 vs 0.5.7 using both Int and GUID ID rows. The results show version 0.5.7 is now significantly faster at inserting data into sqlite.
The total time to sync this heavy data set (400K Int and 400K GUID rows) was also faster using version 0.5.7:
0.3.0: Total duration :0:16:51.59
0.5.7: Total duration :0:7:51.390
Breakdown of times for insertion of 400K rows in 1,000-row batches
0.3.0: GUID IDs: 0:07:20
0.3.0: Int IDs: 0:05:58
0.5.7: GUID IDs: 00:01:47
0.5.7: Int IDs: 00:01:46
Overall comparison, 0.3.0 vs 0.5.7
I think you have earned a free beer, @Mimetis, if only I could send one through GitHub. Guy
Wow, that's an impressive chart !!
I'm ok for the beer @gb0o, @gentledepp, one day maybe ;)
Thanks for this great chart. Really useful! I wanted to make a perf comparison between 0.3 and 0.5.7 (mainly the old version vs the new refactored version started in 0.4), but never found the time for it. I'm really happy to see that all the efforts on the new version, with the help of the community, are worth it!!
Seb
Hey @Mimetis
I've been using Dotmim.Sync 0.3.0 in my project for a while and decided to test the latest version 0.5.6. I'm having a problem with slow performance for tables with a GUID PK syncing from SQL Server to Sqlite over HTTP. I made a little chart to compare Dotmim.Sync 0.3.0 vs 0.5.6.
The chart shows the older version completes ~355K rows in about 4 mins. However, the new version (0.5.6) takes 3 hrs 24 mins for the same data, getting progressively slower with each batch. I'm using a GUID PK in some of the tables in my application to avoid a sync conflict on the server. Both data sets were collected with Microsoft.Data.Sqlite 3.1.3.
If I do a heavy sync from a table with an int PK, performance is good. Nothing to see there.
I create a complete sqlite db in one heavy sync and ship that inside a desktop installer. The app is used to roll out and maintain emergency radio equipment for firefighters in Victoria, Australia.
A complete sync normally takes around 10 minutes, but the new version blows out to more than 6 hrs in total. The performance is just as bad if I use a snapshot, because the bottleneck is inserting data into sqlite, not transferring the data across HTTP.
Btw great to see 0.5.6 is now much faster across HTTP.
Not sure if this is relevant, but Microsoft.Data.Sqlite >= version 3.0.0 now stores GUIDs as text and not binary. https://www.bricelam.net/2019/08/22/microsoft-data-sqlite-3-0.html
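That storage change is easy to see at the SQLite level: a GUID written as text occupies 36 characters, while the older binary form is a 16-byte blob. A sketch (using Python's uuid module to stand in for a .NET Guid):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id_text TEXT, id_blob BLOB)")

g = uuid.uuid4()
# Microsoft.Data.Sqlite >= 3.0 writes Guid parameters as TEXT
# ('xxxxxxxx-xxxx-...'); earlier versions wrote 16-byte BLOBs.
conn.execute("INSERT INTO t VALUES (?, ?)", (str(g), g.bytes))
text_len, blob_len = conn.execute(
    "SELECT length(id_text), length(id_blob) FROM t"
).fetchone()
print(text_len, blob_len)  # 36 16
```

Besides the size difference, text and blob values have different storage classes in SQLite, so they compare and index differently, which could plausibly interact with PK lookups during an upsert.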
More than happy to retest performance if you make changes.
Regards, Guy