Mimetis / Dotmim.Sync

A brand new database synchronization framework, multi platform, multi databases, developed on top of .Net Standard 2.0. https://dotmimsync.readthedocs.io/
MIT License
899 stars 194 forks source link

Schema Migration Process Takes Too Long and Fails Due to Timeout When Multiple Clients Perform Updates #1255

Open nicolazreinh opened 1 month ago

nicolazreinh commented 1 month ago

Description: The schema migration process for synchronizing changes between the client and server is not only slow but also fails when multiple clients perform the update simultaneously. The process takes 10 to 20 minutes per client, and some clients have multiple scopes, further increasing the time required.

When two or more clients attempt to update the schema at the same time, it often leads to timeouts due to limited database resources, resulting in the following error:

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.

Here is the process being followed:

  1. Apply the update script on the client side.
  2. Remove all metadata from the client side.
  3. Update the database on the server side, which includes:
    • Updating the database schema.
    • Regenerating the scope info to reflect the new metadata.
    • Deprovisioning the old metadata.
    • Provisioning again with the new schema.
  4. Sync the database on the client side to apply the changes.

As mentioned in document, this process is necessary after any database schema updates, but the time required is a major bottleneck, especially when two or more clients attempt to update at the same time, causing the process to fail due to database instance limitations.

Steps to Reproduce:

  1. Update the database schema (add, modify, or remove columns).
  2. Simultaneously initiate the migration steps on two or more clients:
    • Update client schema, remove metadata, update server, sync.
  3. Observe that the migration process takes 10 to 20 minutes per client and may fail with the following timeout error:
    Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
    ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.

Expected Behavior: The schema migration process should be optimized for better performance, especially in scenarios where multiple clients are performing updates simultaneously. The process should be able to handle concurrent schema updates without timing out.

Actual Behavior: The migration steps take too long and often fail due to timeouts when multiple clients perform schema updates at the same time. The database instance appears to be overloaded, leading to the "Execution Timeout Expired" error.

Additional Context: We are using SQL Server and leveraging change tracking for client-server synchronization. Improving the performance of the schema migration process and handling multiple concurrent client updates would significantly enhance the synchronization experience.

Environment:

Mimetis commented 1 month ago

Do you experience the same timeout when you are doing the migration for one client only (without parallelization with another client) ?

Did you tried using a no lock option, using the the SyncOptions.TransactionMode (set to None)

Are you using SyncOptions.DisableConstraintsOnApplyChanges ? If yes, you should try with the option False(if you are sure you will not have any constraints error on changes apply)

Let me know if one of this option can improve your scenario

nicolazreinh commented 1 month ago

With one client, it takes the same amount of time. But it succeeds. When doing it with multiple clients, it fails due to lake connections on the database. I haven't tried SyncOptions.TransactionMode, does it improve performance? I have SyncOptions.DisableConstraintsOnApplyChanges Set to false. Thank you for your response.

Mimetis commented 1 month ago

Are you able to create a small sample I can use to reproduce the problem ?

I haven't tried SyncOptions.TransactionMode, does it improve performance?

Maybe, I dont know, you should try

nicolazreinh commented 1 month ago

I revised the process, and it improved performance and reduced the time required. Here's what I did:

  1. Applied the update script on the client side.
  2. Updated the server-side database, which involved:
    • Modifying the database schema.
    • Regenerating the scope info to reflect the new metadata.
    • Deprovisioning the old metadata.
    • Re-provisioning with the updated schema.
  3. Retrieved the new scope and provisioned it locally to apply the changes.

This improved the process. However, I'm now encountering a new issue during synchronization: "Change tracking is not enabled on table 'dbo.tbl_example'." Do you have any idea how this occurs and how to fix it?

Note that: I have multiple scopes.

Mimetis commented 1 month ago

Normally, DMS will enable the change tracking on each table required. Due to your migration phase, maybe there is an edge scenario where the "_enable_change_tracking_ontable" routine is not executed correctly. Once again if you have an example we can work on, will help me finding the glitch

Otherwise, DMS is "just" enabling change tracking on the tables required by executing the script:

$"ALTER TABLE {this.sqlObjectNames.TableQuotedFullName} ENABLE CHANGE_TRACKING;";
nicolazreinh commented 1 month ago

I made the alter script, but it failed in the next migrate. I’ll provide a pseudocode outline of my approach, as the full solution is quite complex. It's important to note that I’m working with multiple scopes, and one specific issue involves four different scopes that share the same table, dbo.tbl_example, which is causing the problem. Program.zip

Mimetis commented 1 month ago

thanks for the program.cs file but it does not allow me to investigate your issue. Let me know if you need more info, but as of now, I'm not able to help you anymore on this issue until I have more information.

nicolazreinh commented 1 month ago

Yes, sure. What more information do you need? More code? More explanation?

Mimetis commented 1 month ago

Well, I need a way to reproduce the error I you can create a very simple sample I can use to reproduce it, I will be able to help: