Mimetis / Dotmim.Sync

A brand-new database synchronization framework, multi-platform and multi-database, developed on top of .NET Standard 2.0. https://dotmimsync.readthedocs.io/
MIT License

What would be missing to have a full-featured sync framework? #255

Closed · Mimetis closed this 4 years ago

Mimetis commented 4 years ago

Hey guys.

I've finally reached a point where I've almost finished all the features I wanted to develop for this framework.

I still have some more things to add, like:

Other than that, I'm pretty happy with the actual features in place in the last version.

Can you think of anything that is missing and would be an absolutely required feature for you?

champcbg commented 4 years ago

I agree logging is very important.

On the where clauses, would it be possible to choose the comparison operator? Something like:

avbQAFilter.AddWhere("UserId", "ActiveVisionBoard", "UserId", "ActiveVisionBoard", WhereClause.In);

WhereClause could be an enum that specifies the comparison type:

WhereClause.In, WhereClause.GreaterThan, WhereClause.LessThan, WhereClause.NotEqual

I'm really interested in the IN clause, but the rest could be useful too.
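
For illustration, a minimal sketch of what the proposed enum could look like (WhereClause is hypothetical; it is not an existing DMS type):

// Hypothetical enum for the proposed comparison operators (not part of DMS today)
public enum WhereClause
{
    Equal,        // today's implicit behavior
    NotEqual,
    GreaterThan,
    LessThan,
    In            // matches any value in a provided set
}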

YoussefSell commented 4 years ago

Logging is definitely an important feature to consider.

I would also add a discussion on the possibility of providing a JavaScript web client provider: what should we consider when building it, and is it even possible (it's not going to be hard)? I looked into the actual web client and it depends on the core package, so we would have to abstract some of the core features into a JS implementation and build the JS client on top of it.

This feature would be interesting, and it would open the door to many possibilities.

Mimetis commented 4 years ago

Hi @YoussefSell, thanks for the feedback. Logging is partially implemented in the last version, but I haven't had the time yet to write the documentation.

Regarding the JS implementation, it's quite complicated, depending on the scenario.

For the web browser, with plain vanilla JavaScript, it's nearly impossible to adapt, since we don't have a relational database out of the box. Web SQL has been deprecated in favor of the NoSQL database IndexedDB. See more here: https://hacks.mozilla.org/2010/06/beyond-html5-database-apis-and-the-road-to-indexeddb/ This type of local storage for browsers is not compatible at all with DMS.

If you're talking about JS as a language used in an Electron application, React Native, and so on, then the answer is yes, it's possible, assuming we migrate the whole client side of the DMS framework. But it would be a really complicated task and the maintenance would be hard. I'm not sure it's worth it, to be honest.

Last but not least, we could talk about migrating the server side to JS as well, running on Node.js. Once again, it's a really complicated task, but still feasible. In that case, I think it would be a completely new project, based on the architecture and algorithms already in place within DMS. I've thought about that new project for a long time, but never found the time to even start something. Once again, I'm not sure it's worth creating a Node.js version, but I may be wrong.

gentledepp commented 4 years ago

Logging is indeed very important. Especially the time it takes to execute the separate sync steps and the number of rows transferred are interesting, as well as detailed information on any sync conflicts that appear, so that it is easier to pinpoint issues.

We are actually using Dotmim.Sync events to gather such information and send it as custom events to Microsoft Application Insights.

One addition we had to make was a handler that gets any exception from Dotmim.Sync on the server side and allows us to log it. Because (correct me if I am wrong, since we're using an old version of DMS) all exceptions are caught by Dotmim.Sync, wrapped into a response message, and sent to the client with a 200 success code, monitoring systems like Application Insights cannot automatically pick up failed requests, which is pretty bad. So it would be awesome if you could find a solution for this (allow the exception to surface to the client without completely swallowing it).
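
For reference, forwarding sync progress to Application Insights can be done with an IProgress<ProgressArgs> callback, roughly like this (a sketch, assuming a configured SyncAgent named agent; the exact ProgressArgs members may differ between DMS versions):

using System;
using System.Collections.Generic;
using Dotmim.Sync;
using Microsoft.ApplicationInsights;

var telemetry = new TelemetryClient();

// Forward each sync step to Application Insights as a custom event
var progress = new Progress<ProgressArgs>(args =>
    telemetry.TrackEvent("SyncProgress", new Dictionary<string, string>
    {
        ["Message"] = args.Message
    }));

var result = await agent.SynchronizeAsync(progress);

// Track the overall outcome as well
telemetry.TrackEvent("SyncCompleted", new Dictionary<string, string>
{
    ["TotalChangesDownloaded"] = result.TotalChangesDownloaded.ToString()
});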

gentledepp commented 4 years ago

Regarding the advanced filter management, there is a solution for Microsoft Dynamics called "Resco". They solved this problem the following way:

  1. Sync filters are only applied on the server side during the initial sync
  2. Incremental changes are distributed to all clients without filtering on the server side
  3. Rather, the clients know the sync filters themselves and thus just delete invalid rows on the client side

This is a very flexible solution, where you can even change sync filters afterwards. The obvious drawback is that incremental changes are always distributed to all clients. This can be a lot of data, and hence we opted for a different solution.
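
A minimal sketch of what step 3 could look like on a SQLite client (the table, column, and filter value are hypothetical; this is not a DMS API):

using Microsoft.Data.Sqlite;

// After applying the unfiltered incremental changes, prune the rows that
// do not match this client's own filter (Resco-style client-side filtering)
static void PruneFilteredRows(SqliteConnection connection, string userId)
{
    using var command = connection.CreateCommand();
    command.CommandText = "DELETE FROM ActiveVisionBoard WHERE UserId <> $userId";
    command.Parameters.AddWithValue("$userId", userId);
    command.ExecuteNonQuery();
}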

Mimetis commented 4 years ago

Logging is already in place, I just need to write the docs :)

For the filters, thanks for the solution. Do you have a pointer to any docs?

gentledepp commented 4 years ago

Sure! Resco.Net has a sample application called "Woodford", and it contains a section on sync filters. At the bottom of the page, there is a link to some additional documentation on this.

gentledepp commented 4 years ago

I just read through the docs and here is my wishlist ;-)

  1. Reinitialize when outdated? https://dotmimsync.readthedocs.io/Synchronize.html#forcing-reinitialize If we have to do this ourselves, can you add it as a sample to the docs? Ah! I just found it here, so maybe just add a hint to "forcing-reinitialize" ;-)

  2. Reinitialize when using MSSQL Change Tracking: if it is set up with a retention of 14 days, what happens if a client connects after 15 days? Will it be reinitialized correctly?

  3. Nice-to-have: sub-progress for each batch that is uploaded/downloaded, so that huge, long-running uploads/downloads still show progress (eye candy) https://dotmimsync.readthedocs.io/Progression.html

  4. ‼ A warning on using the local SQLiteProvider due to transaction overlaps (no timestamp exists on SQLite)

  5. Command caching - as I already showed in an older issue, command caching significantly improves performance when you are transferring a lot of rows (not dozens, but thousands). Is this already implemented? (The issue seems to have been removed.)

  6. AspNet (WebApi 2) is crucial to us - we'll have to implement it ourselves then.

  7. ‼ When using the DMS Web API in a scaleout scenario (multiple web servers), a memory cache will not be the right choice for session management. Does DMS support Redis as well?

  8. The Get() endpoint shows the server-side config and "shows only if Environment == Development". Can we override this? Maybe we want to allow administrators to still be able to look at this in production.

  9. 🎉 SerializerConverter This is awesome! Regarding MessagePack - this will very likely not work, because ContractlessStandardResolver may use dynamic/IL code generation. On UWP (because of .NET Native) and on iOS (where IL emit is also forbidden) this causes a huge headache. So basically, we'd have to subclass all the serialized classes, annotate them with MessagePack attributes ([Key(0)] on each property - see the MessagePack-CSharp quickstart), and serialize those (see the sketch after this list) :-|

  10. Snapshots are a great idea. For a scaleout scenario, it would be great if you could also store these in blob storage!

  11. In general, blob storage support would be a great addition to support scaleout scenarios!

  12. Idea: regarding adding and removing a table in the provisioning chapter: one could actually track the timestamps per table, so if you add a new table, it is re-synchronized automatically (because its default timestamp is 0) and all other tables just yield the changes since the last sync

    Also, why is adding a column so hard? (Removing probably should not be supported, but adding a nullable column to a table should be absolutely doable - that would be a GREAT addition)
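
Regarding item 9, the subclass-and-annotate approach could look like this minimal sketch (PackableScopeInfo is a hypothetical mirror of a serialized DMS type, shown only to illustrate the attribute usage):

using System;
using MessagePack;

// Explicit keys mean no IL emit / dynamic code generation at runtime,
// which is exactly what breaks on UWP (.NET Native) and iOS
[MessagePackObject]
public class PackableScopeInfo
{
    [Key(0)]
    public Guid SessionId { get; set; }

    [Key(1)]
    public string ScopeName { get; set; }
}

// Serialize without the ContractlessStandardResolver
byte[] bytes = MessagePackSerializer.Serialize(
    new PackableScopeInfo { SessionId = Guid.NewGuid(), ScopeName = "DefaultScope" });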

omarbekov commented 4 years ago

One feature that could be useful in conflict resolution is per-column update tracking, where an update_timestamp is recorded for each column, not just for the row. Right now:

  1. Client 1 changes Column 1 of a record in a table at 00:00.
  2. Client 2 changes Column 2 of the same record in the same table at 00:05.
  3. Client 2 synchronizes at 00:10. The value in Column 2 is now Client 2's.
  4. Client 1 synchronizes at 00:15. Depending on the conflict resolution policy: if ServerWins, Client 1's Column 1 value is dismissed; if ClientWins, Client 2's Column 2 value is overwritten by Client 1's stale (old) Column 2 value.

Is this the current logic? I haven't tried out the latest version yet. With this feature, both new column values would persist.

omarbekov commented 4 years ago

Another useful feature would be the ability to specify columns that should be removed from the sync setup. Specifying these columns in, for example, SetupTable would be really convenient, instead of doing it through interceptors.

Mimetis commented 4 years ago

Another useful feature would be the ability to specify columns that should be removed from the sync setup. Specifying these columns in, for example, SetupTable would be really convenient, instead of doing it through interceptors.

This is how it actually works. See https://dotmimsync.readthedocs.io/Provision.html#migration

omarbekov commented 4 years ago

Hi, @Mimetis! The "Filtering Columns" feature is there indeed, but I'm talking about the opposite. There are scenarios where a bunch of tables share the same column(s). Let's say the same "CreatedBy" column exists in 20 tables. I want to synchronize these 20 tables with all their columns except "CreatedBy". Instead of specifying the columns for each table like this

setup.Tables["Customer"].Columns.AddRange(new string[] {
    "CustomerID", "EmployeeID", "NameStyle", "FirstName", "LastName" });

it would be easier to do something like this

setup.Tables["Customer"].ColumnsToRemove.AddRange(new string[] { "CreatedBy" });
Mimetis commented 4 years ago

Ok, got it :)

gentledepp commented 4 years ago

It may, however, be interesting to see whether this could be "easily" implemented using an interceptor!

omarbekov commented 4 years ago

This is in Dotmim.Sync 0.3.1; I imagine it's somewhat similar in the latest version.

var interceptor = new Interceptor<SchemaArgs>(args =>
{
    // Grab the schema negotiated for this request
    var schema = _webProxyServerProvider.GetLocalProvider(HttpContext).Configuration.Schema;

    // Strip the unwanted columns from every table before the sync runs
    foreach (var table in schema.Tables)
    {
        var columnsToRemove = new string[] { "CreatedBy", … };
        foreach (var columnToRemove in columnsToRemove)
        {
            if (table.Columns.Any(c => c.ColumnName == columnToRemove))
                table.Columns.Remove(columnToRemove);
        }
    }
});

_webProxyServerProvider.GetLocalProvider(HttpContext).On(interceptor);
await _webProxyServerProvider.HandleRequestAsync(HttpContext);
Mimetis commented 4 years ago

We can continue this "column removing tool" conversation on issue #285

Mimetis commented 4 years ago

@gentledepp

1 Reinitialize when outdated?

I should rewrite the Metadatas documentation, because it's still in the old markdown style and not well formatted. I will also write a small sample demonstrating the OutDated pattern.

2 Reinitialize with Change Tracking

Yes, it will be reinitialized correctly if you allow it (using SyncType.Reinitialize or by forcing it). I will add a dedicated paragraph to the Metadatas documentation.
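
For context, forcing a full reinitialization from the client looks roughly like this (a sketch, assuming a configured SyncAgent named agent):

using Dotmim.Sync;
using Dotmim.Sync.Enumerations;

// Drop the client's local state and resync everything from scratch
var result = await agent.SynchronizeAsync(SyncType.Reinitialize);

// SyncType.ReinitializeWithUpload also pushes pending local changes first
Console.WriteLine(result);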

3 Sub progress

On the server side, from a remote orchestrator WebServerOrchestrator instance, you have a special interceptor you can hook through the extension method OnSendingChanges.

This interceptor is called JUST before the batch is sent back to the client. So far, the HttpMessageSendChangesResponse instance contains a Content property that is a byte[] array.

Here is a quick sample you can find in the HttpTests.cs file:

// Get the response just before the changes are sent back from the server
this.WebServerOrchestrator.OnSendingChanges(async sra =>
{
    var serializerFactory = this.WebServerOrchestrator.WebServerOptions.Serializers["json"];
    var serializer = serializerFactory.GetSerializer<HttpMessageSendChangesResponse>();

    using (var ms = new MemoryStream(sra.Content))
    {
        var o = await serializer.DeserializeAsync(ms);

        // check we have rows
        Assert.True(o.Changes.HasRows);
    }
});

On the client, from a WebClientOrchestrator, you have an OnSendingChanges() method as well.

That said, on the client side, I think we're missing an OnReceivingChanges() method that could help provide better progression events when receiving data.

I will probably add this method in the next minor release (probably v0.5.6).

4 Warning using SqliteProvider due to transaction overlaps (no timestamp exists on SQLite)

In the docs?

5 Command caching

I'm still researching the benefits of that. I think you've already done some test measurements, but I can't find the message related to them.

If you find something, please open a new issue and we will discuss it.

6 WebApi 2

Come on!! We are about to ship .NET 5!! :)

7 Web server caching system

DMS can use any cache implementing IMemoryCache.

You can create your own Redis cache object that implements the IMemoryCache interface.

In a future release (once again, probably v0.5.6) I will transition from IMemoryCache to IDistributedCache, to be able to use distributed caches like NCache, SQL Server cache, in-memory cache, or Redis cache.

More info here : Distributed caching services

I haven't checked yet, but I will make a smooth transition from IMemoryCache to IDistributedCache that will be transparent for the user (both will remain available, even if IMemoryCache will be marked as Obsolete).
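
As an illustration of where this is heading, registering a Redis-backed IDistributedCache in ASP.NET Core looks like this (a sketch using the Microsoft.Extensions.Caching.StackExchangeRedis package; whether DMS would resolve it from DI this way is my assumption):

using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
    // A Redis-backed IDistributedCache shared by all web servers in the scaleout
    services.AddStackExchangeRedisCache(options =>
    {
        options.Configuration = "localhost:6379"; // hypothetical Redis endpoint
        options.InstanceName = "DmsSession:";     // key prefix for session entries
    });
}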

8 Get() endpoint

You can do it manually (since the last commit on the master branch) this way:

[HttpGet]
[Authorize]
public async Task Get()
{
    // Checking the scope is optional.
    // The [Authorize] attribute is enough, since it prevents anyone from accessing
    // this method without a Bearer token.
    // Anyway, you can have more detailed control using the claims!
    string scope = (User.FindFirst("http://schemas.microsoft.com/identity/claims/scope"))?.Value;
    string user = (User.FindFirst(ClaimTypes.NameIdentifier))?.Value;

    if (scope != "access_as_admin")
    {
        this.HttpContext.Response.StatusCode = StatusCodes.Status401Unauthorized;
        return;
    }

    await manager.WriteHelloAsync(this.HttpContext, default);
}

9 SerializerConverter

I cannot add a [Key(0)] attribute on each property, since Dotmim.Sync.Core has no reference to MessagePack. So far, no solution... But I'm open to any suggestion that won't force me to add an arbitrary reference to the Dotmim.Sync.Core project :)

10 Snapshot for blob storage

Quite a good idea. I will try to look into it. Maybe extract everything and make an interface, like the custom serializers and custom converters. For a future release (god damn it! :) )

12 Adding / Removing table

I've made some tests and it leads to an inconsistent state. So far, I don't want to invest too much in this scenario.

@lordofffarm

13 Per column tracking

Per-column tracking is way more complicated to implement and would lead to a very complex engine. It would be slower and would add a HUGE amount of metadata. So it would be a completely new framework; it won't be added to DMS.

14 Columns to remove from SyncSetup

See #285

15 ... Oh wait, no 15 :)

Thanks for your feedback, I have a great TODO list for the next two months!!

gentledepp commented 4 years ago

4 Warning using SqliteProvider due to transaction overlaps (no timestamp exists on SQLite)

Sorry, I completely forgot to elaborate. This was a simple "note to self".

So... on MSSQL we have MIN_ACTIVE_ROWVERSION(), which gives you the lowest timestamp currently being used by an open transaction. So by using min_activerowversion - 1 you get the highest timestamp guaranteed not to be hidden by a transaction, and when you store it as sync_timestamp, you can be sure that next time, when you only send incremental changes > sync_timestamp, you will not miss any changes.
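
In code, reading that safe watermark could look like this (a sketch, assuming a valid connectionString; the CAST-to-bigint arithmetic is the relevant part):

using Microsoft.Data.SqlClient;

// MIN_ACTIVE_ROWVERSION() is the lowest rowversion still in use by an open
// transaction, so subtracting 1 yields the highest timestamp guaranteed visible
using var connection = new SqlConnection(connectionString);
connection.Open();
using var command = new SqlCommand(
    "SELECT CAST(MIN_ACTIVE_ROWVERSION() AS bigint) - 1", connection);
var safeSyncTimestamp = (long)command.ExecuteScalar();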

Now, on SQLite you do not have such a thing. Therefore you build the timestamps yourself. Now what could happen is:

Now you sync at t4 and send all changes (except "customer B", which is still hidden by tx1) to the server. Thereafter, you happily store t4 and think that everything went fine. However, "customer B" will never be sent to the server.

We fixed this issue by introducing some kind of ReaderWriterLock where

So while I do not think that this should be a part of Dotmim.Sync, you'd better warn your users in your docs! ;-)

5 Command Caching

Have a look at this closed issue here:


Implementing command caching is not a lot of effort, as your architecture makes it quite easy. I did some measurements and added them to an issue, but I can't seem to find it, so you'll have to believe me. Jedi mode on: you are convinced that adding command caching is an absolute necessity!
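
To make the idea concrete, here is a minimal sketch of command caching (my illustration, not DMS's actual implementation): keep one prepared DbCommand per SQL statement and reuse it for every row, instead of rebuilding and re-preparing it each time.

using System.Collections.Concurrent;
using System.Data.Common;

public class CommandCache
{
    // One prepared command per SQL text, reused across thousands of rows
    private readonly ConcurrentDictionary<string, DbCommand> cache = new();

    public DbCommand GetOrCreate(DbConnection connection, string commandText)
    {
        return cache.GetOrAdd(commandText, text =>
        {
            var command = connection.CreateCommand();
            command.CommandText = text;
            command.Prepare(); // pay the statement preparation cost only once
            return command;
        });
    }
}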

6 WebApi 2

well... we have a system installed at several customers that runs good old .NET Framework. We cannot migrate that easily 😭

8 Get() endpoint

Awesome! 😎🤜

9 SerializerConverter

I'll think about it!

12 Adding / Removing table

Well, full schema management would be unnecessary overkill. Nevertheless, it would be very NICE to be able to:

  1. add a new table to a scope
  2. add a new nullable column to a scope

Removing a table/column does not need to be supported, because that would break backward compatibility. Remember, there may be clients supporting v1 of the schema and others that already run on v2! It is very, very uncommon to be able to upgrade all of your clients in one swoop!

With this functionality, clients just need to make sure to specify all columns explicitly in any insert/update query (so no SELECT * FROM sometablename), because if the server changes the schema and the client still runs v1, it may unexpectedly get a new column and crash.

One thing left is that if a client gets the new column, it must run a re-initialization of the sync. Otherwise, there may be some rows that already have a value in New_Column while the client still has null values in it!

With a new table, this is much simpler, as clients that do not know it will simply ignore it.

So I guess these features can be added afterwards as well. I guess what would be needed is