Mimetis / Dotmim.Sync

A brand new database synchronization framework, multi platform, multi databases, developed on top of .Net Standard 2.0. https://dotmimsync.readthedocs.io/
MIT License
855 stars 188 forks

Investigate using the CRDT data structure #1171

Open symbiogenesis opened 2 months ago

symbiogenesis commented 2 months ago

Conflict-free replicated data types (CRDTs) are a sort of cutting-edge data structure for distributed synchronization. They enable the kind of real-time collaborative editing seen in Google Docs (which itself uses the related operational transformation technique).

In theory, using this as a back end would allow excellent performance characteristics with minimal complexity, offloading most of the heavy lifting to the algorithm.

This approach has already been implemented in Dart.

sql_crdt, an abstract implementation for using relational databases as a data storage backend.

sqlite_crdt, an implementation using Sqlite for storage, useful for mobile or small projects.

postgres_crdt, a sql_crdt implementation that benefits from PostgreSQL's performance and scalability, intended for backend applications.
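
To make the idea concrete, here is a minimal sketch of one common CRDT, a last-writer-wins map, where each row carries a timestamp and a node id. This is an illustration only, not the code of sql_crdt or any of the packages above; the `Record` type, the `merge` function, and the integer logical clock are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One replicated row (hypothetical shape, for illustration only)."""
    value: object
    timestamp: int  # logical clock value assigned when the row was written
    node_id: str    # id of the writing replica, used to break timestamp ties


def merge(local: dict, remote: dict) -> dict:
    """Merge two replicas, keeping the newest record for every key.

    The result does not depend on the order or number of merges
    (commutative, associative, idempotent), which is what lets
    replicas converge without coordinating.
    """
    merged = dict(local)
    for key, rec in remote.items():
        cur = merged.get(key)
        if cur is None or (rec.timestamp, rec.node_id) > (cur.timestamp, cur.node_id):
            merged[key] = rec
    return merged


# Two replicas that edited the same customer independently.
replica_a = {
    "customer:1": Record("Alice", 1, "a"),
    "customer:2": Record("Bob", 2, "a"),
}
replica_b = {"customer:1": Record("Alicia", 3, "b")}

merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)
```

Whichever direction the merge runs, both replicas end up with the same rows, so no separate conflict-resolution step is needed.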

The CRDT algorithm itself was also implemented in .NET over a SignalR websocket with Yjs, although it is less relevant to the needs of relational databases.

The notion of using a websocket is kind of nuts, but quite appealing. All changes could be streamed as they happen.

VagueGit commented 2 months ago

My understanding is that CRDT offers eventual consistency. This doesn't work for us. Our customers expect their data to be 'immediately' consistent across all devices in their company.

We even found, when we experimented with SQLite WAL, that our customers were unhappy because data written to the log wasn't immediately synced. We tried forcing checkpoints without success, so we abandoned WAL.

I mention that to flag how sensitive some are to having all their data now.
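
For readers unfamiliar with the WAL behaviour described above, the following sketch shows how write-ahead logging is switched on and how a checkpoint can be forced with SQLite's own pragmas. The thread does not say how the checkpoints were actually forced, so the `wal_demo` helper, the table name, and the choice of `TRUNCATE` mode are assumptions.

```python
import os
import sqlite3
import tempfile


def wal_demo(db_path):
    """Enable WAL, write one row, then force a checkpoint.

    Returns the journal mode reported by SQLite and the 'busy' flag
    from the checkpoint (0 means the checkpoint was not blocked).
    """
    conn = sqlite3.connect(db_path)
    try:
        # Must run outside a transaction; SQLite answers with the mode now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
        )
        conn.execute("INSERT INTO customer (name) VALUES (?)", ("Alice",))
        conn.commit()  # the committed row now lives in the -wal file
        # Copy WAL content into the main database file and truncate the log.
        busy, _log_frames, _checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        return mode, busy
    finally:
        conn.close()


# WAL needs a file-backed database, so use a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    mode, busy = wal_demo(os.path.join(tmp, "demo.db"))
```

A checkpoint can be blocked by other open connections or long-running readers, in which case the busy flag is non-zero and some committed data stays in the log.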

symbiogenesis commented 2 months ago

Interesting. Do you sync on a particular interval or during particular events?

A SignalR-based approach might be kind of nice by comparison, irrespective of CRDT. But conflicts in this kind of scenario are likely. The eventual consistency of CRDT, in the real world, may be no slower than the current approach.

I don't think any distributed synchronization scheme can offer anything better than eventual consistency. They can offer immediate inconsistency, and then become consistent or not, given the algorithm and network conditions.

Until faster-than-light communication via quantum entanglement is invented, that is.
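
The "immediately inconsistent, eventually consistent" point can be sketched with a grow-only counter, one of the simplest CRDTs. The replica names and helper functions below are made up for the illustration and are not taken from any library mentioned in this thread.

```python
def increment(counter, node_id, amount=1):
    """Return a copy of the counter with node_id's own slot increased."""
    updated = dict(counter)
    updated[node_id] = updated.get(node_id, 0) + amount
    return updated


def merge_counters(a, b):
    """Merge two replicas by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(counter):
    """The counter's value is the sum of all per-node slots."""
    return sum(counter.values())


# Both devices work offline: their views differ right away.
phone = increment({}, "phone")
laptop = increment(increment({}, "laptop"), "laptop")
diverged = value(phone) != value(laptop)

# Once they exchange state, in either order, they agree again.
phone_synced = merge_counters(phone, laptop)
laptop_synced = merge_counters(laptop, phone)
```

How long the replicas stay diverged depends only on when they next exchange state, which is the network-conditions caveat made above.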

I am only brainstorming.

VagueGit commented 2 months ago

Ours is a conventional business app. We sync when the user clicks Save ... save a customer, save an order etc. Each company that uses our app has their own db on our server. Each company may have many client devices syncing to that db.

The server db is the source of truth. Last write wins. So we have consistency between a client and the server when each client syncs with the server.
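
As a rough sketch of the model just described (sync on Save, server as source of truth, last write wins), the toy classes below show why a device only sees another device's change after its own next sync. This is not Dotmim.Sync's API; `Server`, `Client`, and the version counter are hypothetical names used for illustration.

```python
class Server:
    """Source of truth: the most recently received write for a key wins."""

    def __init__(self):
        self.rows = {}    # key -> (value, version)
        self.version = 0  # bumped on every applied change

    def push(self, changes):
        for key, value in changes.items():
            self.version += 1
            self.rows[key] = (value, self.version)  # overwrite unconditionally

    def pull(self, since):
        changed = {k: v for k, (v, ver) in self.rows.items() if ver > since}
        return changed, self.version


class Client:
    def __init__(self, server):
        self.server = server
        self.rows = {}
        self.pending = {}
        self.last_seen = 0

    def save(self, key, value):
        """Clicking Save writes locally, then syncs straight away."""
        self.rows[key] = value
        self.pending[key] = value
        self.sync()

    def sync(self):
        self.server.push(self.pending)
        self.pending = {}
        changes, self.last_seen = self.server.pull(self.last_seen)
        self.rows.update(changes)


server = Server()
till, office = Client(server), Client(server)

till.save("order:1", "draft")
office.save("order:1", "confirmed")       # later write, so it wins on the server
stale_on_till = till.rows["order:1"]      # till has not synced since its own save
till.sync()
fresh_on_till = till.rows["order:1"]
```

Each client matches the server at the moment it syncs; between syncs a device can still show an older value.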

With WAL, Save and Sync saves to the log, but syncs from the local db. Users complained they had saved on one device and the updates had not appeared on another device.

Back in the day, our customers were using our app on their LAN. They got used to consistent data across devices. When we moved away from that model we had to maintain as far as possible a similar appearance of consistency. DMS does that better than alternatives we explored.

Drifting off-topic but we exist and don't exist in a multiverse where all topics are relevant and irrelevant.