dexie / Dexie.js

A Minimalistic Wrapper for IndexedDB
https://dexie.org
Apache License 2.0

Implementing Dexie.Syncable ISyncProtocol #901

Open thoraj opened 5 years ago

thoraj commented 5 years ago

Hello,

I'm working on an implementation of ISyncProtocol which will be used in an end-to-end encrypted messaging application. The data exchanged in ISyncProtocol will be encrypted and decrypted with keys known only to the client. We are using ASP.NET Core and SignalR.

When a blank/fresh client is started it will call subscribe() on the server, and the server will set up the subscription, and also send over changes.

My question is about this initial exchange when a client is brought up to date as part of the subscribe() call. There can be a lot of data which a blank client should receive, so I'm assuming using the partial flag is the way to go?

If the changes sent have the partial flag set to true, they do not seem to be committed in Dexie (after the applyRemoteChanges call).

And there is no concept of an ack back to the server which could be used to continue sending changes (the last batch of changes would have partial set to false).

Sending all changes in a single batch with partial set to false works ok.

So I'm wondering about the correct (robust) way to ensure a new client receives all changes (in effect all the persisted objects).

BR, Thor A. Johansen

dfahlander commented 5 years ago

As I recall, partial changes are put in an intermediate table named "uncommittedChanges". @nponiros, who built sync-client / sync-server, may remember his experience of using the partial flag. But as I recall, the Dexie.Syncable framework should immediately start another sync when the received changes were partial, so that it continues to receive data until partial is false. Then the framework should commit the uncommitted changes into the db.
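To make the staging behaviour described above concrete, here is a minimal in-memory sketch. It is not Dexie.Syncable's actual code: the `StagedApplier` class and its fields are hypothetical, and only the argument order of `apply` is modelled on `applyRemoteChanges(changes, lastRevision, partial)`. It also assumes that the revision cursor is updated on every batch, which the thread does not confirm.

```typescript
// Hypothetical model of staging partial batches; not Dexie.Syncable's real implementation.
type RemoteChange = { key: string; value: unknown };

class StagedApplier {
  // Stands in for the intermediate "uncommittedChanges" table.
  private uncommitted: RemoteChange[] = [];
  // Stands in for the application's real tables.
  readonly committed = new Map<string, unknown>();
  // Assumption: the revision to send with the next request is tracked per batch.
  lastRevision: number | null = null;

  apply(changes: RemoteChange[], lastRevision: number, partial: boolean): void {
    this.uncommitted.push(...changes);
    this.lastRevision = lastRevision;
    if (partial) {
      return; // more batches are expected, so nothing becomes visible yet
    }
    // Final batch: move everything that was staged into the real store.
    for (const change of this.uncommitted) {
      this.committed.set(change.key, change.value);
    }
    this.uncommitted = [];
  }
}

const applier = new StagedApplier();
applier.apply([{ key: "a", value: 1 }], 1, true);
const visibleAfterPartial = applier.committed.size;
applier.apply([{ key: "b", value: 2 }], 2, false);
const visibleAfterFinal = applier.committed.size;
```

Under this model, a client that only ever receives batches with partial set to true would never see the data in its tables, which matches the symptom reported in the original question.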

thoraj commented 5 years ago

@dfahlander

Thanks for the quick reply.

The uncommittedChanges table makes sense, and is how @nponiros 's WebSocketSyncServer handles partial when syncing client->server.

When sending server->client it seems partial is always false (so everything is sent in a single batch). So I'm not sure how to split this into partials/batches. I've naively tried sending blindly, i.e. sending in batches, waiting a short time between each batch, and setting partial = false in the last batch. This kind of works, but I'm worried about not having any handshake or ack mechanism.

I also had a brief look at @nponiros' sync_server, but it was not clear from the source if and how it supports partials when doing server->client sync.

Perhaps partials during the subscribe() phase are not supported?

-- Thor A. Johansen

dfahlander commented 5 years ago

I recall @nponiros brought up some questions around this, and maybe he found that he couldn't use it for some reason. It should be supported, but there might be an issue with it.

thoraj commented 5 years ago

@dfahlander: Thanks.

@nponiros: Are you able to provide some insights?

dfahlander commented 5 years ago

Just F.Y.I., there will be a newer alternative to Dexie.Syncable at some point, but I can't promise when. Pull requests are still welcome to Dexie.Syncable and Dexie.Observable. The code was a total mess until @nponiros refactored it (many thanks for that!). Still, it can be hard to follow as the flow and design remain the same. Though it seems to do its job pretty well.

thoraj commented 5 years ago

Interesting. What is the rationale for the change/replacement? Is there somewhere where I can have a look at what is planned, or where the changes are being discussed?

I'm particularly interested in anything that will help or make it easier to handle encryption (which means the sync server cannot know details about the objects/fields).

dfahlander commented 5 years ago

It will not necessarily be a replacement, but at least a complement. There are no docs about the plans so far. I'm working on a new solution for a universal database as a whole. The sync part is essential, but this time I will make it possible to declare entire operations in JavaScript on both ends and let the synchronization submit operations instead of just create/update/delete, which will keep the synchronized database consistent.

Regarding encryption, a new addon, dexie-encrypted, was recently released by @stutrek that adds encryption of non-indexed fields in a Dexie db. Right now, if it is combined with dexie-syncable, I believe dexie-syncable will still sync the plain text data, as dexie-encrypted decrypts its data on all accessing methods. But if dexie-syncable were able to read the raw encrypted data instead and sync the encrypted version of the objects, that might be a workable approach.

In order to accomplish this, I suppose dexie-observable/dexie-syncable would need to use the [Collection.raw()](https://dexie.org/docs/Collection/Collection.raw()) method to get the encrypted version of the data and sync that instead.

I think dexie-encrypted would also have to offer a raw write to collections/objects so that it does not double encrypt it when syncing back from server --> client.

stutrek commented 5 years ago

I have been working on a similar problem. We are not able to use Dexie.Syncable because we had to add a client-side DB to an existing system.

To be sure that no messages are dropped, and to handle going offline for a short time, we used a blockchain-like mechanism and a backfill endpoint. Each socket message carries the ID of the previous message. If that ID doesn't match the last message received, the client queues up incoming messages and calls a backfill endpoint. I recommend looking at Redis Streams, as they made our solution much simpler.


Our initial payload comes from an API endpoint.
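The chained-ID check described in this comment could look roughly like the sketch below. The message shape, the `ChainedReceiver` class and the `requestBackfill` callback are all hypothetical names chosen for illustration; the actual system is not shown in the thread. The backfill response is delivered through `onBackfill` so that messages arriving in the meantime are queued.

```typescript
// Hypothetical message shape: each message points at the one before it.
type ChainedMsg = { id: string; prevId: string | null; body: string };

class ChainedReceiver {
  private lastId: string | null = null;
  private queue: ChainedMsg[] = [];
  private backfilling = false;
  readonly applied: string[] = [];

  // requestBackfill is assumed to ask the server for everything after `afterId`.
  constructor(private requestBackfill: (afterId: string | null) => void) {}

  receive(msg: ChainedMsg): void {
    if (this.backfilling) {
      this.queue.push(msg); // hold new messages until the gap is closed
      return;
    }
    if (msg.prevId === this.lastId) {
      this.accept(msg);
      return;
    }
    // The chain is broken: queue the message and ask for what was missed.
    this.backfilling = true;
    this.queue.push(msg);
    this.requestBackfill(this.lastId);
  }

  // Called when the backfill response arrives.
  onBackfill(missed: ChainedMsg[]): void {
    const pending = this.queue;
    this.queue = [];
    this.backfilling = false;
    for (const msg of [...missed, ...pending]) {
      // Messages that do not extend the chain (for example duplicates) are skipped.
      // A production version would re-check here for a further gap.
      if (msg.prevId === this.lastId) this.accept(msg);
    }
  }

  private accept(msg: ChainedMsg): void {
    this.applied.push(msg.id);
    this.lastId = msg.id;
  }
}

const backfillRequests: (string | null)[] = [];
const receiver = new ChainedReceiver((afterId) => backfillRequests.push(afterId));
receiver.receive({ id: "1", prevId: null, body: "first" });
receiver.receive({ id: "3", prevId: "2", body: "third" });  // gap: "2" was missed
receiver.receive({ id: "4", prevId: "3", body: "fourth" }); // queued during backfill
receiver.onBackfill([{ id: "2", prevId: "1", body: "second" }]);
```

In this run the receiver asks for a backfill after message "1" and ends up applying the messages in order, even though "2" arrived last.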

nponiros commented 4 years ago

@thoraj Sorry, I just saw the ping now. From what I remember I did implement partial for server -> client as well. See https://github.com/nponiros/sync_server/blob/98e5ee3cc34e967bbc4150e37b0496e4b897ce15/lib/sync/get_server_changes.js

From what I remember: the client sends the latest revision it got from the server. When using partial for server -> client, the server returns only a part of the array, and the latest revision is that of the newest element in the partial array. The next time the client requests data with that revision, the changes are calculated anew. The server also tells the client that it was a partial data set so that the client can immediately request more data.
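The exchange described above can be sketched as follows. This is an illustration under stated assumptions, not the code from sync_server: the in-memory change log, the `getChangesSince` and `pullAll` functions, the `maxChanges` limit and the response field names are all hypothetical.

```typescript
// Hypothetical change log entry; `rev` is assumed to increase monotonically.
type LoggedChange = { rev: number; key: string; value: unknown };

interface ChangesResponse {
  changes: LoggedChange[];
  currentRevision: number; // revision the client sends with its next request
  partial: boolean;        // true when more changes remain on the server
}

// Server side: return at most maxChanges (assumed >= 1) changes newer than clientRevision.
function getChangesSince(
  log: LoggedChange[],
  clientRevision: number,
  maxChanges: number,
): ChangesResponse {
  const pending = log.filter((c) => c.rev > clientRevision);
  const batch = pending.slice(0, maxChanges);
  // The reported revision is that of the newest change in this batch.
  const currentRevision =
    batch.length > 0 ? batch[batch.length - 1].rev : clientRevision;
  return { changes: batch, currentRevision, partial: pending.length > batch.length };
}

// Client side: keep requesting with the latest revision until partial is false.
function pullAll(log: LoggedChange[], maxChanges: number) {
  const received: LoggedChange[] = [];
  let revision = 0;
  let requests = 0;
  let response: ChangesResponse;
  do {
    response = getChangesSince(log, revision, maxChanges);
    received.push(...response.changes);
    revision = response.currentRevision;
    requests++;
  } while (response.partial);
  return { received, revision, requests };
}

const changeLog: LoggedChange[] = [];
for (let rev = 1; rev <= 5; rev++) {
  changeLog.push({ rev, key: "k" + rev, value: rev });
}
const pulled = pullAll(changeLog, 2);
```

Because each request carries the revision of the last change received, the client's next request acts as the acknowledgement: the server needs no separate ack message and no per-client batching state.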