RESH ❯❯ Contextual shell history for zsh and bash

Synchronization between devices #145

Open · curusarn opened this issue 4 years ago

curusarn commented 4 years ago

Designed as part of my thesis - available here: https://github.com/curusarn/ctu-fit-master-thesis/releases/tag/v1.0

tivvit commented 2 years ago

Hi, I am thinking about implementing this based on the design you outlined in your thesis.

Let me summarize what I have in mind.

You propose a component called the sync connector. It may run locally or remotely (some kind of auth will be necessary for the remote case). A local sync connector is useful for third-party sync solutions; the remote sync connector is stand-alone. The communication protocol is JSON over REST (the same as for the resh daemon). The component handles long-term storage of the history records (it may be a database or, for example, a file that is synced by some third-party solution). There are two main communication paths between the resh daemon and this component (both sketched after the list below):

  1. Store path. The resh daemon first asks the connector for the newest stored record (probably per machine ID) and then sends a batch of new records to the component. All records are identified by a unique ID, so deduplication is straightforward.
  2. Read path. The resh daemon requests the newest stored record and gets the latest record for each machine ID (probably the same endpoint as above, with an optional query parameter). It then requests all records it is missing (in one request per machine, or maybe a single request). The records are merged into the resh daemon's internal storage, which makes them searchable.
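A minimal sketch of what the connector's HTTP surface could look like under this design. The endpoint paths (`/latest`, `/records`), the query parameters (`machineId`, `since`), the record shape, and the port are all my assumptions, not a finalized API; auth is omitted:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strconv"
	"sync"
)

// Record is a hypothetical record shape; real resh records carry more fields.
type Record struct {
	ID        string  `json:"id"`
	MachineID string  `json:"machineId"`
	Time      float64 `json:"time"`
	CmdLine   string  `json:"cmdLine"`
}

var (
	mu    sync.Mutex
	store []Record            // stand-in for a database or a synced file
	seen  = map[string]bool{} // record IDs, for deduplication
)

func main() {
	// GET /latest -> newest stored record per machine ID; used by both paths
	// to decide which records are missing on either side.
	http.HandleFunc("/latest", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		latest := map[string]Record{}
		for _, rec := range store {
			if cur, ok := latest[rec.MachineID]; !ok || rec.Time > cur.Time {
				latest[rec.MachineID] = rec
			}
		}
		json.NewEncoder(w).Encode(latest)
	})

	// POST /records                        -> store path: upload a batch.
	// GET  /records?machineId=...&since=... -> read path: pull missing records.
	http.HandleFunc("/records", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		switch r.Method {
		case http.MethodPost:
			var batch []Record
			if err := json.NewDecoder(r.Body).Decode(&batch); err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			for _, rec := range batch {
				if !seen[rec.ID] { // IDs make deduplication trivial
					seen[rec.ID] = true
					store = append(store, rec)
				}
			}
		case http.MethodGet:
			q := r.URL.Query()
			machineID := q.Get("machineId")
			// a missing "since" parses as 0, i.e. "send everything"
			since, _ := strconv.ParseFloat(q.Get("since"), 64)
			out := []Record{}
			for _, rec := range store {
				if rec.MachineID == machineID && rec.Time > since {
					out = append(out, rec)
				}
			}
			json.NewEncoder(w).Encode(out)
		}
	})

	log.Fatal(http.ListenAndServe("localhost:8080", nil)) // port is arbitrary
}
```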

Both paths run once after startup and then periodically at a defined interval (maybe set individually for each path).
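On the daemon side, that scheduling could be a plain ticker loop; `runSync`, `syncStore`, and `syncRead` are hypothetical names, just a sketch of the idea:

```go
package main

import (
	"log"
	"time"
)

// Stubs standing in for the store and read paths described above.
func syncStore() { log.Println("store path: upload new local records") }
func syncRead()  { log.Println("read path: pull records from other machines") }

// runSync runs both paths once at startup, then each on its own interval.
func runSync(storeEvery, readEvery time.Duration) {
	syncStore()
	syncRead()
	storeTick := time.NewTicker(storeEvery)
	readTick := time.NewTicker(readEvery)
	defer storeTick.Stop()
	defer readTick.Stop()
	for {
		select {
		case <-storeTick.C:
			syncStore()
		case <-readTick.C:
			syncRead()
		}
	}
}

func main() { runSync(30*time.Second, time.Minute) }
```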

Configuring the synchronization period and the sync connector address (plus auth) remains an open question, since I know the resh configuration format is not finalized.
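For illustration only, the sync-related options might reduce to something like this; every name here is a placeholder, precisely because the real configuration format is still open:

```go
package config

import "time"

// SyncConfig is a hypothetical sync section of the resh configuration.
type SyncConfig struct {
	ConnectorAddress string        // e.g. "http://localhost:8080", local or remote
	AuthToken        string        // needed only for a remote connector
	StoreInterval    time.Duration // how often to upload new local records
	ReadInterval     time.Duration // how often to pull records from other machines
}
```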

I also think it would be slightly better to replace the read path with a query path. You would run a local resh daemon (as you do now) that searches, say, the most recent 1k history records locally, and query the remote (central backend) only for a specific query. This approach scales better: it is centralized by default, data from all clients can live in a database rather than in memory, and you transfer data from the remote only for a specific query. But this is just a thought for the future, because it requires a bigger change in the search engine (query two separate backends asynchronously, local and remote; display the local results first, with a notice for the user; then merge in the remote results and display them). It also requires implementing search in the sync connector itself.
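A rough sketch of that asynchronous two-backend search, with `searchLocal` and `searchRemote` as hypothetical stand-ins for the local index and the connector's (yet to exist) search endpoint:

```go
package main

import (
	"fmt"
	"time"
)

// Stand-ins for the two backends; the remote one would hit the connector.
func searchLocal(query string) []string { return []string{"local: " + query} }
func searchRemote(query string) []string {
	time.Sleep(200 * time.Millisecond) // simulated network latency
	return []string{"remote: " + query}
}

func display(results []string) {
	for _, r := range results {
		fmt.Println(" ", r)
	}
}

// merge deduplicates by the result string; real code would dedup by record ID.
func merge(a, b []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, r := range append(a, b...) {
		if !seen[r] {
			seen[r] = true
			out = append(out, r)
		}
	}
	return out
}

// search queries both backends concurrently: local results are displayed
// immediately with a notice, then merged with remote results on arrival.
func search(query string) []string {
	remoteCh := make(chan []string, 1)
	go func() { remoteCh <- searchRemote(query) }()

	local := searchLocal(query)
	fmt.Println("showing local results, remote search still running...")
	display(local)

	merged := merge(local, <-remoteCh)
	display(merged)
	return merged
}

func main() { search("git push") }
```

The key property is that the slow remote query never blocks the initial local display; merging happens only once the remote results arrive.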

I will prepare a prototype for the sync connector as part of Hacktoberfest.

Steps:

tivvit commented 2 years ago

SQLite sync connector implementation https://github.com/tivvit/resh-sync-connector-sqlite