deajan / osync

A robust two way (bidirectional) file sync script based on rsync with fault tolerance, POSIX ACL support, time control and near realtime sync
http://www.netpower.fr/osync
BSD 3-Clause "New" or "Revised" License

[SUGGESTION] Single server + Multi clients two-way sync #153

Open: ugoviti opened this issue 6 years ago

ugoviti commented 6 years ago

Hi,

Nice job, deajan!

I'm looking for a feature that would make osync a perfect lightweight single-server / multi-client two-way sync solution.

Let me explain:

osync works well on a single server, but how can I accomplish the following scenario?

Can osync manage that configuration?

A similar scenario, based on Unison, is explained here: https://l0x.de/posts/2016/11/17/dropbox-like-realtime-sync-unison/

With kind regards

deajan commented 6 years ago

Hi,

osync is designed to be a stateless sync system, where an initiator connects to a target and syncs data. You should be able to run 3 sync jobs from the initiator (server) using osync-batch, but as of today there's no way to get realtime sync in this setup, because I still haven't finished the osync-helper service.

The service would notify the initiator about a change and trigger a sync session with the target. Once the initiator is updated, it would register the change and trigger sync sessions with the other targets.
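The ordering implied by that flow (sync first with the target that reported the change, then push the updated initiator replica to every other target) can be sketched as a small hub-and-spoke loop. This is a hypothetical illustration of the ordering only: `propagate_change` and `sync_with` are made-up names, not osync-helper code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of hub-and-spoke propagation; not osync's actual code.

# Placeholder for one initiator <-> target sync session. In a real setup this
# might call osync.sh with a per-target config (path would be an assumption).
sync_with() {
	echo "sync initiator <-> $1"
}

# propagate_change CHANGED_TARGET ALL_TARGETS...
# Pull the change from the target that reported it first, then sync the
# updated initiator replica with every other target.
propagate_change() {
	local changed="$1" t
	shift
	sync_with "$changed"
	for t in "$@"; do
		[ "$t" = "$changed" ] && continue
		sync_with "$t"
	done
}
```

For example, `propagate_change target2 target1 target2 target3` syncs target2 first, then target1 and target3.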

Are you still up for using osync in that setup? I could finish the service quickly and let you try it out.

ugoviti commented 6 years ago

Hi, great to know that! I like osync very much because it has a simple configuration layout and works without reinventing the wheel (rsync is a great tool), but I can't use it for my purpose right now.

Let me explain the design and goals of the solution:

I'm trying to build a sidecar Docker container to use inside a Kubernetes cluster to manage the synchronization of data between (horizontally scaled) PODs:

Example of structure:

1 POD named storage with two containers inside:

1 POD named frontend with two containers inside:

Key features:
1) lightweight solution with a small memory footprint (osync is perfect in that area)
2) sync of POSIX/ACL permissions (osync is the only tool with that feature)
3) easy discovery of which container is out of sync (log parsing is a solution)
4) realtime or semi-realtime replication of data from slave to master (osync supports that)
5) realtime or semi-realtime replication of data from master to slaves (this is what is missing)

The goal is to scale the frontend pod horizontally, sync its data with the master container on startup, and later receive new data that the other slaves have synced to the master. If I'm not mistaken, no existing solution in the container orchestration world offers that feature.

Typical solutions provide:

1) Shared storage such as an NFS server (not resilient, because it is a SPOF).
2) Distributing the data with the container (not practical with frequent updates, and it doesn't solve the problem of saving data from inside the frontend container).

The two-way setup is needed because if a user uploads a new file through the frontend, that file is written to local storage (which is stateless and does not survive a restart), so it must be synchronized quickly with the master and, from there, with the other slaves.

Right now, the closest software that would let me build that solution is Syncthing, but it is not lightweight, doesn't sync POSIX permissions, and is hard to configure and manage.

My project is in the development phase, and I'll be happy to test the osync-helper service and report bugs when it's ready.

Let me know. Keep up the good work!

deajan commented 6 years ago

Hmmm... I'm currently working on the target-helper service, which could allow something like this to happen. I think I can get something to work later this week. I'll keep you updated.

deajan commented 6 years ago

I do have something that could work now (not production ready, not really well tested).

Could you try the current git master by manually setting up an osync helper service on one of your targets:

  1. Copying osync.sh to the target system
  2. Editing target-helper.conf.example to include the SSH credentials for your initiator and to update the sync dir paths
  3. Running the osync target helper manually with: _DEBUG=yes osync.sh /path/to/target-helper.conf --on-changes-target

Please let me know if this works for you.

ugoviti commented 6 years ago

Thank you deajan! I'm testing the 6c1b7a5 commit right now.

First, a very small glitch when starting osync:

TIME: 1 - Creating conflictual file list.
/usr/local/osync/osync.sh: line 3429: [: ==: unary operator expected
TIME: 1 - Updating target replica.

Line 3429 is about TRAVIS_RUN. Starting with env TRAVIS_RUN=false doesn't show any errors.
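For readers hitting the same message: `[: ==: unary operator expected` is what bash prints when an unquoted variable inside `[ ]` expands to nothing, leaving `[ == true ]`. The source of line 3429 isn't shown in this thread, so the snippet below is a sketch of the usual cause and fix, not the actual osync code; `is_travis_run` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Typical cause of "[: ==: unary operator expected" (illustrative, not osync's code).

unset TRAVIS_RUN

# Broken: with TRAVIS_RUN unset this expands to [ == true ] and [ complains.
# [ $TRAVIS_RUN == true ]

# Fixed: quote the expansion, default it to "false", and use POSIX "=".
is_travis_run() {
	[ "${TRAVIS_RUN:-false}" = true ]
}

if is_travis_run; then
	echo "running under Travis"
else
	echo "not running under Travis"
fi
```

Quoting alone would already avoid the error; the `:-false` default just makes the intent explicit, which is why setting `TRAVIS_RUN=false` in the environment also hides the message.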

This is my testing layout:
1) master node named master (172.17.0.2)
2) first target node named target1 (172.17.0.3)
3) second target node named target2 (172.17.0.4)

This is my first test:

a) Started osync on the master node with the command: _DEBUG=yes /usr/local/osync/osync.sh /etc/osync.conf --on-changes

b) Started osync on the target nodes with the command: _DEBUG=yes /usr/local/osync/osync.sh /etc/osync-target-helper.conf --on-changes-target

The target node logs this:

bash-4.4# _DEBUG=yes /usr/local/osync/osync.sh /etc/osync-target-helper.conf --on-changes-target
2018-10-02 22:23:33 - Script begin, logging to [/var/log/osync.target_test.log].
2018-10-02 22:23:33 - This is an unstable dev build [2018100201]. Please use with caution.
2018-10-02 22:23:33 - Local OS: [Linux 4.18.9-200.fc28.x86_64 unknown unknown GNU/Linux ("Alpine Linux" 3.8.0) 64-bit Unix].
2018-10-02 22:23:33 - -------------------------------------------------------------
2018-10-02 22:23:33 - Tue Oct 2 22:23:33 UTC 2018 - osync 1.3.0-beta1 script begin.
2018-10-02 22:23:33 - -------------------------------------------------------------
2018-10-02 22:23:33 - Sync task [target_test] launched as root@target1 (PID 95)
2018-10-02 22:23:33 - #### Running osync in target helper file monitor mode.
2018-10-02 22:23:33 - Initiator of instance [target_test] should be notified of file changes now.
2018-10-02 22:23:33 - #### Monitoring now.
2018-10-02 22:23:58 - #### Changes detected, waiting 2 seconds before running next sync.
2018-10-02 22:24:00 - Initiator of instance [target_test] should be notified of file changes now.
2018-10-02 22:24:00 - #### Monitoring now.

Some questions: Does the master node need to know the target node's address? Must I specify TARGET_SYNC_DIR on the master node? Must osync run on the master node, or is the master node only an SSH server with the rsync command and nothing more?

I'm surely missing something. Anyway, the target node connects correctly to the master, but the sync doesn't work: the target gets notified about file changes, but the master's files aren't synced to the target, and vice versa.

Correct me if I'm wrong, but I think I don't need to run osync on the master node (it is simply a data repository). So is this the correct setup?

1) The master node listens via SSH, nothing more.
2) The target node runs osync with the --on-changes-target option and connects via SSH to the master node.

I also tried this setup; the changes are detected by the targets:

2018-10-02 22:09:30 - #### Changes detected, waiting 2 seconds before running next sync.
2018-10-02 22:09:32 - Initiator of instance [target_test] should be notified of file changes now.
2018-10-02 22:09:32 - #### Monitoring now.

But nothing is synced to or from the targets. SSH works fine, of course.

I changed only these variables in osync-target-helper.conf:

INITIATOR_SYNC_DIR="ssh://root@172.17.0.2:22//data/"
TARGET_SYNC_DIR="/data/"

I also tried swapping the initiator and the target, without success:

INITIATOR_SYNC_DIR="/data/"
TARGET_SYNC_DIR="ssh://root@172.17.0.2:22//data/"

Am I missing something?

I saw that osync creates a file named .osync-update.push on the master node:

target_test#20181002T222414.065765783
target_test#20181002T222944.917679000
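Each entry in .osync-update.push appears to be `INSTANCE_ID#TIMESTAMP`. Assuming that format (inferred from the two lines above, not from osync documentation), a short loop can list which instance notified the initiator and when. `parse_push_entries` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Parse .osync-update.push entries of the assumed form
# INSTANCE_ID#YYYYMMDDTHHMMSS.fraction and print "instance timestamp".
parse_push_entries() {
	local instance stamp
	# "|| [ -n ... ]" also handles a last line without a trailing newline
	while IFS='#' read -r instance stamp || [ -n "$instance" ]; do
		[ -n "$instance" ] || continue   # skip empty lines
		# Drop the fractional seconds after the first "."
		printf '%s %s\n' "$instance" "${stamp%%.*}"
	done < "$1"
}
```

Run against the file shown above, `parse_push_entries /data/.osync-update.push` would print one `target_test 2018...` line per notification.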

Thank you very much for your work.

deajan commented 6 years ago

I fixed the Travis issue an hour ago (I had big trouble debugging Travis); please update to the latest commit.

All the sync code runs on the initiator, which must be able to reach the target via SSH in order to sync.
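Following that explanation, the sync paths on the master (the initiator, 172.17.0.2 in the test layout above) point the other way round: the local directory is the initiator side, and the target is reached over SSH. This is an excerpt under assumptions (target1's address taken from the test layout, every other setting left as in the sample config), not a verified working config.

```shell
# Hypothetical excerpt of /etc/osync.conf on the master (initiator, 172.17.0.2).
# The initiator runs the sync itself, so the *target* side is the SSH URL.
INITIATOR_SYNC_DIR="/data/"
TARGET_SYNC_DIR="ssh://root@172.17.0.3:22//data/"   # target1 from the test layout
```

The master would then run `osync.sh /etc/osync.conf --on-changes`, as in the first test, while target1 runs the helper with `--on-changes-target`.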

The correct setup consists of:

Indeed, the target helper only pushes a .osync-update.push file to the initiator. When this file is created or modified, the initiator detects the change and launches a sync cycle.
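The trigger described above can be approximated by polling: the initiator remembers a checksum of the push file and starts a sync cycle only when it changes. This is a hypothetical polling sketch using POSIX `cksum`; osync's own monitor mode relies on file-system notifications (inotifywait on Linux) rather than polling, and `push_file_changed` plus the state-file path are made-up names.

```shell
#!/usr/bin/env bash
# Hypothetical polling sketch: detect changes to the push file via its checksum.

# push_file_changed PUSH_FILE STATE_FILE
# Returns 0 (changed) if PUSH_FILE's checksum differs from the one stored in
# STATE_FILE, and stores the new checksum; returns 1 otherwise.
push_file_changed() {
	local push_file="$1" state_file="$2" current previous=""
	[ -f "$push_file" ] || return 1
	current="$(cksum < "$push_file")"
	[ -f "$state_file" ] && previous="$(cat "$state_file")"
	[ "$current" = "$previous" ] && return 1
	printf '%s\n' "$current" > "$state_file"
	return 0
}

# Example loop (commented out): poll every 2 seconds, sync on change.
# while true; do
#	push_file_changed /data/.osync-update.push /var/tmp/osync-push.state \
#		&& osync.sh /etc/osync.conf
#	sleep 2
# done
```

An event-driven watcher reacts faster and costs nothing while idle; polling is shown here only because it is portable and easy to follow.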

As of today, this mode supports one initiator and one target; support for multiple targets comes next. The next implementation will require the initiator to queue the syncs for the different targets and to check whether new files were modified while a sync was running. I already have this kind of code running in another of my projects, but I have to adapt it to osync.
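A minimal sketch of such a queue, assuming a single initiator that syncs targets one at a time and starts another full pass if a change was flagged while it was busy. `run_sync_queue`, `sync_target` and `CHANGE_PENDING` are hypothetical names; this is not the code deajan mentions.

```shell
#!/usr/bin/env bash
# Hypothetical serial sync queue with a re-run on late changes; not osync code.

CHANGE_PENDING=0   # set to 1 by whatever detects a new notification

# Placeholder for one sync session; a real setup would call osync.sh here.
sync_target() {
	echo "sync $1"
}

# run_sync_queue TARGET...
# Sync every target in order. If a change was flagged during the pass,
# run another full pass so that no target misses it.
run_sync_queue() {
	local t
	while :; do
		CHANGE_PENDING=0
		for t in "$@"; do
			sync_target "$t"
		done
		[ "$CHANGE_PENDING" -eq 1 ] || break
	done
}
```

Running the targets serially keeps the initiator replica consistent between sessions; the extra pass covers changes that arrived after a target had already been synced.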

irreleph4nt commented 5 years ago

Hi. This looks very promising for a setup I have in mind, where I want to sync multiple Linux laptops with the same master data on a server (two-way sync). Has the feature discussed here been tested, or has it progressed further over the last few months? Keep up the good work, and thank you for creating this!

deajan commented 5 years ago

Hello,

The feature has been in the current master for a couple of months, though it hasn't been tested by enough people yet.

irreleph4nt commented 5 years ago

It will take some time for me to set up the things I need to test this, but once I am done I am happy to look at what's been discussed here in detail and report back. I'll be in touch!

deajan commented 5 years ago

The current master is ripe for a v1.3 release; you can use it safely. I'll probably publish a release in the not-too-distant future.

irreleph4nt commented 4 years ago

So, 10 months later, here I am planning my osync setup. Sorry that it took so long. For the following setup, do you have a recommended osync configuration mode to achieve what I want?

Scenario:

Goal: Achieve a roaming-profile-like solution that syncs the user's home folder between one central server and multiple (in this case 2) other PCs.

Possible Solution:

Drawbacks / problems I am thinking about: I need to avoid syncing to a third location (let's say /mnt/home/). While this would mean I could simply have one sync config that syncs MASTER:/home to CLIENT:/mnt/home, it might make files accessible to users who do not own them, which would be bad. Also, if I run one config per user, can I run osync so that it only syncs files for the user currently logged in on the machine? I'd like to avoid syncing Peter's home folder when he never logs in to CLIENT or LAPTOP.

Any information you can provide here is appreciated. Thank you for putting all this effort into osync! :)

deajan commented 4 years ago

@irreleph4nt Hi

I'd really recommend creating one sync config file per user, since combined with a time limit this lets you bypass users who uploaded gigabytes of data without holding up everyone else.
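One config per user (and per target host) could be generated with a small loop. This is a sketch under assumptions: INSTANCE_ID, SOFT_MAX_EXEC_TIME and HARD_MAX_EXEC_TIME are assumed to match the variable names in osync's sample configuration (check them against sync.conf.example for your version), while the host names, paths, per-user SSH login and timeout values are hypothetical. A real config also needs the remaining settings from the sample file.

```shell
#!/usr/bin/env bash
# Hypothetical generator: one osync config per user for a given target host,
# each with its own time limits. Only these variables are written; copy the
# rest from sync.conf.example.

# write_user_configs OUT_DIR TARGET_HOST USER...
write_user_configs() {
	local out_dir="$1" target="$2" user
	shift 2
	mkdir -p "$out_dir"
	for user in "$@"; do
		# Assumed semantics: soft limit only warns, hard limit stops the sync.
		printf '%s\n' \
			"INSTANCE_ID=\"home_${user}_${target}\"" \
			"INITIATOR_SYNC_DIR=\"/home/${user}/\"" \
			"TARGET_SYNC_DIR=\"ssh://${user}@${target}:22//home/${user}/\"" \
			"SOFT_MAX_EXEC_TIME=3600" \
			"HARD_MAX_EXEC_TIME=7200" \
			> "$out_dir/${user}_${target}.conf"
	done
}
```

For example, `write_user_configs /etc/osync/users laptop alice bob` writes alice_laptop.conf and bob_laptop.conf. To cover only people who actually use a machine, the user list could come from `who | awk '{print $1}' | sort -u` run on that machine.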

A three-way sync is still problematic because:

target (laptop): detects a change on /home/user1 and notifies the initiator
initiator (master): is notified and syncs with the laptop
target (client): is not synced automatically

One solution could be to set up an osync daemon per user and per target; in that case, multiple syncs will happen. Another solution is to tell the initiator daemon to sync once an hour regardless of changes.
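The hourly fallback can be sketched as a function that runs every per-user/per-target config in turn, whether or not a notification arrived. The config directory, the `OSYNC_BIN` override and `run_all_configs` are hypothetical; the only osync usage assumed is `osync.sh /path/to/config`, as used elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical hourly fallback: run every config in CONF_DIR sequentially,
# regardless of whether a change notification arrived.

OSYNC_BIN="${OSYNC_BIN:-osync.sh}"        # path to osync.sh (assumption)
CONF_DIR="${CONF_DIR:-/etc/osync/users}"  # hypothetical config directory

run_all_configs() {
	local conf status=0
	for conf in "$CONF_DIR"/*.conf; do
		[ -e "$conf" ] || continue           # glob matched nothing
		"$OSYNC_BIN" "$conf" || status=1     # keep going if one sync fails
	done
	return "$status"
}

# Call run_all_configs from a wrapper script scheduled hourly, e.g. via cron.
```

A crontab entry such as `0 * * * * /usr/local/bin/osync-hourly.sh` (hypothetical wrapper name) would then cover the client that the change-triggered sync misses.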

I still need to validate osync v1.3 so it can become RTM, but I really don't have much time for that right now. Anyway, it should be stable as long as you don't use it on macOS / BSD (not tested in the current master).