Jwink3101 / syncrclone

Python-based bi-directional sync tool for rclone
MIT License

syncrclone

Robust, Configurable, Bi-Directional sync between any two rclone remotes with automatic conflict resolution and backups.

Note that syncrclone has been sherlocked. As of v1.58.0, rclone has bisync. syncrclone works fundamentally differently, as described in syncrclone vs bisync (and rclonesync-v2). For the time being, I fully plan to continue development. To be 100% clear, I have no hard feelings about it.

syncrclone is in beta. It has been tested with a variety of backends, but by no means all of them, and it has only been real-world tested with a few. See testing notes for some details.

Features

Installation and Usage

First, install rclone. Then, you must have Python 3.6+ installed. There are many ways to get it. I am a fan of miniconda.

Install syncrclone:

$ python -m pip install git+https://github.com/Jwink3101/syncrclone

Configure rclone: I prefer to specify a config file using --config rclone.cfg. Add the remotes you wish to sync.

Initiate syncrclone: (see "Local and Remote Mode" below)

syncrclone --new config.py

Modify the config code. It is fully documented but also see config tips. If you use your own rclone config file above, make sure to include

rclone_env = {'RCLONE_CONFIG': 'rclone.cfg'}

or

rclone_flags = ['--config','rclone.cfg']

It is a good idea to read the entire config file and set as needed.

Now run it! You do not need to do anything special even though it is the first run. It's just that all files will be considered new.

$ syncrclone config.py

That's it!

WARNING: The config file is directly executed and is assumed to be trusted. If you keep config files in sync, watch out for malicious code.
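To make the warning concrete, here is a minimal sketch of what "directly executed" means for a Python config. This is not syncrclone's actual loader: load_config and CONFIG_TEXT are hypothetical names used for illustration, and only the rclone_env and rclone_flags settings come from this README.

```python
# Hypothetical illustration only: NOT syncrclone's real config loader.
CONFIG_TEXT = """
rclone_env = {'RCLONE_CONFIG': 'rclone.cfg'}
rclone_flags = ['--config', 'rclone.cfg']
# Any Python statement in a config runs at load time, not just assignments:
side_effect = len('this expression was evaluated while loading')
"""


def load_config(text):
    """Execute config source and return the names it defines."""
    namespace = {}
    exec(text, namespace)  # arbitrary code in `text` runs right here
    # Drop names such as __builtins__ that exec adds on its own
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


config = load_config(CONFIG_TEXT)
print(config["rclone_flags"])
```

Because the loader cannot tell a harmless assignment from harmful code, a config file that arrives through a sync should be reviewed before it is run.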

Local and Remote Mode

syncrclone offers a convenience mode for local repos. It is functionally identical but makes calling and set up easier. The differences are:

To work in local mode, specify a directory on the command line and it will search upwards for .syncrclone/config.py. To work in remote mode, specify the path to the config script. Recall that rclone is always called from the directory of the sync config; therefore, in local mode, it is called from .syncrclone. Specify other paths accordingly.

For example

$ cd /path/to/local/files
$ syncrclone

is the same as the following:

$ cd /path/to/local/files
$ syncrclone .syncrclone/config.py

or even deeper:

$ cd /path/to/local/files/deeper/sub/dirs/

Then you can do either of the following

$ syncrclone   # Will automatically find it
$ syncrclone ../../../.syncrclone/config.py 
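The upward search used by local mode can be sketched as follows. This is an illustration under assumptions, not syncrclone's own code: find_config is a hypothetical helper, and the demo builds a throwaway directory tree rather than touching real files.

```python
import tempfile
from pathlib import Path


def find_config(start):
    """Walk upwards from `start` looking for .syncrclone/config.py."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".syncrclone" / "config.py"
        if candidate.is_file():
            return candidate
    return None  # reached the filesystem root without finding a config


# Demo: a temporary repo with the config at the top and a deep subdirectory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".syncrclone").mkdir()
    (root / ".syncrclone" / "config.py").write_text("# config\n")
    deep = root / "deeper" / "sub" / "dirs"
    deep.mkdir(parents=True)

    found = find_config(deep)
    found_matches = found == root / ".syncrclone" / "config.py"
    print(found_matches)
```

The search checks the starting directory first, so running from the top of the local files and from a nested subdirectory both resolve to the same config.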

rclone Versions

This tool uses some newer rclone flags so it is always good to make sure you're using the newest rclone. There is a small chance that an rclone change could break something so you may have to use a previous version until it is updated.

Filtering, etc

All filtering is handled by rclone's filtering. See their detailed documentation.

Filter flags should be set only in the config filter_flags section. There are many options for filtering such as --exclude path. See note below about --exclude-if-present.

Remember that rclone is called from the same directory as the config file so make sure paths for flags such as --filter-from are correctly specified.
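As a rough sketch of why relative paths behave this way, the snippet below assembles an rclone call whose working directory is the config's directory. It only builds the arguments and does not run rclone. build_rclone_call, the remote name myremote:files, and the choice of rclone lsjson are assumptions for illustration; the exact commands syncrclone issues may differ.

```python
import os
from pathlib import Path


def build_rclone_call(config_path, remote, rclone_flags=(), filter_flags=(),
                      rclone_env=None):
    """Return (cmd, env, cwd) for a listing call made from the config's directory."""
    cwd = Path(config_path).resolve().parent
    cmd = ["rclone", "lsjson", "--recursive", *rclone_flags, *filter_flags, remote]
    env = {**os.environ, **(rclone_env or {})}
    return cmd, env, cwd


cmd, env, cwd = build_rclone_call(
    "/path/to/local/files/.syncrclone/config.py",
    "myremote:files",  # hypothetical remote
    filter_flags=["--filter-from", "filters.txt"],
    rclone_env={"RCLONE_CONFIG": "rclone.cfg"},
)

# With this cwd, the relative "filters.txt" is looked up next to config.py.
resolved_filter = cwd / "filters.txt"
print(resolved_filter)
# The call itself would look like: subprocess.run(cmd, env=env, cwd=cwd)
```

If the filter file lives elsewhere, give --filter-from an absolute path or a path relative to the config's directory.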

See more on Filters in the config tips

Exclude if Present Filters: A Warning

Filters work well since they are applied to both sides. This means that changing a filter will make the affected files look like they were deleted on both sides, and nothing will happen.

However, the --exclude-if-present filter is very dangerous if the exclude file is added on one side only after the files in the directory are already in sync. It will cause rclone to skip the directory on that side, so its files will appear to have been deleted there.

There are safe ways to use --exclude-if-present. For example, you can put the exclude file in place on both sides before syncing. (Or place it without adding the filter, sync, and then add the filter.) Or, if the files have never been synced before (i.e. it's a new directory with the exclude file), then nothing bad will happen.
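The dangerous and safe cases can be illustrated with a small simulation. It is deliberately simplified and is not how syncrclone or rclone is implemented: listing only skips the directory that directly contains the marker, deletes_pushed_to_B ignores modification checks, and the marker name .nosync is hypothetical.

```python
import posixpath


def listing(files, marker=None):
    """Files a filtered listing would show (simplified exclude-if-present)."""
    if marker is None:
        return set(files)
    skipped = {posixpath.dirname(f) for f in files
               if posixpath.basename(f) == marker}
    return {f for f in files if posixpath.dirname(f) not in skipped}


def deletes_pushed_to_B(previous, now_A, now_B):
    """Files that vanished from A since the last sync but are still listed on B."""
    return (previous - now_A) & now_B


previous = {"docs/a.txt", "docs/b.txt"}  # state after the last sync

# Dangerous: marker added on A only, after docs/ was already in sync
A = previous | {"docs/.nosync"}
B = set(previous)
danger = deletes_pushed_to_B(previous, listing(A, ".nosync"), listing(B, ".nosync"))

# Safe: marker is in place on both sides before syncing with the filter
B_safe = previous | {"docs/.nosync"}
safe = deletes_pushed_to_B(previous, listing(A, ".nosync"), listing(B_safe, ".nosync"))

print(sorted(danger))  # files that would be deleted on B
print(sorted(safe))    # nothing to delete
```

In the dangerous case side A's listing no longer contains docs/, so every previously synced file in it looks deleted on A while still present on B. In the safe case both listings skip the directory and no deletion is inferred.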

Note that if --exclude-if-present is found in the filter_flags, a warning will be emitted.
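A check of this kind could look like the sketch below. It is an assumption about the shape of such a check, not syncrclone's code: warn_on_exclude_if_present is a hypothetical helper, and the warning text is made up.

```python
import warnings


def warn_on_exclude_if_present(filter_flags):
    """Warn if any flag is --exclude-if-present (bare or in --flag=value form)."""
    risky = any(str(flag).split("=", 1)[0] == "--exclude-if-present"
                for flag in filter_flags)
    if risky:
        warnings.warn("--exclude-if-present can make synced files look deleted "
                      "if the exclude file exists on one side only")
    return risky


# Demo: capture warnings so the example output stays quiet
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_on_exclude_if_present(["--exclude", "*.tmp",
                                          "--exclude-if-present", ".nosync"])
    clean = warn_on_exclude_if_present(["--exclude", "*.tmp"])

print(flagged, clean, len(caught))
```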

Non-ModTime comparisons and conflicts

The default is to decide if files need to sync by comparing ModTime (or mtime). However, you can also compare by size or hash (robust).

If you compare by size or hash, you can still resolve conflicts with modification time. Do note that not all remotes support ModTime and/or it may not be reliable. If using one of those types of remotes, do not use newer, older, or newer_tag conflict resolution. See remote overview for details.
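The split between "do these files match?" and "which side wins a conflict?" can be sketched like this. The function names, the dict layout, the dt tolerance, and the tie-breaking rule are all assumptions for illustration; the real comparison logic and option names live in the config file and may differ. The newer_tag mode is left out of the sketch.

```python
def files_match(a, b, compare="mtime", dt=1.0):
    """Decide whether two file records are considered identical."""
    if compare == "size":
        return a["size"] == b["size"]
    if compare == "mtime":
        # Assumption: mtime comparison also requires equal size
        return a["size"] == b["size"] and abs(a["mtime"] - b["mtime"]) <= dt
    if compare == "hash":
        shared = set(a["hashes"]) & set(b["hashes"])
        if not shared:
            raise ValueError("no hash type common to both sides")
        return all(a["hashes"][h] == b["hashes"][h] for h in shared)
    raise ValueError(f"unknown compare mode: {compare!r}")


def pick_winner(a, b, mode="newer"):
    """Return 'A' or 'B' using ModTime; only valid if both remotes have reliable ModTime."""
    if mode not in ("newer", "older"):
        raise ValueError(f"unsupported mode in this sketch: {mode!r}")
    a_is_newer = a["mtime"] >= b["mtime"]  # ties go to A (arbitrary choice)
    if mode == "newer":
        return "A" if a_is_newer else "B"
    return "B" if a_is_newer else "A"


file_A = {"size": 10, "mtime": 100.0, "hashes": {"md5": "aaa"}}
file_B = {"size": 10, "mtime": 250.0, "hashes": {"md5": "bbb"}}

print(files_match(file_A, file_B, compare="size"))  # same size, so a match
print(files_match(file_A, file_B, compare="hash"))  # different content
print(pick_winner(file_A, file_B, mode="newer"))
```

The example also shows why size-only comparison is risky: the two records differ in content but still count as matching.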

If your remote doesn't store hashes and must recalculate them (e.g. local, sftp), use reuse_hashes(A/B) if desired so that hashes are only recalculated as needed. See the config file.
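The idea behind reusing hashes can be sketched as a cache keyed on size and ModTime. This is a guess at the general technique, not syncrclone's implementation: fill_hashes, the record layout, and fake_hash are hypothetical.

```python
def fill_hashes(current, previous, compute_hash):
    """Copy hashes from the previous listing when size and mtime are unchanged.

    Returns the list of paths whose hash had to be recomputed.
    """
    recomputed = []
    for path, meta in current.items():
        old = previous.get(path)
        if old and old["size"] == meta["size"] and old["mtime"] == meta["mtime"]:
            meta["hash"] = old["hash"]  # reuse: file looks unchanged
        else:
            meta["hash"] = compute_hash(path)  # new or modified file
            recomputed.append(path)
    return recomputed


def fake_hash(path):
    """Stand-in for an expensive hash calculation."""
    return "hash-of-" + path


previous = {"a.txt": {"size": 1, "mtime": 10.0, "hash": "old-a"},
            "b.txt": {"size": 2, "mtime": 20.0, "hash": "old-b"}}
current = {"a.txt": {"size": 1, "mtime": 10.0},   # unchanged
           "b.txt": {"size": 2, "mtime": 25.0},   # modified
           "c.txt": {"size": 3, "mtime": 30.0}}   # new

recomputed = fill_hashes(current, previous, fake_hash)
print(recomputed)
```

The trade-off is that a file changed without altering its size or ModTime would keep a stale hash, which is one reason such a setting is optional.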

Differences from PyFiSync

PyFiSync was originally designed to use ssh+rsync for the remote. rsync is able to efficiently transfer small changes to large files, so a lot (!!!) of effort went into tracking moves even on changed files. I also knew I would always have mtime and inodes (no Windows support) and, on macOS, birthtime.

I later added rclone remotes. rclone is an amazing piece of software but has no way to sync deltas (which is reasonable given cloud infrastructure) so all of the work to track modified files was wasted! And not even possible on the remote side.

syncrclone was designed exclusively for rclone which means that I didn't have to try to track moves with modifications. As such, the algorithm is a lot simpler and a lot of edge cases are eliminated. The key difference with this algorithm is that files that match are removed from consideration right away. Then moves are only tracked via files that are (a) new, (b) match a previous file, and (c) are marked for deletion on the other side. Furthermore, while risky, syncrclone can compare sides by file size alone (it can also track moves by file size alone but that is really risky!) so all rclone remotes can now be used.
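The move-tracking rule described above can be sketched for a single side as follows. This is an interpretation of the description, not the project's code: detect_moves is a hypothetical helper, each file is represented by a (size, mtime) signature, and a path that disappeared on this side is treated as the one that would be marked for deletion on the other side. Ambiguous matches are skipped and left as ordinary delete plus copy.

```python
from collections import Counter


def detect_moves(previous, current):
    """Return [(old_path, new_path)] for unambiguous moves on one side.

    previous/current map path -> signature, e.g. (size, mtime).
    """
    # Paths present in both listings are out of consideration right away
    new = sorted(p for p in current if p not in previous)
    gone = sorted(p for p in previous if p not in current)

    new_sigs = Counter(current[p] for p in new)
    gone_sigs = Counter(previous[p] for p in gone)

    moves = []
    for new_path in new:
        sig = current[new_path]
        # Require exactly one new file and one vanished file with this signature
        if new_sigs[sig] == 1 and gone_sigs[sig] == 1:
            old_path = next(p for p in gone if previous[p] == sig)
            moves.append((old_path, new_path))
    return moves


previous = {"a.txt": (10, 100), "b.txt": (20, 200), "c.txt": (30, 300)}
current = {"a.txt": (10, 100), "moved/b.txt": (20, 200), "d.txt": (40, 400)}

moves = detect_moves(previous, current)
print(moves)
```

Here b.txt is recognized as moved, c.txt is a plain deletion, and d.txt is a plain new file. Using size alone as the signature would make accidental matches far more likely, which is why tracking moves by size is described as really risky.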

Additional Documents

Some additional docs:

Alternatives

File sync is surprisingly opinionated in many ways including use of config files vs pure CLI, how to handle conflicts, how to handle moves, how/if to backup files, etc.

I wrote syncrclone partially because I wanted a sync tool that works exactly the way I prefer! Same reason I wrote PyFiSync. But there are alternatives out there. To name a few:

I may have some details wrong. Please let me know and I will fix them.