cjnaz / rclonesync-V2

A Bidirectional Cloud Sync Utility using rclone
MIT License

rclonesync - A Bidirectional Cloud Sync Utility using rclone

NOTE: rclonesync functionality is now natively supported in rclone v1.58.0+. See rclone bisync. Please consider moving over to the native rclone implementation. rclonesync will remain here with critical bug fix support only, for the time being.

Rclone provides a programmatic building-block interface for transferring files between a cloud service provider and your local filesystem (along with a lot of other functionality), but it does not provide a turnkey bidirectional sync capability. rclonesync provides a bidirectional sync solution using rclone.

rclonesync high level behaviors / operations

rclonesync supported usage:

Installation, setup, getting started

Notable changes in the latest release

V3.2

rclonesync command line interface

$ ../rclonesync -h
usage: rclonesync [-h] [-1] [-c] [--check-filename CHECK_FILENAME]
                  [-D MAX_DELETES] [-F] [--no-check-sync] [--check-sync-only]
                  [-e] [-f FILTERS_FILE] [-r RCLONE] [--config CONFIG]
                  [--rclone-args ...] [-v] [--rc-verbose] [-d] [-w WORKDIR]
                  [--no-datetime-log] [--no-cleanup] [-V]
                  Path1 Path2

***** BiDirectional Sync for Cloud Services using rclone *****

positional arguments:
  Path1                 Local path, or cloud service with ':' plus optional
                        path. Type 'rclone listremotes' for list of configured
                        remotes.
  Path2                 Local path, or cloud service with ':' plus optional
                        path. Type 'rclone listremotes' for list of configured
                        remotes.

optional arguments:
  -h, --help            show this help message and exit
  -1, --first-sync      First run setup. WARNING: Path1 files may overwrite
                        path2 versions. Consider using with --dry-run first.
                        Also asserts --verbose.
  -c, --check-access    Ensure expected RCLONE_TEST files are found on both
                        path1 and path2 filesystems, else abort.
  --check-filename CHECK_FILENAME
                        Filename for --check-access (default is
                        <RCLONE_TEST>).
  -D MAX_DELETES, --max-deletes MAX_DELETES
                        Safety check for percent maximum deletes allowed
                        (default 50%). If exceeded the rclonesync run will
                        abort. See --force.
  -F, --force           Bypass --max-deletes safety check and run the sync.
                        Also asserts --verbose.
  --no-check-sync       Disable comparison of final LSL files (default is
                        check-sync enabled).
  --check-sync-only     Only execute the comparison of LSL files from the last
                        rclonesync run.
  -e, --remove-empty-directories
                        Execute rclone rmdirs as a final cleanup step.
  -f FILTERS_FILE, --filters-file FILTERS_FILE
                        File containing rclone file/path filters (needed for
                        Dropbox).
  -r RCLONE, --rclone RCLONE
                        Path to rclone executable (default is rclone in path
                        environment var).
  --config CONFIG       Path to rclone config file (default is typically
                        ~/.config/rclone/rclone.conf).
  --rclone-args ...     Optional argument(s) to be passed to rclone. Specify
                        this switch and rclone ags at the end of rclonesync
                        command line.
  -v, --verbose         Enable event logging with per-file details. Specify
                        once for info and twice for debug detail.
  --rc-verbose          Enable rclone's verbosity levels (May be specified
                        more than once for more details. Also asserts
                        --verbose.)
  -d, --dry-run         Go thru the motions - No files are copied/deleted.
                        Also asserts --verbose.
  -w WORKDIR, --workdir WORKDIR
                        Specified working dir - useful for testing. Default is
                        ~user/.rclonesyncwd.
  --no-datetime-log     Disable date-time from log output - useful for
                        testing.
  --no-cleanup          Retain working files - useful for debug and testing.
  -V, --version         Return rclonesync's version number and exit.
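
The --max-deletes safety check described in the help text above can be pictured with a minimal Python sketch. It assumes the percentage is the number of deleted files divided by the number of files in the prior listing for that path. The function name `exceeds_max_deletes` and this exact formula are illustrative assumptions, not taken from rclonesync's source.

```python
def exceeds_max_deletes(num_deleted, num_prior_files, max_deletes=50, force=False):
    """Return True if a run should abort because too many files were deleted.

    Hypothetical helper: the percentage basis (prior file count) is an
    assumption made for this sketch.
    """
    # --force bypasses the check; an empty prior listing cannot exceed it.
    if force or num_prior_files == 0:
        return False
    percent_deleted = 100.0 * num_deleted / num_prior_files
    return percent_deleted > max_deletes


if __name__ == "__main__":
    print(exceeds_max_deletes(3, 10))              # 30% deleted, under the 50% default
    print(exceeds_max_deletes(6, 10))              # 60% deleted, over the default
    print(exceeds_max_deletes(6, 10, force=True))  # check bypassed
```

Under these assumptions, a run that sees 6 of 10 previously known files deleted would abort unless --force (or a higher --max-deletes value) is given.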

Typical run log (test case test_changes output - normally timestamps are included):

../rclonesync ./testdir/path1/ ./testdir/path2/ --verbose --workdir ./testwd/ --no-datetime-log --no-cleanup --rclone rclone --config /home/<me>/.config/rclone/rclone.conf
***** BiDirectional Sync for Cloud Services using rclone (V3.2 201201) *****
Lock file created: </tmp/rclonesync_LOCK_._testdir_path1_._testdir_path2_>
Synching Path1  <./testdir/path1/>  with Path2  <./testdir/path2/>
Command args: <Path1=./testdir/path1/, Path2=./testdir/path2/, check_access=False, check_filename=RCLONE_TEST, check_sync_only=False, config=/home/<me>/.config/rclone/rclone.conf, dry_run=False, filters_file=None, first_sync=False, force=False, max_deletes=50, no_check_sync=False, no_cleanup=True, no_datetime_log=True, rc_verbose=None, rclone=rclone, rclone_args=None, remove_empty_directories=False, verbose=1, workdir=./testwd/>
>>>>> Path1 Checking for Diffs
  Path1      File is newer                     - file2.txt
  Path1      File was deleted                  - file4.txt
  Path1      File is newer                     - file5.txt
  Path1      File was deleted                  - file6.txt
  Path1      File is newer                     - file7.txt
  Path1      File was deleted                  - file8.txt
  Path1      File is new                       - file11.txt
     7 file change(s) on Path1:    1 new,    3 newer,    0 older,    3 deleted
>>>>> Path2 Checking for Diffs
  Path2      File is newer                     - file1.txt
  Path2      File was deleted                  - file3.txt
  Path2      File is newer                     - file5.txt
  Path2      File is newer                     - file6.txt
  Path2      File was deleted                  - file7.txt
  Path2      File was deleted                  - file8.txt
  Path2      File is new                       - file10.txt
     7 file change(s) on Path2:    1 new,    3 newer,    0 older,    3 deleted
>>>>> Determining and applying changes
  Path1      Queue copy to Path2               - ./testdir/path2/file11.txt
  Path1      Queue copy to Path2               - ./testdir/path2/file2.txt
  Path2      Queue delete                      - ./testdir/path2/file4.txt
  WARNING    New or changed in both paths      - file5.txt
  Path1      Renaming Path1 copy               - ./testdir/path1/file5.txt_Path1
  Path1      Queue copy to Path2               - ./testdir/path2/file5.txt_Path1
  Path2      Renaming Path2 copy               - ./testdir/path2/file5.txt_Path2
  Path2      Queue copy to Path1               - ./testdir/path1/file5.txt_Path2
  Path2      Queue copy to Path1               - ./testdir/path1/file6.txt
  Path1      Queue copy to Path2               - ./testdir/path2/file7.txt
  Path2      Queue copy to Path1               - ./testdir/path1/file1.txt
  Path2      Queue copy to Path1               - ./testdir/path1/file10.txt
  Path1      Queue delete                      - ./testdir/path1/file3.txt
  Path2      Do queued copies to               - Path1
  Path1      Do queued copies to               - Path2
             Do queued deletes on              - Path1
             Do queued deletes on              - Path2
>>>>> Refreshing Path1 and Path2 lsl files
>>>>> Checking integrity of LSL history files for Path1  <./testdir/path1/>  versus Path2  <./testdir/path2/>
Lock file removed: </tmp/rclonesync_LOCK_._testdir_path1_._testdir_path2_>
>>>>> Successful run.  All done.

rclonesync Operations

rclonesync keeps copies of the prior sync file lists of both the Path1 and Path2 filesystems, and on a new run checks each side for changes against its own prior list. Note that on some (all?) cloud storage systems it is not possible to have file timestamps that match between the local filesystem and the cloud filesystem. rclonesync works around this problem by tracking Path1-to-Path1 and Path2-to-Path2 deltas, and then applying the changes on the other side.
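
The per-path delta tracking described above can be sketched roughly as follows. This is a simplified illustration, not rclonesync's actual code: the `FileInfo` type and `find_deltas` function are hypothetical names, and listings are assumed to be already parsed into dictionaries keyed by file path. Only timestamps are compared here.

```python
from typing import Dict, NamedTuple


class FileInfo(NamedTuple):
    size: int
    mtime: float  # modification time, seconds since the epoch


def find_deltas(prior: Dict[str, FileInfo], current: Dict[str, FileInfo]) -> Dict[str, str]:
    """Compare one path's prior listing with its current listing.

    Returns a mapping of file name to one of: new, newer, older, deleted.
    Unchanged files are omitted.
    """
    deltas = {}
    for name, info in current.items():
        if name not in prior:
            deltas[name] = "new"
        elif info.mtime > prior[name].mtime:
            deltas[name] = "newer"
        elif info.mtime < prior[name].mtime:
            deltas[name] = "older"
    for name in prior:
        if name not in current:
            deltas[name] = "deleted"
    return deltas


if __name__ == "__main__":
    prior = {
        "file1.txt": FileInfo(10, 100.0),
        "file2.txt": FileInfo(10, 100.0),
        "file4.txt": FileInfo(10, 100.0),
    }
    current = {
        "file1.txt": FileInfo(10, 100.0),   # unchanged
        "file2.txt": FileInfo(12, 200.0),   # modified later
        "file11.txt": FileInfo(5, 300.0),   # did not exist before
    }
    print(find_deltas(prior, current))
```

Because each path is compared only with its own earlier listing, timestamps never need to match across Path1 and Path2.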

Notable features / functions / behaviors

LIMITATIONS (most notable) - WARNING

See the GitHub Issues tab for further details.

  1. rclonesync relies on file date/time stamps to identify changed files. If an application (or you) changes the content of a file without changing its modification time, then rclonesync will not notice the change, and thus will not copy it to the other side.
  2. New empty directories on one path are not propagated to the other side (issue #63). This is because rclone, and thus rclonesync, natively works on files, not directories. The following sequence is a workaround, but it will not propagate the deletion of an empty directory to the other side:

    1) rclonesync <path1> <path2>
    2) rclone copy <path1> <path2> --filter "+ */" --filter "- **" --create-empty-src-dirs
    3) rclone copy <path2> <path1> --filter "+ */" --filter "- **" --create-empty-src-dirs
  3. When using local, ftp, or sftp remotes rclone does not create temporary files at the destination when copying, and thus if the connection is lost the created file may be corrupt, which will likely propagate back to the original path on the next sync, resulting in data loss (rclonesync issue #56 and rclone issue #1316). This is a problem for rclone to solve, as there is no way for rclonesync to gracefully and reliably deal with it.
  4. Files that change during an rclonesync run may result in data loss (issue #24). This has been seen in a highly dynamic environment, where the file system is getting hammered by running processes during the sync. The best solutions are to 1) sync at quiet times, and/or 2) to filter out unnecessary directories and files.
  5. Exception / failure if there are bogus (non-UTF8) filenames (issue #62). rclonesync is hard coded to work with UTF8-encoded filenames. Solutions: Fix invalid filenames, filter out directories containing such files, or put such files into a .zip file with a proper name.
  6. Syncing with non-case-sensitive filesystems, such as Windows and Box, can result in filename conflicts (issue #54). This may have some handling in rclonesync in a future release. The near term fix is to make sure that files on both sides don't have spelling case differences (Smile.jpg vs. smile.jpg).
  7. Google docs exist as virtual files on Google Drive, and cannot be transferred to other filesystems natively. rclonesync's handling of Google Doc files is to 1) Flag them in the run log output as an FYI, and 2) ignore them for any file transfers, deletes, or syncs. See TROUBLESHOOTING.md for more info.
  8. Renaming a folder on side A results in deleting all files on side B and then copying all files again from A to B (issue #71). rclonesync sees all files under the old directory name as deleted and all files under the new directory name as new. Similarly, renaming a directory on both sides to the same name will result in creating _Path1 and _Path2 files on both sides. Quite a mess. The most effective and efficient method of renaming a directory is to 1) rename it on both sides, then 2) do a --first-sync.
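
For limitation 6 above, one way to spot spelling case differences before syncing is a small pre-check over the combined file names from both sides. The sketch below is a hypothetical helper, not something rclonesync does itself. The name `find_case_conflicts` is made up for illustration, and it assumes you have already gathered the file names into a list.

```python
from collections import defaultdict


def find_case_conflicts(filenames):
    """Group names that differ only by letter case.

    Returns a list of sorted groups, one per set of conflicting names.
    """
    groups = defaultdict(set)
    for name in filenames:
        # casefold() gives a case-insensitive key for comparison.
        groups[name.casefold()].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    names = ["Smile.jpg", "smile.jpg", "notes.txt", "notes.txt"]
    print(find_case_conflicts(names))
```

Identical names that appear on both sides are not reported. Only names that collide on a case-insensitive filesystem are grouped.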

Windows support

Support for rclonesync on Windows was added in V2.3.

Usual sync checks

| Type | Description | Result | Implementation |
|---|---|---|---|
| Path2 new | File is new on Path2, does not exist on Path1 | Path2 version survives | rclone copy Path2 to Path1 |
| Path2 newer | File is newer on Path2, unchanged on Path1 | Path2 version survives | rclone copy Path2 to Path1 |
| Path2 deleted | File is deleted on Path2, unchanged on Path1 | File is deleted | rclone delete Path1 |
| Path1 new | File is new on Path1, does not exist on Path2 | Path1 version survives | rclone copy Path1 to Path2 |
| Path1 newer | File is newer on Path1, unchanged on Path2 | Path1 version survives | rclone copy Path1 to Path2 |
| Path1 older | File is older on Path1, unchanged on Path2 | Path1 version survives | rclone copy Path1 to Path2 |
| Path2 older | File is older on Path2, unchanged on Path1 | Path2 version survives | rclone copy Path2 to Path1 |
| Path1 deleted | File no longer exists on Path1 | File is deleted | rclone delete Path2 |

UNusual sync checks

| Type | Description | Result | Implementation |
|---|---|---|---|
| Path1 new AND Path2 new | File is new on Path1 AND new on Path2 | Files renamed to _Path1 and _Path2 | rclone copy _Path2 file to Path1, rclone copy _Path1 file to Path2 |
| Path2 newer AND Path1 changed | File is newer on Path2 AND also changed (newer/older/size) on Path1 | Files renamed to _Path1 and _Path2 | rclone copy _Path2 file to Path1, rclone copy _Path1 file to Path2 |
| Path2 newer AND Path1 deleted | File is newer on Path2 AND also deleted on Path1 | Path2 version survives | rclone copy Path2 to Path1 |
| Path2 deleted AND Path1 changed | File is deleted on Path2 AND changed (newer/older/size) on Path1 | Path1 version survives | rclone copy Path1 to Path2 |
| Path1 deleted AND Path2 changed | File is deleted on Path1 AND changed (newer/older/size) on Path2 | Path2 version survives | rclone copy Path2 to Path1 |
Benchmarks

Here are a few data points for scale, execution times, and memory usage.

This first set of data was taken between my local disk and Dropbox. My Speedtest.net download speed is ~170 Mbps, and upload speed is ~10 Mbps. 500 files (~9.5 MB each) were already sync'd. 50 files were then added in a new directory, each ~9.5 MB, ~475 MB total.

| Change | Operations and times | Overall run time |
|---|---|---|
| 500 files sync'd (nothing to move) | 1x LSL Path1 & Path2 | 1.5 sec |
| 500 files sync'd with --check-access | 1x LSL Path1 & Path2 | 1.5 sec |
| 50 new files on remote | Queued 50 copies down: 27 sec | 29 sec |
| Moved local dir | Queued 50 copies up: 410 sec, Queued 50 deletes up: 9 sec | 421 sec |
| Moved remote dir | Queued 50 copies down: 31 sec, Queued 50 deletes down: <1 sec | 33 sec |
| Delete local dir | Queued 50 deletes up: 9 sec | 13 sec |
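
As a rough sanity check on the copy times above, the ideal transfer time for ~475 MB can be estimated from the quoted link speeds. This back-of-the-envelope sketch assumes 1 MB = 10^6 bytes and 1 Mbps = 10^6 bits per second, and ignores protocol and per-file overhead. The function name `ideal_transfer_seconds` is made up for illustration.

```python
def ideal_transfer_seconds(total_megabytes: float, link_mbps: float) -> float:
    """Lower-bound transfer time: megabytes * 8 bits per byte / link speed in Mbps."""
    return total_megabytes * 8 / link_mbps


if __name__ == "__main__":
    upload = ideal_transfer_seconds(475, 10)     # ~10 Mbps upload link
    download = ideal_transfer_seconds(475, 170)  # ~170 Mbps download link
    print(f"ideal upload:   {upload:.1f} sec")
    print(f"ideal download: {download:.1f} sec")
```

Under these assumptions the upload floor is 380 sec against the measured 410 sec, and the download floor is about 22 sec against the measured 27 to 31 sec, which suggests the copy phases are dominated by link bandwidth rather than by rclonesync overhead.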

This next data is from a user's application. They have ~400 GB of data in 1.96 million files being sync'ed between a Windows local disk and a remote/cloud (type??). The average full file path length is 35 characters (which factors into load time and RAM required). (Data points will be added once the user replies. If you have similar large-scale data, please share.)

Revision history