Open colinxs opened 3 years ago
@Cyan4973 anything you are using internally that could be upstreamed to rsync?
Even better than small dictionaries, rsync has entire files you could use like reference frames for MPEG2? https://en.wikipedia.org/wiki/Reference_frame_(video)
A postgres/mysql database block format aware encoder would definitely be useful.
There is the `--patch-from` mode that could be used for that, but it's currently limited to < 2 GB reference and data size to compress.
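To make the reference-file idea concrete, here is a minimal sketch that compresses a new version of a file against its old version. It is only an analogy: it uses Python's standard `zlib` preset dictionary (`zdict`) as a stand-in, not zstd's `--patch-from` and not anything rsync does today. The file contents and the helper names `compress_against` / `decompress_against` are made up for illustration, and zlib's 32 KiB window means this only works for small references.

```python
import hashlib
import zlib

# Hypothetical "old" file: 200 hex lines that do not compress well on their own.
old_lines = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(200)]

# Hypothetical "new" file: identical except for one edited line.
new_lines = list(old_lines)
new_lines[100] = "this line was edited"

old = "\n".join(old_lines).encode()
new = "\n".join(new_lines).encode()


def compress_against(reference: bytes, data: bytes) -> bytes:
    """Compress data using reference as a preset dictionary (zlib stand-in)."""
    c = zlib.compressobj(level=9, zdict=reference)
    return c.compress(data) + c.flush()


def decompress_against(reference: bytes, blob: bytes) -> bytes:
    """Inverse of compress_against; needs the same reference bytes."""
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(blob) + d.flush()


plain = zlib.compress(new, 9)        # no reference available
patch = compress_against(old, new)   # old version used as reference
restored = decompress_against(old, patch)

print("without reference:", len(plain), "bytes")
print("with reference:   ", len(patch), "bytes")
```

The receiver must hold a byte-identical copy of the reference, which is the same precondition a patch-style mode would have.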
zstd supports creating a dictionary from a set of files, which can then be used to speed up compression and increase the compression ratio on subsequent compressions. Looking at the code in token.c and match.c (and please correct me if I'm wrong here), it appears that a new dictionary is used for each file. Naively, it seems like that dictionary could be shared across all files in the tree. Based on some benchmarking with a large set of TOML files, the performance increase is significant when using a dictionary.
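As a rough illustration of why a shared dictionary helps with many small, similar files, the sketch below compares compressing each file from a fresh state against compressing each file with one shared preset dictionary. Assumptions: it uses Python's standard `zlib` with `zdict` rather than zstd, the "dictionary" is just one representative sample file rather than a trained one, and the TOML content and the names `make_toml`, `compress`, and `decompress` are invented for the example. It does not model rsync's token.c.

```python
import zlib


def make_toml(i: int) -> bytes:
    """Build a small, hypothetical TOML file; only a few values differ per file."""
    return (
        "[package]\n"
        f'name = "service-{i}"\n'
        f'version = "0.{i}.0"\n'
        'edition = "2021"\n'
        "\n"
        "[dependencies]\n"
        'serde = "1.0"\n'
        'tokio = "1"\n'
    ).encode()


def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress one file, optionally primed with a shared dictionary."""
    if zdict:
        c = zlib.compressobj(level=6, zdict=zdict)
    else:
        c = zlib.compressobj(level=6)
    return c.compress(data) + c.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress one file; must be given the same dictionary used to compress."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


files = [make_toml(i) for i in range(1, 51)]
shared_dict = make_toml(0)  # stand-in for a trained dictionary

independent_total = sum(len(compress(f)) for f in files)
shared_total = sum(len(compress(f, shared_dict)) for f in files)

print("fresh state per file:", independent_total, "bytes")
print("shared dictionary:   ", shared_total, "bytes")
```

Both sides need the same dictionary bytes, so in a real transfer the dictionary would have to be sent once up front or already be present on the receiver.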
Here is a discussion of this idea: https://unix.stackexchange.com/questions/553111/is-the-rsync-block-compression-dictionary-reset-for-each-file
As an extension to reusing the dictionary across files within a single call to `rsync`, a user could (optionally) provide an external dictionary or reuse one that `rsync` generates (similar to Batch Mode and `--write-batch`/`--read-batch`).
Both of these things would help significantly for something like real-time sync (as `lsyncd`, which is basically `inotify` + `rsync`, does) where I'm continuously `rsync`ing a file tree across a network using compression.