bcpierce00 / unison

Unison file synchronizer

Remove external rsync support (as more complicated than useful) #871

Open gdt opened 1 year ago

gdt commented 1 year ago

Unison has a copythreshold preference that makes it hand large transfers to an external program (copyprog, typically rsync). This has resulted in bugs (#865, #982). It has also resulted in a request to map progress from the external rsync back to the GUI (#549).

This ticket postulates that external rsync isn't actually useful any more, and that therefore it would be reasonable to outright remove support. Results of experiments (as text; please no movies or images) comparing normal operation and external are welcome, especially as scripts that enable others to test.
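For concreteness, here is a minimal sketch of the kind of comparison being asked for. The host, paths, and threshold are placeholders, the copyprog/copyprogrest values are the usual rsync invocations, and the destination root would need to be reset between runs:

```sh
#!/bin/sh
# Minimal comparison sketch: sync the same tree once with unison's built-in transfer
# and once via an external rsync (copyprog), comparing wall-clock time.
# Host, paths and the threshold are placeholders; reset the destination between runs.
SRC=/data/testtree
DST=ssh://testhost//data/testtree

time unison "$SRC" "$DST" -batch          # built-in transfer

# ... reset the destination root here ...

time unison "$SRC" "$DST" -batch \
     -copythreshold 0 \
     -copyprog "rsync --inplace --compress" \
     -copyprogrest "rsync --partial --inplace --compress"
```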

gdt commented 10 months ago

This same content was posted to `unison-users@` on March 19, asking for feedback. There has been none.

dlucredativ commented 10 months ago

I have been asked to explain whether we might need copyprog. I cannot do so based on long-term experience. However, based on some tests it seems that unison is single-threaded, while large files can be transferred in parallel, potentially using multiple cores for compression. So maybe someone has a scenario with multiple large but well-enough compressible files, so that parallel rsyncs yield faster transfers.
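Purely to illustrate that hypothetical scenario (paths and host are placeholders, and `--compress-choice` needs rsync ≥ 3.2), something along these lines would push several large compressible files with compression running on multiple cores:

```sh
# Hypothetical scenario only: push several large, compressible files with four
# compressed rsync processes in parallel, so compression can use multiple cores.
# Paths and host are placeholders; --compress-choice requires rsync >= 3.2.
printf '%s\n' /data/big-*.img | xargs -P 4 -I{} \
    rsync --inplace --compress --compress-choice=zstd {} backuphost:/data/
```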

gdt commented 10 months ago

Thanks. I didn't quite say this before, but I am looking for actual evidence of usefulness, versus a theory that it might be useful, in a general enough case that it is worth the complexity. We have no test results on the table like "with this setup, reproducible with this test script, regular unison sync takes X seconds but with copyprog rsync it takes Y".

dlucredativ commented 10 months ago

If it weren't for the lack of manpower ("There are very few people ... working on Unison", https://github.com/bcpierce00/unison/issues/982#issuecomment-1837163678), I would have suggested decoupling rsync from copyprog:

A URI syntax would solve #982, while rsync-compatibility issues like #865 would become the users' problem.
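Purely as an illustration of what such a decoupling might allow (nothing here exists today beyond copyprog itself, and the argument convention assumed is the rsync-over-ssh style mentioned later in the thread): a wrapper handed to copyprog would be free to move the payload over any channel it likes.

```sh
#!/bin/sh
# Hypothetical copyprog wrapper: unison would invoke it the way it invokes rsync over
# ssh (roughly: SRC [USER@]HOST:DSTPATH), and the wrapper then moves the bytes over
# whatever "data plane" it likes -- here a plain zstd-over-ssh pipe as an example.
src=$1
dst=$2                       # e.g. user@host:/path/to/file
host=${dst%%:*}
path=${dst#*:}
zstd -c "$src" | ssh "$host" "zstd -dc > \"$path\""
```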

tleedjarv commented 10 months ago

The original motivation for adding the copyprog feature seems to have been "performance" (see ccb8bedc9a1f857341d8d72695041390fc03b7e6 and 829bd6646dea57c89842f254051741106861d91d).

One of the mentioned shortcomings (not resuming partial transfers) was fixed back in 2010 (a70651593c6d7c46a42f2e999efe34b725e17543), as of version 2.40.

The other shortcoming is a bit vague: was Unison so slow on new files that the copyprog feature was warranted? That could have been true in 2008, but much has improved since. I went ahead and did some low-effort testing of my own.

All testing was done on a memory disk over a local loopback network, to eliminate as much disk/network impact as possible. Tests were run with a single 4 GiB file. There is nothing "scientific" or repeatable about these tests (I did not prepare a clean testing environment), but I did take the average of multiple runs and the variance was tiny. The test results will not be meaningful in your environment! If you use copyprog then you have to do your own testing.

| Method | Transfer rate | Comment |
| --- | --- | --- |
| nc | 2479 MiB/s | Client: `nc < infile`<br>Server: `nc > outfile` |
| nc (zstd) | 1469 MiB/s | Client: `zstd < infile \| nc`<br>Server: `nc \| zstdcat > outfile` |
| scp | 667 MiB/s | |
| scp -C | 80 MiB/s | zlib |
| unison socket | 1054 MiB/s | another round of testing: 1292 |
| unison local socket | 1132 MiB/s | another round of testing: 1277 |
| unison socket over ssh | 641 MiB/s | ssh port forwarding |
| unison socket over ssh -C | 85 MiB/s | ssh port forwarding |
| unison ssh | 506 MiB/s | |
| unison ssh -C | 81 MiB/s | |
| unison copyprog | 666 MiB/s | |
| unison copyprog --compress (lz4) | 1048 MiB/s | |
| unison copyprog --compress (zstd) | 931 MiB/s | |
| rsync (ssh) | 695 MiB/s | |
| rsync (ssh) --compress (lz4) | 1094 MiB/s | |
| rsync (ssh) --compress (zstd) | 1036 MiB/s | |
| rsync (ssh) --compress (zlib) | 95 MiB/s | |

(note that rates achieved for compressed transfers apply to this specific test file only, whereas uncompressed transfer rates would be the same for any synced file)
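For anyone who wants to reproduce the baseline rows, here is roughly how the raw nc numbers could be obtained. The exact commands were not recorded beyond the table above, so the tmpfs mount point, port, and netcat flags below are assumptions (netcat flag syntax also varies between implementations):

```sh
# Rough sketch for the plain nc baseline: 4 GiB file on tmpfs, transfer over loopback.
mount -t tmpfs -o size=5g tmpfs /mnt/ram
head -c 4G /dev/urandom > /mnt/ram/infile            # GNU head accepts the G suffix

nc -l 127.0.0.1 9000 > /mnt/ram/outfile &            # "server" side
time nc -N 127.0.0.1 9000 < /mnt/ram/infile          # "client"; rate = 4096 MiB / elapsed
```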

Based on this very limited testing, what can I conclude? There may be other reasons (compression? fake-super?) to keep the copyprog feature, but performance is not one of them.

dlucredativ commented 10 months ago

A generic copyprog in the sense of https://github.com/bcpierce00/unison/issues/871#issuecomment-1842980288 would allow for an arbitrary data plane. The payload could be transferred over a different link, or even by machines other than the unison endpoints.

tleedjarv commented 10 months ago

I think copyprog already works as you suggested, with the tiny caveat that dst-uri is formatted as expected by rsync-over-ssh.

But why would you want to have this arbitrary data plane?

gdt commented 10 months ago

unison expects either to reach both roots through the filesystem, or to use a stream connection to a remote unison. How the stream happens can be quite variable, and as long as it works and has reasonable throughput and latency, it should be ok. Given the history of contributions, I don't see anything architecturally different from that stream-connection model as having any real likelihood of happening.
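For reference, the forms that "reaching a root" can take today (hosts, paths, and the port are placeholders):

```sh
# Local filesystem root, remote root over an ssh stream, and remote root over a plain
# TCP socket to a listening `unison -socket 5000`; hosts, paths and port are placeholders.
unison /home/me/data /mnt/backup/data
unison /home/me/data ssh://backuphost//srv/data
unison /home/me/data socket://backuphost:5000//srv/data
```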

HaleTom commented 8 months ago

The main advantage of using rsync, as I see it, is not performance measured in MB per second, but rather the amount of data that isn't copied because it doesn't need to be.

This can be terabytes -- rsync was designed to shunt huge amounts of data.

gdt commented 8 months ago

@HaleTom Please see my earlier comment:

> Thanks. I didn't quite say this before, but I am looking for actual evidence of usefulness, versus a theory that it might be useful, in a general enough case that it is worth the complexity. We have no test results on the table like "with this setup, reproducible with this test script, regular unison sync takes X seconds but with copyprog rsync it takes Y".

tleedjarv commented 8 months ago

> The main advantage of using rsync, as I see it, is not performance measured in MB per second, but rather the amount of data that isn't copied because it doesn't need to be.
>
> This can be terabytes -- rsync was designed to shunt huge amounts of data.

Do you have or know of any real-world or test data showing that Unison transfers more data compared to rsync (and how much more)?

gdt commented 6 months ago

It's been a year. The only data is from @tleedjarv, and it shows that while external rsync can be a bit faster in a ramdisk/on-machine scenario, the difference doesn't seem significant.

It does seem like someone should add lz4/zstd support to ssh, and/or make ssh use multiple cores for compression, but that's out of scope for unison.

tleedjarv commented 6 months ago

I have so far not seen any practical evidence backing the claims that have been repeatedly made in support of copyprog. We simply don't know whether there is any truth in those claims or whether they lean more towards the urban-legend category. I did some additional tests to shed some more light on this. (First set of tests in https://github.com/bcpierce00/unison/issues/871#issuecomment-1856055324)

This time I compared the number of bytes transferred. Tests were run with a single 4 GiB file. I originally started testing syncing in both directions but quickly discovered that the number of transferred bytes is very similar, so I continued testing with sync in only one direction. This also means that you should not assign any meaning to the "received" and "sent" labels in the table below; they could just as well be reversed. rsync was run without --compress.

| File modification | Transfer | Comment |
| --- | --- | --- |
| append 1 byte | unison total 813 963 bytes<br>sent 655 492 bytes<br>received 158 471 bytes | +additional protocol overhead:<br>sent 367 bytes<br>received 179 bytes |
| | rsync total 786 622 bytes<br>sent 524 363 bytes<br>received 262 259 bytes | |
| truncate from end by 1 byte | unison total 813 958 bytes<br>sent 655 492 bytes<br>received 158 466 bytes | +additional protocol overhead:<br>sent 367 bytes<br>received 179 bytes |
| | rsync total 786 623 bytes<br>sent 524 371 bytes<br>received 262 252 bytes | |
| change 1 byte | unison total 879 460 bytes<br>sent 655 492 bytes<br>received 223 968 bytes | +additional protocol overhead:<br>sent 369 bytes<br>received 179 bytes |
| | rsync total 852 160 bytes<br>sent 524 363 bytes<br>received 327 797 bytes | |
| prepend 1 byte | unison total 813 962 bytes<br>sent 655 492 bytes<br>received 158 470 bytes | +additional protocol overhead:<br>sent 367 bytes<br>received 179 bytes |
| | rsync total 2 195 792 524 bytes<br>sent 524 363 bytes<br>received 2 195 268 161 bytes | |
| remove 1 byte from start of file | unison total 879 407 bytes<br>sent 655 492 bytes<br>received 223 915 bytes | +additional protocol overhead:<br>sent 367 bytes<br>received 179 bytes |
| | rsync total 2 884 252 bytes<br>sent 524 363 bytes<br>received 2 359 889 bytes | |

(I separated unison protocol overhead because there will be at least some additional unison protocol overhead for copyprog too.)

Interpreting the results, I can only conclude that not only is Unison not worse than rsync, it is actually much better. That, or I botched the testing somehow.
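For anyone wanting to sanity-check numbers like these, here is one way such byte counts could be gathered, not necessarily how it was done here; the port, interface, and paths are assumptions, and pcap totals include TCP/IP header overhead:

```sh
# Capture the unison sync traffic on loopback and read the totals afterwards.  Assumes a
# unison server already listening (`unison -socket 5000`); pcap byte counts include
# TCP/IP headers, so they slightly overstate the payload numbers shown above.
tcpdump -i lo -w /tmp/unison.pcap 'tcp port 5000' &
TCPDUMP_PID=$!
unison /mnt/ram/src socket://127.0.0.1:5000//mnt/ram/dst -batch
kill "$TCPDUMP_PID"
capinfos /tmp/unison.pcap       # "Data size" is the total captured bytes (wireshark tools)

# rsync reports its own totals directly:
rsync --inplace --stats /mnt/ram/src/bigfile 127.0.0.1:/mnt/ram/dst/bigfile
```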