Jwink3101 / syncrclone

Python-based bi-direction sync tool for rclone
MIT License
150 stars 13 forks

Sync speed #7

pglez82 closed 3 years ago

pglez82 commented 3 years ago

Hi,

Firstly, thanks for this project. I have been using it for a few months and I am really happy with it. I have an issue, though, that I haven't been able to fix yet. I use it to back up my local PC to OneDrive. The bidirectional functionality is handy because sometimes I modify things in OneDrive (with a tablet) and I want them copied to the PC when I do the backup.

Everything works great but the process is very slow (I have 34k files and 58 GB). Copying the files from A to B and from B to A is fast, but then the program does a second file refresh, and the one on B (OneDrive) takes a long time (about 20 minutes). What I don't understand is why the first file refresh at the beginning of the process is fast (it takes about 2 minutes) while this second one is so slow. Is there any way to configure the program to speed this up?

Thanks, Pablo

Jwink3101 commented 3 years ago

Hi.

I am glad the program is useful to you. I also use it with OneDrive (115k files, 344.72 GB), so I can probably be more helpful with tuning.

Your question

a second file refresh, and the one in B (onedrive), takes a long time

This is almost certainly OneDrive throttling. There is no reason one is fast and the other is not from the syncrclone perspective. It is the identical call.

OneDrive is annoying about this and I've noticed it too. It doesn't have listr functionality (to use --fast-list) so rclone has to iterate over every directory.

Some things I did to speed it up:

Future Enhancements

At the end of the sync, you need to know what the remotes look like, hence the second file listing. I already optimize it to not re-list a side if nothing changed on it, but any change will cause a full re-list of that side.
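That optimization can be pictured roughly as follows. This is only a minimal sketch, not syncrclone's actual code; the `Action` record and `sides_needing_relist` are hypothetical names made up for illustration.

```python
from typing import NamedTuple

class Action(NamedTuple):
    """Hypothetical record of one sync action (not syncrclone's real type)."""
    kind: str    # "transfer", "delete", or "move"
    target: str  # the side that gets modified: "A" or "B"
    path: str

def sides_needing_relist(actions):
    """Return the set of sides that were modified and so must be listed again."""
    return {action.target for action in actions}

# Only B is written to here, so A's first listing could be reused as-is.
actions = [Action("transfer", "B", "docs/report.txt")]
relist = sides_needing_relist(actions)
```

A single action against a side is enough to put it in the set, which is why one small change still costs a full listing of that remote.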

The thing is, I theoretically could eliminate this by using the initial listings and propagating the changes from one to the other. I am not opposed to considering that some day, but it introduces a lot of edge cases (including ones I don't even know how to test).
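To make the idea concrete, here is a rough sketch of predicting the final listings from the initial ones. It is an illustration under stated assumptions, not how syncrclone does it: listings are plain dicts of path to metadata, the action tuples are a made-up format, and a transferred file is assumed to end up with exactly the source's metadata, which a real remote may not guarantee (for example, two remotes may not report the same hash types).

```python
import copy

def apply_actions(listing_a, listing_b, actions):
    """Predict post-sync listings from the initial ones (hypothetical format).

    Each action is a tuple:
      ("transfer", src_side, path)  copy path from src_side to the other side
      ("delete", side, path)        remove path on side
      ("move", side, old, new)      rename old to new on side
    """
    # Work on copies so the initial listings stay untouched.
    sides = {"A": copy.deepcopy(listing_a), "B": copy.deepcopy(listing_b)}
    for action in actions:
        kind = action[0]
        if kind == "transfer":
            _, src, path = action
            dst = "B" if src == "A" else "A"
            # Assumption: destination metadata equals source metadata.
            sides[dst][path] = dict(sides[src][path])
        elif kind == "delete":
            _, side, path = action
            sides[side].pop(path, None)
        elif kind == "move":
            _, side, old, new = action
            sides[side][new] = sides[side].pop(old)
        else:
            raise ValueError(f"unknown action: {kind!r}")
    return sides["A"], sides["B"]

a = {"x.txt": {"size": 3, "mtime": 100}, "old.txt": {"size": 1, "mtime": 50}}
b = {"old.txt": {"size": 1, "mtime": 50}, "gone.txt": {"size": 9, "mtime": 10}}
new_a, new_b = apply_actions(a, b, [
    ("transfer", "A", "x.txt"),
    ("delete", "B", "gone.txt"),
    ("move", "A", "old.txt", "new.txt"),
    ("move", "B", "old.txt", "new.txt"),
])
```

The predicted listings are only as good as the initial ones and the assumption about transferred metadata, so any error in either is carried forward silently instead of being caught by a fresh listing.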

For example, if you use hashes for move tracking but not for sync, you may no longer have a compatible hash, and it'll be really hard to fix that. And if you set reuse_hashes, they won't ever get fixed (e.g. SFTP remote to Dropbox). Or if you use inodes (local only, and I may deprecate that anyway), then you don't have them. Finally, I've had bugs with OneDrive where the file listing is wrong. This may actually fix that, but I'd rather know something is fishy.

None of these are insurmountable, but I worry about adding so many edge cases. And again, that will take a lot of time. I had some time a few weekends ago and added the faster actions (OneDrive is super slow with some of that) but, again, that doesn't help the re-list.

Future Alternatives

Rclone itself is going to come with a bi-directional sync command in the next major release. I am curious whether they figure out how to speed it up. I will say, though, that I like my algorithm more (of course, I am biased). I think it is more robust and has fewer edge cases. Essentially, theirs uses the old file listings to propagate changes, whereas mine uses the previous state only to resolve conflicts. It is much simpler and is resilient to a "bad" (or missing) previous list without accidentally deleting everything. But alas, I do not know golang, nor do I have the time to learn it and implement this. So they are going forward with the other algorithm.
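As a rough illustration of "previous state only to resolve conflicts", here is a simplified sketch. It is one reading of the description above, not syncrclone's real implementation: listings are reduced to hypothetical dicts of path to mtime, and `plan_sync` is a made-up name.

```python
def plan_sync(cur_a, cur_b, prev):
    """Plan actions by comparing the two current listings directly.

    cur_a, cur_b, prev: dicts of path -> mtime (simplified, hypothetical).
    prev is consulted only when the current listings disagree.
    """
    plan = {}
    for path in sorted(set(cur_a) | set(cur_b)):
        in_a, in_b = path in cur_a, path in cur_b
        if in_a and in_b:
            if cur_a[path] == cur_b[path]:
                continue  # both sides agree, nothing to do
            a_changed = prev.get(path) != cur_a[path]
            b_changed = prev.get(path) != cur_b[path]
            if a_changed and not b_changed:
                plan[path] = "copy A->B"
            elif b_changed and not a_changed:
                plan[path] = "copy B->A"
            else:
                plan[path] = "conflict"
        else:
            side, other = ("A", "B") if in_a else ("B", "A")
            present = cur_a if in_a else cur_b
            if path not in prev:
                plan[path] = f"copy {side}->{other}"  # looks new
            elif prev[path] == present[path]:
                plan[path] = f"delete on {side}"      # removed on the other side
            else:
                plan[path] = "conflict"               # edited here, removed there
    return plan

prev = {"keep.txt": 1, "del.txt": 1, "edit.txt": 1}
cur_a = {"keep.txt": 1, "edit.txt": 2, "new.txt": 5}
cur_b = {"keep.txt": 1, "del.txt": 1, "edit.txt": 1}
with_prev = plan_sync(cur_a, cur_b, prev)
without_prev = plan_sync(cur_a, cur_b, {})
```

In this sketch, losing the previous list (`prev = {}`) never produces a delete: one-sided files are treated as new and copied, and files that differ are flagged as conflicts.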

I also think that since bi-directional sync is inherently stateful (while mirroring is not), you need to be careful about how you track history and ensure you sync the whole directory every time. Hence my choice of a config-file approach over a full CLI.

I am curious how they will handle cases like your original sync being A: to B: but then later doing A:subdir to B:subdir. I guess we will see.


I hope this helps.

pglez82 commented 3 years ago

Thank you so much. After updating to rclone 1.56 and using your flags, I have solved the problem. The whole process now takes a couple of minutes (plus any file transfers that may be needed). Thank you!

Jwink3101 commented 2 years ago

Hi. I meant to update this a while ago.

I ended up doing that future enhancement. If you update syncrclone, update the config (see tips), then set

avoid_relist = True

and it will avoid the second listing.

There are some minor caveats but it works great for the most part.

pglez82 commented 2 years ago

That looks great!