derrix060 / onedriveClient

A Microsoft OneDrive and OneDrive for Business client for Linux, written in Python3.
MIT License
65 stars 10 forks

Unable to compare existing files downloaded via other methods. As a result, onedriveClient is downloading/uploading everything again #22

Open modelmat opened 5 years ago

modelmat commented 5 years ago

If a directory is used which had been previously synced, this program will upload each file again as file {desktop}, which wastes my bandwidth.

Can the file hashes be compared first?

derrix060 commented 5 years ago

What do you mean when you say "a directory is used"? Are the files' timestamps going to change? Is there any file whose hash has changed?

modelmat commented 5 years ago

I.e., if I have previously copied my OneDrive directory (using another sync tool, for example), then syncing will re-upload every single file with the (desktop) suffix added to the end. I suppose this would be fixed by #17, though.

I mean: don't re-upload files with the (desktop) suffix if the file on the cloud has the same hash; if the local one is newer, overwrite the cloud copy; if the cloud one is newer, download it.
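
A minimal sketch of the rule requested above, not the project's actual code. `FileState`, `Action` and `decide` are hypothetical names, and the hash is assumed to be computed the same way for both copies:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    UPLOAD = "upload"
    DOWNLOAD = "download"


@dataclass(frozen=True)
class FileState:
    """Hypothetical summary of one copy of a file (local or cloud)."""
    content_hash: str
    modified: datetime


def decide(local: FileState, cloud: FileState) -> Action:
    # Same content: nothing to transfer, whatever the timestamps say.
    if local.content_hash == cloud.content_hash:
        return Action.SKIP
    # Content differs and the local copy is newer: overwrite the cloud copy.
    if local.modified > cloud.modified:
        return Action.UPLOAD
    # Otherwise the cloud copy is newer (or equally old): download it.
    return Action.DOWNLOAD
```

Checking the hash first means a file copied by another tool is never re-uploaded just because its timestamp differs.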

derrix060 commented 5 years ago

The way it currently decides whether a file is different is to first look at the path (and the parent repository) plus the filename. If that matches the cloud, it then checks the timestamp and hash.

I'm changing this behaviour a little in #17...

Let me see if I understood what you are saying:

am I right?

modelmat commented 5 years ago

Yes.

derrix060 commented 5 years ago

What I find weird in this case is that when you set it to sync, it should download everything again...

I remember having the same issue when I first started looking at this project. What I did was give up and let onedrive download everything...

Can you make sure that the files are in the same structure and that the framework is uploading the file, not only the timestamp?

modelmat commented 5 years ago

I actually tried it again and it seemed not to be, but I have just decided to redownload everything from scratch (deleted with rm :P) so I can't test til it syncs again.

derrix060 commented 5 years ago

Investigating #21 I've found why the framework was uploading duplicates.

There are a couple of issues. I will try to explain the steps used to check whether the item is the same:

Check if the item exists locally

Check if the item has changed:

Issues:

One possible way to do it is to download the file, calculate the hash and see if it matches, or (as it does now) upload the file with a different name. I will think more about how to know whether the file is the same or not, and figure out the best way.

modelmat commented 5 years ago

I assume you meant #22 not 21.

Especially for this issue, where all the files would otherwise be downloaded or uploaded as dupes: as long as the timestamps are pretty close, the files can be assumed to be the same. If there is a substantial difference, maybe the file should be uploaded (though this should definitely be given to the user as an option).

derrix060 commented 5 years ago

No I mean 21 haha. I was debugging that error and found this...

Usually, the download speed is higher than upload, so I will download the file (hope that the file is not large...) and compare the hash. If the hashes are different, I will keep both locally and on the cloud, letting the user decide which one is up-to-date.
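
A rough sketch of the "download, then compare the hash" step, assuming the remote file has already been saved to a temporary path. `file_sha256` and `same_content` are hypothetical helpers. SHA-256 is an arbitrary choice here, since both hashes are computed locally and do not need to match whatever hash OneDrive reports:

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large file is never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(local_path: Path, downloaded_path: Path) -> bool:
    # Cheap check first: files of different sizes cannot be identical.
    if local_path.stat().st_size != downloaded_path.stat().st_size:
        return False
    return file_sha256(local_path) == file_sha256(downloaded_path)
```

If `same_content` returns `False`, both copies would be kept, as described above, so the user can decide which one is up to date.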

abraunegg commented 5 years ago

@derrix060 You will also run into this issue: https://github.com/OneDrive/onedrive-api-docs/issues/935

The timestamp can be slightly different (by some seconds)

Baseline all 'timestamps' (local and OneDrive) to drop fractional seconds - HH:MM:SS is what should be compared, otherwise timestamps will always be an issue
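
Dropping fractional seconds could look roughly like this, assuming both timestamps are already available as timezone-aware `datetime` objects (`to_whole_seconds` and `same_second` are hypothetical names):

```python
from datetime import datetime, timezone


def to_whole_seconds(moment: datetime) -> datetime:
    """Drop the fractional-second part so only HH:MM:SS precision remains."""
    return moment.replace(microsecond=0)


def same_second(local: datetime, remote: datetime) -> bool:
    # Compare only after both sides have been baselined to whole seconds.
    return to_whole_seconds(local) == to_whole_seconds(remote)


local = datetime(2019, 3, 1, 10, 15, 30, 120000, tzinfo=timezone.utc)
remote = datetime(2019, 3, 1, 10, 15, 30, 987000, tzinfo=timezone.utc)
print(same_second(local, remote))  # True
```

Truncation only removes sub-second noise. Two timestamps that straddle a second boundary, or that differ by a few seconds, still compare as different.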

derrix060 commented 5 years ago

@abraunegg thanks for the information! I'm planning to do a very bad workaround: download the file, calculate the checksum manually, and see if it matches...

BTW, you have a nice project, congrats!!

modelmat commented 5 years ago

Maybe it should only download if the timestamp is within 5 minutes or so? This allows for timestamps to be slightly off on onedrive's end and even on the client's end due to clock drift

derrix060 commented 5 years ago

Why 5 min? It is possible to change a file on the remote and, within those 5 min, change it locally as well... It would cover some cases, but not all...

modelmat commented 5 years ago

I was thinking that 5 minutes would be a reasonable time. Even 10 seconds or so would probably be enough - what I am trying to say is that there is no point downloading to compare if the timestamps are, say, 2 years apart.
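
Something like the following is what I have in mind. It is only a sketch: `worth_downloading` is a made-up name and the 10-second default is just the figure floated above, not a tested value:

```python
from datetime import datetime, timedelta


def worth_downloading(local: datetime, remote: datetime,
                      tolerance: timedelta = timedelta(seconds=10)) -> bool:
    """True when the timestamps are close enough that the contents may be identical."""
    return abs(local - remote) <= tolerance
```

When this returns `False`, the files are treated as different without spending bandwidth on a download. When it returns `True`, the hash comparison still has the final word, which covers the case of a remote and a local change made within the window.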