Rambalac / ACDDokanNet

Dokan.NET based driver for Amazon Cloud Drive
MIT License

Upload cache leads to inconsistent view of remote, crashing backup apps #19

Open sirotnikov opened 8 years ago

sirotnikov commented 8 years ago

Due to the current upload implementation, there is an inconsistency in the file-system abstraction.

When a large file (or a large number of files) is written by a Windows application, the service reports WriteFile operations as fulfilled as soon as they are cached, before the files have finished uploading. However, the folder listing does not report the files as present in the remote directory (because they still aren't there).

This breaks almost all major backup programs, which attempt to verify the file on the destination or perform post-copy operations (such as renaming, for transaction-safe updates, or setting the file date, for correct date-time comparison).

I think there are two possible ways to solve this:

  1. An option to delay reporting a file operation as complete until the file is actually uploaded. (This would make working with ACD feel like working with a slow network or an old USB drive.)
  2. An option to 'shim' queued files to the OS, to allow post-copy operations (renames, moves, setting properties, etc.). This would require complex cache logic, some form of 'shimmed file' overlay (akin to Dropbox), as well as tracking and consolidating post-upload file operations in the cache.

I think option (1) is the safe and sane one. It would better reflect upload progress and, importantly, it would prevent backup operations from flooding the upload cache.

I realize this might not be the preferred option for other scenarios, so I think it should be optional.

What do you think?

Rambalac commented 8 years ago

(1) is impossible because OS file writing is random access, but ACD uploads are stream only. There is no way to tell whether a file is being copied contiguously or written randomly; the driver just receives requests to write a memory buffer at a specified position in the file. Also, the ACD API does not allow partial file updates; a file can only be re-uploaded whole.
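To illustrate the mismatch (a Python sketch for illustration only, not the driver's actual C# code): a stream-only upload can consume a write request only if its offset equals the number of bytes already sent, so a single out-of-order WriteFile forces full local buffering.

```python
def is_streamable(writes):
    """Return True if a sequence of (offset, length) write requests
    arrives strictly in order, i.e. could be piped straight into a
    stream-only upload without buffering the whole file first."""
    next_offset = 0
    for offset, length in writes:
        if offset != next_offset:
            return False  # random-access write: streaming is impossible
        next_offset += length
    return True

# A plain sequential copy could in principle be streamed...
assert is_streamable([(0, 4096), (4096, 4096), (8192, 1024)])
# ...but one seek-back (e.g. a header rewrite) breaks streaming.
assert not is_streamable([(0, 4096), (4096, 4096), (0, 512)])
```

The driver cannot know in advance which pattern a client application will use, which is why it has to cache first and upload after close.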

For now I'm implementing something like (2), but with additional features for big files, such as a partial file cache on disk and prefetch. I hope it will also make it possible to open video files directly.
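One common shape for such a partial cache with prefetch (a sketch under my own assumptions, not Rambalac's implementation; `fetch_chunk` is a hypothetical download callback) is a fixed-size chunk map with one chunk of read-ahead:

```python
class ChunkCache:
    """Sketch of a partial file cache: fetches fixed-size chunks on
    demand and prefetches the next chunk, so e.g. a video player
    seeking into a big file does not force a full download."""

    def __init__(self, fetch_chunk, chunk_size=1024 * 1024):
        self.fetch_chunk = fetch_chunk      # callable: chunk index -> bytes
        self.chunk_size = chunk_size
        self.chunks = {}                    # chunk index -> cached bytes

    def read(self, offset, length):
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        # Fetch the chunks covering the read, plus one chunk of prefetch.
        for index in range(first, last + 2):
            if index not in self.chunks:
                self.chunks[index] = self.fetch_chunk(index)
        data = b"".join(self.chunks[i] for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + length]
```

In a real driver the chunk map would live on disk with an eviction policy; the sketch only shows the on-demand fetch and read-ahead idea.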

sirotnikov commented 8 years ago

So what you're saying about (1) means there are actually no file metadata (rename/set date) operations: you download the file, change a local copy, and then re-upload the file?

What happens when I rename a file? You must download it, delete the original, and then upload with a different name?

Rambalac commented 8 years ago

There is renaming (move), but there is no way to upload directly as data arrives, because the driver cannot be sure that writing is linear or that nothing will be modified while writing.

sirotnikov commented 8 years ago

Hmmm.

If I understand the Dokan API correctly, what you're getting for each file is:

CreateFile(filename, ...);  // called on handle open
WriteFile(filename, ...);   // called an unknown number of times (depends on the client program's implementation)
Cleanup(filename);          // called on handle close

I realize why you can't start uploading on WriteFile().

Would it be sane to 'block' the Cleanup() API call until the file is uploaded? (Since at that point you're basically guaranteed the program is 'done' writing to the file, for now.) This could, in theory, stall the source application until you're done.

I'm not versed enough in Windows file-system conventions to know whether this is sane behavior or whether it will cause the OS to crash the Dokan driver / user app.
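The blocking idea can be sketched like this (illustrative Python, with hypothetical names; the real driver is C# on Dokan.NET): Cleanup waits on the upload's completion event, but only up to a deadline, since a file-system callback cannot stall forever.

```python
import threading

UPLOAD_WAIT_SECONDS = 30  # hypothetical cap; a Dokan callback cannot block indefinitely

class PendingUpload:
    """Tracks one queued upload; mark_uploaded() is called by the
    background upload worker when the transfer finishes."""
    def __init__(self):
        self.done = threading.Event()

    def mark_uploaded(self):
        self.done.set()

def cleanup(upload, timeout=UPLOAD_WAIT_SECONDS):
    """Sketch of a Cleanup handler that blocks until the upload
    finishes, giving up after the timeout rather than hanging the OS."""
    if upload.done.wait(timeout=timeout):
        return "uploaded"
    return "pending"  # timed out; the file is still only in the cache
```

The timeout is the crux: pick it too long and the calling application (or Explorer) appears hung; too short and large files never finish before Cleanup returns.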

Rambalac commented 8 years ago

If I block any operation for too long it will fail, and during such a block even Windows Explorer can hang. For now the timeout is set to 30 seconds.

sirotnikov commented 8 years ago

Ah, so shimming indeed seems like the only option.

I appreciate the trouble you're going through to implement a file system cache. It's immensely useful. I'd suggest helping out, but my current semester is packed as it is.

As an aside, do you have plans for avoiding choking the user system in case of massive file operations?

Suppose I choose to copy my entire external drive to the ACD virtual drive; I don't want ACDDokan to eat up my entire C drive (or fail if there is not enough room).

Perhaps when pending uploads > MAX_UPLOADS || cache_size > MAX_CACHE_SIZE, you could introduce delays into WriteFile() calls to artificially slow down the user application while you're handling pending operations.
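This backpressure idea can be sketched as follows (illustrative Python; the constants and names are mine, not the driver's): once the cache is over budget, each WriteFile sleeps briefly before accepting the buffer, so the copying application slows to roughly the upload rate instead of filling the disk.

```python
import time

MAX_CACHE_SIZE = 256 * 1024 * 1024   # hypothetical cache budget in bytes
BACKPRESSURE_DELAY = 0.05            # seconds added per throttled write

class UploadCache:
    """Sketch of an upload cache that throttles writers when the
    amount of not-yet-uploaded data exceeds the budget."""

    def __init__(self, max_size=MAX_CACHE_SIZE, delay=BACKPRESSURE_DELAY):
        self.max_size = max_size
        self.delay = delay
        self.used = 0  # bytes cached but not yet uploaded

    def write(self, data):
        """Accept a WriteFile buffer; sleep first if over budget.
        Returns True if this write was throttled."""
        throttled = self.used >= self.max_size
        if throttled:
            time.sleep(self.delay)  # stall only this writer, not the whole OS
        self.used += len(data)
        return throttled

    def uploaded(self, nbytes):
        """Called by the upload worker as data reaches the remote."""
        self.used = max(0, self.used - nbytes)
```

The delay must stay well under the file-system operation timeout, otherwise the throttling itself would make calls fail.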

Thanks for the hard work and the prompt responsiveness.

TomasJanirek commented 8 years ago

Hi, I want to upload about 16TB (multiple files, of course) of my backup. :) It seems you copy all files to the 'Upload' folder first. That is very inconvenient when I want to copy everything at once (it basically creates a local copy of those 16TB). An option to skip the upload folder would be great (or just create links to the copied files and folders). Or use VSS to create a snapshot of files which might get modified during the transfer (not my case). Or at least, is there a way I can create a symbolic link from my backup folder to the 'Upload' directory so that it uploads in the background without making a copy of the files?

An unrelated question: do you use just one upload stream, or multiple streams as the official application does (at least that's what they say :)? Is it possible to run multiple instances of ACD to create multiple streams (to different remote folders, of course), or is there some limitation of Dokan allowing only one instance? Thanks.

Rambalac commented 7 years ago

I'm thinking about symbolic links, but apart from that, as a virtual drive I have no way to get the location of the copied files, only their content. From my side, all files are just created by Explorer, for example. I do use multiple threads, but only 2. The native Amazon app has more possibilities than apps using the public API. I don't want to risk my app getting throttled, as some other apps were.

TomasJanirek commented 7 years ago

I was able to compile and run the project, but I haven't made it work yet; I spent only a few minutes on it, so I didn't fiddle with it much. I don't know how Dokan works; if you only get the content of an already-copied file, then that's settled and one would need to fiddle with Dokan itself. I see two options: either manually edit the config file and the files in temp to list the files which are about to be uploaded, or, maybe more conveniently (for me :), some command-line interface to Amazon Cloud Drive. Something simple like cd, dir, upload file, download file. Then I could build my backup on top of that. I did a quick search and found drivesink.py and the REST API from Amazon, but it is invitation-only. Do you have any experience with a command-line tool (or just some simple API) to automate things?

Now I see that ACDDokanNet is great for browsing with Total Commander, but it is quite a big gun for what I actually need, a simple automated backup. :o) I'm currently a little over 10TB with the Amazon tool (it doesn't follow junctions, so I need to add some files from time to time) and it is still uploading (~100 Mbit). I wonder when my ISP or "unlimited" Amazon will start complaining. :o)

Rambalac commented 7 years ago

I have a .NET library for Amazon Drive. But the main problem is getting the client secret from Amazon. Yes, it's now by invitation only.

TomasJanirek commented 7 years ago

Do you mean a library other than the one in this project? Do you share it somewhere? :) I obtained a client ID and secret key from Amazon quite easily (I created a dev account, but it might not even be necessary). Regarding REST, I guess they don't want a very user-friendly backup solution or file share for their unlimited cloud drive at just 60 USD per year. Of course they have WebDAV etc. for S3, but the price is... uff :)

Rambalac commented 7 years ago

It's here: https://github.com/Rambalac/AmazonCloudDriveApi If you obtained them before August then yes, it was easy. Now it's not, as I understand.

TomasJanirek commented 7 years ago

I did it about a week ago. I just checked, and it's through the Developer Console (https://developer.amazon.com/iba-sp/overview.html) -> APPS & SERVICES -> Security Profiles -> Create a New Security Profile. I don't know whether I needed to register for developer access somehow; now I just use my standard Amazon credentials to get into the Developer Console and create security profiles.