backup / backup-features


Add Dropbox Syncer #13

Open marclennox opened 9 years ago

marclennox commented 9 years ago

It would be great to have Dropbox (and Google Drive) cloud syncers in addition to S3 and CloudFiles. I think the Dropbox API would support this.

tombruijn commented 9 years ago

It's certainly an option to add this. However, the core team is not currently expanding the features of v4; we're working on v5 instead. So we're not adding this in v4, but as v5 rolls around we'll take a look at this repo and see what can be done. In the meantime, other volunteers are free to pick this up.

tombruijn commented 9 years ago

@marclennox It might also be an idea to split this issue into two: one for Dropbox and one for Google Drive.

marclennox commented 9 years ago

Done

marclennox commented 9 years ago

I've started working on this along with abstracting a lot of the content in Storage::Dropbox into a shared CloudIO::Dropbox class.

For some reason the chunked uploader just hangs indefinitely trying to upload a file from my dev environment to dropbox, but I'll keep playing with it.

marclennox commented 9 years ago

Hey @tombruijn, I have this mostly working, but it's intrusive enough that I'd like to get some comments on my approach.

At a high level, I took a lot of the code out of Storage::Dropbox and put it in a new CloudIO::Dropbox object (similar to others). This new object is used by Storage::Dropbox and by a new Syncer::Cloud::Dropbox.
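The structure described above could be sketched roughly like this (a hypothetical outline only; the method names and bodies here are illustrative, not the actual PR code):

```ruby
# Hypothetical sketch: shared Dropbox API logic lives in CloudIO::Dropbox,
# and both the storage and the syncer delegate to it.
module CloudIO
  class Dropbox
    def upload(src, dest)
      # chunked upload via the Dropbox API would go here
      "uploaded #{src} to #{dest}"
    end
  end
end

module Storage
  class Dropbox
    def initialize
      @cloud_io = CloudIO::Dropbox.new
    end

    # Storage uses the shared CloudIO object to push backup packages.
    def transfer!(src, dest)
      @cloud_io.upload(src, dest)
    end
  end
end

module Syncer
  module Cloud
    class Dropbox
      def initialize
        @cloud_io = CloudIO::Dropbox.new
      end

      # The new syncer reuses the same CloudIO object for file sync.
      def sync!(src, dest)
        @cloud_io.upload(src, dest)
      end
    end
  end
end
```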

One of the challenges with syncing to Dropbox is that it does not return a hash of the file contents; instead it returns its own revision ID for each unique revision of a file. So, to determine whether a file needs to be synced, I compare the remote file's revision ID with a locally cached revision ID, which I store when I first upload the file (Dropbox returns it on upload).
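The comparison described above amounts to something like this (a minimal sketch; `needs_sync?` and the cache shape are hypothetical names, not from the actual PR):

```ruby
# Hypothetical sketch: decide whether a file needs re-uploading by
# comparing Dropbox's remote revision ID against a locally cached one.
# `revision_cache` maps remote paths to the revision ID Dropbox
# returned when the file was last uploaded.
def needs_sync?(remote_path, remote_revision, revision_cache)
  cached = revision_cache[remote_path]
  # No cached revision: we have never uploaded this file, so sync it.
  return true if cached.nil?
  # Revisions differ: the remote file changed since our last upload.
  cached != remote_revision
end
```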

I created a new cache file for this purpose (in the same cache directory as the session cache), which is a YAML mapping of file path keys to metadata values. I also augment the Dropbox metadata with the local file's MD5 hash, which I use to check for local file changes.
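A rough sketch of that cache file (the helper names and the `"rev"`/`"md5"` keys are assumptions for illustration, not the PR's actual schema):

```ruby
require "yaml"
require "digest"
require "tempfile"

# Hypothetical sketch of the cache: a YAML hash keyed by file path,
# storing the Dropbox metadata augmented with the local file's MD5.
def cache_entry(local_path, dropbox_metadata)
  dropbox_metadata.merge("md5" => Digest::MD5.file(local_path).hexdigest)
end

def write_cache(cache_path, entries)
  File.write(cache_path, entries.to_yaml)
end

def read_cache(cache_path)
  File.exist?(cache_path) ? YAML.load_file(cache_path) : {}
end
```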

For now I've overloaded the sync_file method in the Dropbox CloudIO because, as I mentioned, I check for a local file MD5 change as well as a remote file revision difference to determine whether to upload. I'm sure this can be abstracted in a more generic fashion later.
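The upload decision in the overloaded sync_file boils down to something like this (a hypothetical sketch; `upload_needed?` and the cache-entry keys are illustrative names, not the PR's):

```ruby
# Hypothetical sketch: upload when either the local file's MD5 changed
# since it was cached, or the remote revision no longer matches the
# cached one. `cached_entry` is the per-file metadata from the cache.
def upload_needed?(local_md5, remote_revision, cached_entry)
  return true if cached_entry.nil?                  # never uploaded
  return true if cached_entry["md5"] != local_md5   # local change
  cached_entry["rev"] != remote_revision            # remote change
end
```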

Lastly, I noticed that when mirroring, local file deletes work correctly; however, if a local directory is deleted, only its file contents are deleted remotely and the directory itself remains. I'm not sure whether this is a problem in my Dropbox implementation or in the Cloud Sync base class. Does this work properly with S3 and CloudFiles?

Thanks.

marclennox commented 9 years ago

@tombruijn once I've got your stamp of approval on my approach, I think I will break this up into 2 PRs. The first one would be the abstraction of dropbox functionality into CloudIO, which would make sure that all existing tests run properly. Then the second PR would add the sync functionality.

tombruijn commented 9 years ago

@marclennox I haven't been maintaining Backup for that long, so I'm still in the dark about a lot of classes. I don't know exactly how the CloudIO S3 and CloudFiles classes work.

I would suggest you try to make the best possible solution that fits the current codebase without changing too much of the existing setup (limiting what could break). As for splitting the work into two PRs, I'm in favor of that: it will allow me to review them more easily.

marclennox commented 9 years ago

OK, I will break it into two. I think that breaking the Dropbox core stuff out into a CloudIO class makes very good sense from my understanding of the codebase.


marclennox commented 9 years ago

Probably won't get this done for another week. Dropbox has some idiosyncrasies that make this a little more challenging, but I'm confident it's going to work well.
