Berimor66 / duplicati

Automatically exported from code.google.com/p/duplicati

Amazon Glacier support #689

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
It would be great to support Amazon's new Glacier service, which is designed 
for long term cloud storage at only $0.01 per GB. http://aws.amazon.com/glacier/

Original issue reported on code.google.com by d...@simplycharlottemason.com on 21 Aug 2012 at 1:29

GoogleCodeExporter commented 9 years ago
I'd definitely like to see this too. This may require architectural changes, though (due to the waiting period to retrieve data, for example, and potentially the listing of existing files too), but I hope it'd be doable.

Original comment by ultraman...@gmail.com on 23 Aug 2012 at 7:56

GoogleCodeExporter commented 9 years ago
The other big challenge is optimal bandwidth throttling. Glacier charges by
the peak GB/hr usage, so the download should be throttled.

Original comment by scottjdu...@gmail.com on 23 Aug 2012 at 8:27

GoogleCodeExporter commented 9 years ago
Support for this would be great.  Would make large volume backups very 
inexpensive.

Original comment by mfsam...@gmail.com on 26 Aug 2012 at 7:22

GoogleCodeExporter commented 9 years ago
Wow, 32 stars in 5 days!

I had a look at the API, and it is not too hard to use, but the delay is a 
problem.

The main problem for making the backups is the manifest files. They describe 
the content and include some checks so Duplicati can be reasonably certain
that the backup is intact. With each backup, Duplicati will download all the 
manifests and use them to verify that the backup chain is intact and correct. 
This cannot be done with Glacier, because you need to request the manifest(s) 
and then wait 4 hours before you can download them, and you will be charged for 
this.

This same problem applies to the signature files, but Duplicati has a cache for 
this, so they are not requested during normal operation (the CLI needs to have 
a cache folder passed as an option).

The only way I can see this working is by creating a hybrid backend that uses 
S3 and Glacier at the same time and presents a simplified view. That is, the 
manifest files are stored in an S3 bucket for fast access, and everything else 
is stored in Glacier. When Duplicati requests a list of files, the backend will 
list the bucket with manifests and the Glacier vault and return a combined 
list. When a download is requested, the backend needs to see whether the file is 
in S3 or Glacier. The same goes for uploads: all .manifest files must be 
directed to S3, and everything else must be sent to Glacier. Deletes work 
similarly.
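
As a rough sketch of that routing idea (this is not Duplicati's actual backend 
interface; the class, the method names, and the ".manifest" check are 
illustrative only, written in Python for brevity, and the two backend objects 
are assumed to expose put/get/delete/list themselves):

# Illustrative only: route manifest files to S3 and everything else to
# Glacier, while presenting one combined view to the caller.
class HybridBackend:
    def __init__(self, s3_backend, glacier_backend):
        self.s3 = s3_backend            # fast storage for manifest files
        self.glacier = glacier_backend  # cold storage for everything else

    def _store_for(self, filename):
        # Manifests go to S3; all other volumes go to Glacier.
        return self.s3 if ".manifest" in filename else self.glacier

    def put(self, filename, data):
        self._store_for(filename).put(filename, data)

    def get(self, filename):
        return self._store_for(filename).get(filename)

    def delete(self, filename):
        self._store_for(filename).delete(filename)

    def list(self):
        # Combined listing of the S3 bucket and the Glacier vault.
        return self.s3.list() + self.glacier.list()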

To tackle the 4-hour wait issue, something needs to change. For the CLI I 
propose adding a new command called "prepare-restore" to Duplicati. To do a 
restore from a Glacier-based store, you can run "prepare-restore" first. This 
will give you a ticket number you can later pass to "restore". If you simply 
call "restore" without a ticket, it should create the ticket behind the scenes 
and wait until it can get the data. In this simple case you will not get a 
ticket id, and thus you will need to re-issue the request if you restart the 
restore.

The GUI just needs a status callback, so it can display a "waiting for data" 
type icon until data is ready. It would probably call "prepare-restore" first, 
to get the ticket, and then call restore with the ticket. If you stop the 
restore, it can resume on the same ticket later, thus not blocking any other 
tasks that should run.
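
For illustration, the "ticket" could simply be a Glacier retrieval job id. A 
minimal sketch of such a prepare-restore / restore flow using boto3's Glacier 
API (boto3 is not what Duplicati uses; the vault name, archive id, output path, 
and polling interval are placeholders):

import time
import boto3

glacier = boto3.client("glacier")

def prepare_restore(vault, archive_id):
    # Start an archive-retrieval job and hand back its id as the "ticket".
    job = glacier.initiate_job(
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    return job["jobId"]

def restore(vault, ticket, out_path, poll_seconds=900):
    # With a ticket, wait until Glacier has staged the data, then download it.
    while not glacier.describe_job(vaultName=vault, jobId=ticket)["Completed"]:
        time.sleep(poll_seconds)  # retrieval typically takes hours
    output = glacier.get_job_output(vaultName=vault, jobId=ticket)
    with open(out_path, "wb") as f:
        f.write(output["body"].read())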

Original comment by kenneth@hexad.dk on 27 Aug 2012 at 10:13

GoogleCodeExporter commented 9 years ago
Maybe you can add cache support for manifest files too? Then a new backup would 
only have to wait 4 hours after a (re)install of Duplicati. I think in this 
case we can avoid S3 usage. Thanks.

Original comment by mega...@gmail.com on 27 Aug 2012 at 10:37

GoogleCodeExporter commented 9 years ago
+1 for a hybrid backend. I'd additionally store the manifests in the Glacier 
vault in order to have a single reliable store that still works in case the 
manifests are gone from the S3 storage.

Original comment by tvanles...@gmail.com on 27 Aug 2012 at 2:20

GoogleCodeExporter commented 9 years ago
Would the hybrid have to be with S3? Wouldn't it be best if the manifests could 
be stored on any other service (i.e. just locally, or in Dropbox or Google 
Drive) too? Not sure if that's a lot more work to implement nicely, but it'd 
surely be a nice feature to have.

Original comment by ultraman...@gmail.com on 27 Aug 2012 at 2:25

GoogleCodeExporter commented 9 years ago
I agree with some of the other comments: why not just store this locally and 
only pull the file when the manifest is not present, such as after a 
(re)install?

Generally, though, hybrid storage sounds good to me, and if that is technically 
more feasible, then most users should be able to handle some additional 
configuration.

Original comment by mfsam...@gmail.com on 27 Aug 2012 at 2:55

GoogleCodeExporter commented 9 years ago
The vault should contain the manifests too. It is possible to restore without 
the manifests, but having them in the vault makes it a lot more robust to do 
this automatically.

In theory, the manifest files can be stored anywhere, including locally.
I prefer that they are stored somewhere remote so there are no weird issues if 
two clients are accessing the same data.
The problem with an external location is the extra setup required (URL, 
username, password, plus special options) to specify the other location, which 
is why I proposed using S3, as the credentials/region/etc. are already specified.

I had an idea for storing a value in the description field for each file, as 
that can be retrieved without accessing the data. But I just discovered that 
listing the vault contents (a vault inventory) is also a 4-hour operation.

Based on the comments here, I think storing the manifests locally seems like 
the right thing to do, even if the local storage can become out-of-sync with 
the remote one.

Original comment by kenneth@hexad.dk on 28 Aug 2012 at 10:40

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Why not allow the user to set a secondary store for the manifests?

That way the backups (along with manifests) can be stored on Glacier, but the 
user can specify keeping a copy of the manifests on local disk, a NAS, S3, etc.

Original comment by brad...@google.com on 31 Aug 2012 at 4:50

GoogleCodeExporter commented 9 years ago
There is going to be some kind of AWS S3-to-Glacier API that might automate 
moving backups to cold storage after they arrive (and everything is checked) 
in S3. Maybe wait until that happens to see the best approach.

Original comment by j...@jeffmcneill.com on 4 Sep 2012 at 11:48

GoogleCodeExporter commented 9 years ago
The way I'd like to see Glacier brought into this product would be only as long 
term storage. My primary backup would be to something local and then once a 
week or once a month, one full backup gets copied to Glacier. My manifests are 
still all local along with most of my backups, but now I have a copy stored 
remotely as well.

Original comment by thomg...@gmail.com on 5 Sep 2012 at 1:23

GoogleCodeExporter commented 9 years ago
I still believe it would be powerful to have two classes of storage: readily 
available and not as readily available (glacier). Long term or short term is 
less important than availability. Not as readily available is for things like 
disaster recovery (e.g., the house or business burns to the ground, long term 
off-site storage, etc.). 

Original comment by j...@jeffmcneill.com on 5 Sep 2012 at 1:28

GoogleCodeExporter commented 9 years ago
I was curious if there was an update on the feasibility of this?

If it cannot be added due to technical limitations I would like to investigate 
alternatives for my organization.  Thank you for all your time on this.

Original comment by mfsam...@gmail.com on 4 Oct 2012 at 1:46

GoogleCodeExporter commented 9 years ago
It is indeed feasible, someone just needs to do it :).

Sadly I have been more busy than usual, and when I get some time I prefer to 
work on the new UI.
If someone is up for the task, drop me a mail and I will assist where I can.

IMO, you should always investigate alternatives, and it does not look like this 
will be added in the near future.

Original comment by kenneth@hexad.dk on 4 Oct 2012 at 2:31

GoogleCodeExporter commented 9 years ago
I hope this will work in the next update.

Original comment by kerane...@gmail.com on 9 Oct 2012 at 10:01

GoogleCodeExporter commented 9 years ago
That would be soooooo great!

Original comment by arjenvin...@gmail.com on 11 Oct 2012 at 6:08

GoogleCodeExporter commented 9 years ago
In the meantime I'm having a play with http://fastglacier.com/

Original comment by r...@thegerrings.com on 6 Nov 2012 at 5:54

GoogleCodeExporter commented 9 years ago
Any change in plans for this feature due to the recent announcement that AWS 
natively supports migrating data from S3 to Glacier?
http://aws.typepad.com/aws/2012/11/archive-s3-to-glacier.html

Original comment by mxx...@gmail.com on 24 Dec 2012 at 9:51

GoogleCodeExporter commented 9 years ago
We have discussed this and we still think that native support of Glacier is 
much better than copying files from S3 to Glacier. The reason is that Duplicati 
has some rules for when to delete old backups, and these rules don't work with 
an S3-to-Glacier solution.

Original comment by rst...@gmail.com on 29 Dec 2012 at 9:30

GoogleCodeExporter commented 9 years ago
Yes, native support would be great, but based on Kenneth's previous comments it 
sounded like an unlikely event, so I thought that maybe this S3-Glacier bridge 
would lighten things up...

If those rules are based on age, then (as can be seen even from screenshots in 
the linked article) AWS can automatically expire objects based on their age...?

Original comment by mxx...@gmail.com on 29 Dec 2012 at 9:46

GoogleCodeExporter commented 9 years ago
There are two rules: age and number of backups. Age can be handled, but it 
needs to be configured manually in both Duplicati and Glacier. Number of 
backups could be handled if you turn "number of backups" into "age" somehow. 
However, both approaches are error-prone.

Kenneth would like to see Glacier support, too. But we have started the work on 
the new UI already and we want to get this done before we start new things.

Original comment by rst...@gmail.com on 29 Dec 2012 at 10:13

GoogleCodeExporter commented 9 years ago
Hi, 
Ever since I heard about Amazon Glacier I have been searching for a nice 
backup-to-Glacier solution for my NAS. I just discovered Duplicati and think it 
would be awesome to get it to work natively with Glacier but it sounds like 
that is not going to happen anytime soon. 

I found a workaround in a blog post on how to migrate Duplicati backups from 
S3 to Glacier while keeping the manifest files on S3:
http://blog.epsilontik.de/?page_id=68

His workaround, while functional, seems a bit complicated. I was thinking that 
it would be easier just to rename the manifest files directly in the Duplicati 
source code so that the Amazon prefix filter can be used. I was just going to 
look into this myself, but before I waste a bunch of time I was wondering if 
anyone with more insight could let me know if this is possible and, if so, 
point me in the right direction.

Thanks!

Original comment by fabasi....@gmail.com on 7 Mar 2013 at 10:31

GoogleCodeExporter commented 9 years ago
Epsilontik is posting about duplicity, not Duplicati.
I'm not sure, but I'd assume that the technical issues facing duplicity
are not the same as the ones facing Duplicati.

Original comment by chris.dr...@gmail.com on 7 Mar 2013 at 10:46

GoogleCodeExporter commented 9 years ago
You would have the same problem with Duplicati, as the filenames have the same 
prefix, and the manifest files must be read in each run.

You can rename the manifest files in the source, but it is slightly complicated 
to get the parsing logic to accept it.

I am working on a new block-based storage format that does not rely on 
manifest files and will thus work better with Glacier.

Sadly it does not look like Glacier support is magically appearing anytime soon.

Original comment by kenneth@hexad.dk on 8 Mar 2013 at 10:54

GoogleCodeExporter commented 9 years ago
This is NOT needed!!!  You can use rules on your S3 bucket to move data from S3 
to Glacier.  If people want this, set up an S3 bucket and then set up a rule to 
move data to Glacier immediately.
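
As a hedged sketch of that approach (the bucket name, rule id, and key prefix 
below are placeholders, and S3 lifecycle filters match literal key prefixes, 
not globs), such a rule could be created with boto3 like this:

import boto3

# Illustrative only: transition objects whose keys start with a given prefix
# to the GLACIER storage class one day after creation.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-duplicati-bucket",                    # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "duplicati-to-glacier",
                "Filter": {"Prefix": "duplicati-b"}, # e.g. dblock volumes
                "Status": "Enabled",
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
            }
        ]
    },
)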

Original comment by ma...@h5sw.com on 9 Jul 2013 at 9:48

GoogleCodeExporter commented 9 years ago
It IS needed for proper support. Yes, you can move the backups into Glacier 
easily enough; however (if I understand everything correctly), you would then 
not be able to manage the backups using Duplicati (deleting old backups being 
the main issue).

Original comment by CraigaWi...@gmail.com on 9 Jul 2013 at 9:54

GoogleCodeExporter commented 9 years ago
From what I understand, your listing of files would be incomplete after moving 
the files to Glacier, which would certainly break Duplicati.

Also, as I understand it, you cannot simply delete the S3 entry when it is 
stored in Glacier.

Original comment by kenneth@hexad.dk on 9 Jul 2013 at 10:19

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Sadly I only see Mac support, and not Windows or anything else, while we are
at it...

Original comment by coolsai...@gmail.com on 9 Jul 2013 at 8:36

GoogleCodeExporter commented 9 years ago
Thanks for suggesting an alternative to duplicati that works on 1/3 of the 
platforms Duplicati works on. Can we keep this issue about Glacier support in 
duplicati instead of suggesting non-replacements?

Original comment by Asa.Ay...@gmail.com on 9 Jul 2013 at 8:37

GoogleCodeExporter commented 9 years ago
That is correct, ARQ is Mac-only software.  So yes, support for Glacier in 
Duplicati is very much needed.

Original comment by t...@edgerunner.org on 9 Jul 2013 at 8:40

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I've been using Cloudberry now for a few months.
It is Windows and Glacier.

Original comment by ffcmer...@gmail.com on 9 Jul 2013 at 9:00

GoogleCodeExporter commented 9 years ago
Cloudberry is recommended on the ARQ homepage I think...

If you read the haystacks page carefully, it gives solutions for different 
OSes.

P


Original comment by peter.da...@gmail.com on 9 Jul 2013 at 9:13

GoogleCodeExporter commented 9 years ago
I too gave up waiting for duplicati to support glacier and now use Cloudberry. 
It works well but is not free or open source.

Original comment by ian.cumm...@gmail.com on 9 Jul 2013 at 9:16

GoogleCodeExporter commented 9 years ago
And it's not compatible with Duplicati. Let's not include non-open-source
software in this discussion.

Original comment by chris.dr...@gmail.com on 9 Jul 2013 at 10:40

GoogleCodeExporter commented 9 years ago
Also, I'd prefer to see Duplicati expand its capabilities to fully
support Glacier.

Original comment by chris.dr...@gmail.com on 9 Jul 2013 at 10:41

GoogleCodeExporter commented 9 years ago
@39 just saying that I would much prefer duplicati to offer glacier support, 
but if it won't, then people will move away and I am one example. Whether that 
matters to anyone is a moot point.

Original comment by ian.cumm...@gmail.com on 10 Jul 2013 at 5:26

GoogleCodeExporter commented 9 years ago
I am working on a new file format that will make Glacier support much better. 
It has a local database so there is no need to rely on file listings at all. 
Initially it will only work with the S3+Glacier method (i.e. set a rule in S3). 

I will not work on the actual Glacier implementation until after the 2.0 UI is 
finished. But this is an open source project, so anyone is free to develop it 
and I will gladly grant commit access. 

If anyone wants to give it a go, there is a guide for the Duplicati part here:
https://code.google.com/p/duplicati/wiki/CustomBackendHowTo

Original comment by kenneth@hexad.dk on 11 Jul 2013 at 12:07

GoogleCodeExporter commented 9 years ago
It would be great to get the same kind of integration as git-annex has for 
access to S3 as well as/separate from Glacier. See for example the screencast 
on the Git Annex Assistant: http://git-annex.branchable.com/assistant/

Original comment by j...@jeffmcneill.com on 12 Jul 2013 at 6:47

GoogleCodeExporter commented 9 years ago
Hi, this would be a really useful development. 
One thing to think about while you're designing this is to allow support for 
the AWS Import/Export service, where you send Amazon a hard drive with your 
data (the first backup). Otherwise, it might take months to upload.

Original comment by b.grych...@gmail.com on 5 Sep 2013 at 9:08

GoogleCodeExporter commented 9 years ago
I just published a short howto that explains how backups can be made to Glacier 
using S3-to-Glacier and the new storage engine of Duplicati 2.0. I think with 
the new storage engine we now have the basics covered for a Glacier connector. 
http://www.duplicati.com/news/howtouseglaciertostorebackups

Original comment by rst...@gmail.com on 26 Sep 2013 at 5:38

GoogleCodeExporter commented 9 years ago
Thanks a lot! I will definitely take a look. 
Have you given any thought to the AWS Import/Export ability I mentioned above?
In particular, would it be possible to set up Duplicati to use a local disk, 
then send the disk to Amazon, who would put its contents on S3, and then 
re-configure Duplicati to use S3 (and the files present there already)?

Original comment by b.grych...@gmail.com on 6 Oct 2013 at 6:05

GoogleCodeExporter commented 9 years ago
That should work. You just have to point Duplicati to a specific local database 
file. The setting for it is --dbpath="". The reason is that Duplicati uses a 
local database to speed things up and make Glacier support possible. It creates 
this file automatically based on storage, username, and path. If the storage 
changes, the file changes, too. You can avoid that by specifying the right file.
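
For example (the bucket, prefix, and paths below are placeholders, and the 
usual S3 credential and passphrase options are omitted), a backup pinned to a 
fixed database file could look something like:

Duplicati.CommandLine.exe backup s3://<bucketname>/<prefix> "C:\Users\<user>\Documents" --dbpath="C:\Duplicati\documents-backup.sqlite" <other options>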

Original comment by rst...@gmail.com on 6 Oct 2013 at 7:15

GoogleCodeExporter commented 9 years ago
Hi, 

I'm testing Duplicati2 on Win7 with S3 Glacier.  The backup works great, thanks!

But, looking at:
  http://www.duplicati.com/news/howtouseglaciertostorebackups
it says to:
  Set up a prefix-filter that moves files from S3 to Glacier regularly. The prefix-filter should look for all files that match “duplicati-d*”. 
but when I look in my S3 store, I don't see any files that match that glob.  I 
see:
  duplicati-<date>.dlist.zip.aes
  duplicati-<rand>.dblock.zip.aes
  duplicati-<rand>.dindex.zip.aes
It hasn't been long enough yet for the Glacier transition to kick in, but I 
just wanted to double-check that glob.

Thanks, 
Pete

PS:  Here's the procedure I used, for the record.

In AWS Management console, S3, set up a new prefix (subdirectory) in my AWS S3 
bucket for this user on this computer.

At the root of the bucket, click Properties, Lifecycle, Add Rule, check Enabled, 
Name: duplicati-glacier, uncheck Apply to Entire Bucket, Prefix: duplicati-d* 
(this is the "prefix-filter" mentioned in the Duplicati Glacier webpage), Time 
Period: Days from creation date, Action: Move to Glacier: 1 : days from 
object's creation date.
Save.

Navigate to the AWS IAM console, set up a new AWS IAM user for this bucket 
prefix (for this user on this computer).

Download, save, and protect the IAM user AWS Access credentials.

In the IAM console, click on Users, the new user, Permissions, Attach user 
policy, Custom Policy, Select, Policy Name: "pol<Computer-User>", Policy 
Document (edit, paste in the following):
{
   "Statement":[
      {
         "Sid":"AllowListBucketIfSpecificPrefixIsIncludedInRequest",
         "Action":["s3:ListBucket"],
         "Effect":"Allow",
         "Resource":["arn:aws:s3:::<bucketid>-duplicati"],
         "Condition":{
            "StringLike":{"s3:prefix":["<computer-user>/*"]
            }
         }
      },
      {
        "Sid":"AllowUserToReadWriteObjectDataInDevelopmentFolder", 
        "Action":["s3:GetObject", "s3:PutObject"],
        "Effect":"Allow",
        "Resource":["arn:aws:s3:::<bucketid>-duplicati/<computer-user>/*"]
      },
      {
         "Sid": "ExplicitlyDenyAnyRequestsForAllOtherFoldersExceptDevelopment",
         "Action": ["s3:ListBucket"],
         "Effect": "Deny",
         "Resource": ["arn:aws:s3:::<bucketid>-duplicati"],
         "Condition":{  "StringNotLike": {"s3:prefix":["<computer-user>/*"] },
                        "Null"         : {"s3:prefix":false }
          }
      }
   ]
}

Click Apply Policy.

Back at your computer.
Download duplicati2 from:
  https://code.google.com/p/duplicati/downloads/detail?name=2.0_CLI_experimental_2013-09-13.zip&can=2&q=

Unzip that.  Open a Windows command/cmd window, and cd to that new directory 
with the executables and .dlls.

Note:  in the duplicati command line, --option=value pairs must have an = sign, 
and must not have any spaces around the = sign. 

C:\Users\<user>\Downloads\2.0_CLI_experimental_2013-11-20\>
Duplicati.CommandLine.exe backup 
s3://<bucketname>-duplicati/<computer-username> "C:\Users\<user>\Documents" 
--s3-server-name=s3.amazonaws.com --s3-use-rrs=true  
--s3-location-constraint=us-east-1 --use-ssl  --aws_access_key_id=<kind of 
secret I think> --aws_secret_access_key=<top secret key>  --passphrase=<long 
and strong>

Output:
Backup started at 1/30/2014 1:05:35 PM
Uploading filelist from previous interrupted backup
Checking remote backup ...
  Uploading file (4.58 KB) ...
  Listing remote folder ...
removing file listed as Temporary: duplicati-20140130T180450Z.dlist.zip.aes
removing file listed as Uploading: 
duplicati-b1d64d11a070e4decb8d4034e5efe83ff.dblock.zip.aes
removing file listed as Uploading: 
duplicati-i937b552485714fa48243f3c3eadfc18f.dindex.zip.aes
removing file listed as Uploading: 
duplicati-bcef0970503db43ce958d76ec3e03b666.dblock.zip.aes
removing file listed as Uploading: 
duplicati-i0564ebf376594de5aa83132ef0db46c5.dindex.zip.aes
Scanning local files ...
  341 files need to be examined (737.51 MB)
  336 files need to be examined (737.51 MB)
  308 files need to be examined (659.37 MB)
  256 files need to be examined (556.15 MB)
  204 files need to be examined (452.51 MB)
  152 files need to be examined (348.52 MB)
  108 files need to be examined (247.75 MB)
  99 files need to be examined (170.24 MB)
  44 files need to be examined (75.96 MB)
  0 files need to be examined (0 bytes)
  Uploading file (18.63 MB) ...
  Uploading file (240.01 KB) ...
  Uploading file (10.47 KB) ...
Checking remote backup ...
  Listing remote folder ...
Verifying remote backup ...
  Downloading file (10.47 KB) ...
  Downloading file (240.01 KB) ...
  Downloading file (18.63 MB) ...
Remote backup verification completed
  Duration of backup: 00:11:34
  Remote files: 4
  Remote size: 18.88 MB
  Files added: 337
  Files deleted: 0
  Files changed: 0
  Data uploaded: 18.88 MB
  Data downloaded: 18.88 MB
Backup completed successfully!

Seemed to work fine.  It deleted some files from my previous attempt.  

Now run with the Glacier-specific Duplicati options:  --no-backend-verification 
--no-auto-compact 

Duplicati.CommandLine.exe backup 
s3://<bucketname>-duplicati/<computer-username> "C:\Users\<user>\Documents" 
--s3-server-name=s3.amazonaws.com --s3-use-rrs=true  
--s3-location-constraint=us-east-1 --use-ssl  --aws_access_key_id=<kind of 
secret I think> --aws_secret_access_key=<top secret key>  --passphrase=<long 
and strong>  --no-backend-verification --no-auto-compact 
Backup started at 1/30/2014 2:11:19 PM
Scanning local files ...
  342 files need to be examined (737.53 MB)
  0 files need to be examined (0 bytes)
  Duration of backup: 00:00:02
  Files added: 0
  Files deleted: 0
  Files changed: 0
  Data uploaded: 0 bytes
  Data downloaded: 0 bytes
Backup completed successfully!

If I remember to, I'll update again after the files move to glacier.  

Original comment by pjala...@gigalock.com on 30 Jan 2014 at 7:16

GoogleCodeExporter commented 9 years ago
If you look at the <rand> part of your filenames, you can see that the dblock 
files all start with "b" and the dindex files all start with "i", so the prefix 
filter should work correctly.

Original comment by kenneth@hexad.dk on 31 Jan 2014 at 12:59

GoogleCodeExporter commented 9 years ago
Hmm, but here are 3 of my S3 objects from yesterday: 
duplicati-b98b6b09770134370b226d39a83ce40dc.dblock.zip.aes
duplicati-20140130T180451Z.dlist.zip.aes
duplicati-i78d47cef3cdb4cd69a67a99814dce4b9.dindex.zip.aes

Original comment by pjala...@gigalock.com on 31 Jan 2014 at 1:06