GoogleCodeExporter opened 9 years ago
I'd definitely like to see this too. It may require architectural changes,
though (for example, because of the waiting period to retrieve data, and
potentially for listing existing files as well), but I hope it'd be doable.
Original comment by ultraman...@gmail.com
on 23 Aug 2012 at 7:56
The other big challenge is optimal bandwidth throttling. Glacier charges by
the peak GB/hr usage, so the download should be throttled.
Original comment by scottjdu...@gmail.com
on 23 Aug 2012 at 8:27
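One commonly suggested way to keep the peak retrieval rate low is to split an
archive into byte ranges and spread the ranged retrieval requests over several
hours. A rough sketch of that pacing idea using boto3 (the vault name, archive
ID and sizes are placeholders; this is not Duplicati code):

import time
import boto3

# Sketch: spread a Glacier archive retrieval over several hours so the
# peak GB/hr retrieval rate stays below a chosen target.
glacier = boto3.client("glacier")

VAULT = "my-duplicati-vault"          # placeholder
ARCHIVE_ID = "xxxxxxxxxxxxxxxxxxxx"   # placeholder
ARCHIVE_SIZE = 8 * 1024**3            # 8 GB, placeholder
PEAK_BYTES_PER_HOUR = 1 * 1024**3     # target: at most 1 GB per hour

MB = 1024 * 1024
chunk = (PEAK_BYTES_PER_HOUR // MB) * MB  # Glacier ranges must be MB-aligned

job_ids = []
start = 0
while start < ARCHIVE_SIZE:
    end = min(start + chunk, ARCHIVE_SIZE) - 1
    job = glacier.initiate_job(
        accountId="-",
        vaultName=VAULT,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": ARCHIVE_ID,
            "RetrievalByteRange": f"{start}-{end}",
        },
    )
    job_ids.append(job["jobId"])
    start = end + 1
    if start < ARCHIVE_SIZE:
        time.sleep(3600)  # one ranged request per hour caps the hourly peak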
Support for this would be great. Would make large volume backups very
inexpensive.
Original comment by mfsam...@gmail.com
on 26 Aug 2012 at 7:22
Wow, 32 stars in 5 days!
I had a look at the API, and it is not too hard to use, but the delay is a
problem.
The main problem for making the backups is the manifest files. They describe
the content and include some checks so Duplicati can be reasonably certain
that the backup is intact. With each backup, Duplicati will download all the
manifests and use them to verify that the backup chain is intact and correct.
This cannot be done with Glacier, because you need to request the manifest(s)
and then wait 4 hours before you can download them, and you will be charged for
this.
This same problem applies to the signature files, but Duplicati has a cache for
this, so they are not requested during normal operation (the CLI needs to have
a cache folder passed as an option).
The only way I can see this working is by creating a hybrid backend that uses
S3 and Glacier at the same time and presents a simplified view. That is, the
manifest files are stored in an S3 bucket for fast access, and everything else is
stored in Glacier. When Duplicati requests a list of files, the backend will
list the bucket with manifests and the Glacier vault and return a combined
list. When a download is requested, the backend needs to see if it is in S3 or
Glacier. The same goes for uploads: all .manifest files must be directed to S3,
and everything else must be sent to Glacier. Similarly for deletes.
To tackle the 4 hour wait issue, something needs to change. For the CLI I
propose adding a new command called "prepare-restore" to Duplicati. To do a
restore from a Glacier based store, you can run "prepare-restore" first. This
will give you a ticket number you can later pass to "restore". If you simply
call "restore" without a ticket, it should create the ticket behind the scenes,
and wait until it can get the data. In this simple case you will not get a
ticket id, and thus you will need to re-issue the request if you restart the
restore.
The GUI just needs a status callback, so it can display a "waiting for data"
type icon until data is ready. It would probably call "prepare-restore" first,
to get the ticket, and then call restore with the ticket. If you stop the
restore, it can resume on the same ticket later, thus not blocking any other
tasks that should run.
Original comment by kenneth@hexad.dk
on 27 Aug 2012 at 10:13
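To make the routing idea in the comment above concrete, here is a minimal
sketch of how a hybrid backend could dispatch by filename, using boto3
directly; the class and method names are illustrative and are not Duplicati's
actual backend interface:

import boto3

class HybridS3GlacierBackend:
    """Sketch of the hybrid idea: manifests live in S3, everything else in
    Glacier. Illustrative names only, not the real Duplicati backend API."""

    def __init__(self, bucket, vault):
        self.bucket = bucket
        self.vault = vault
        self.s3 = boto3.client("s3")
        self.glacier = boto3.client("glacier")
        self.archive_ids = {}  # filename -> Glacier archive id (must be persisted)

    def put(self, filename, data):
        if filename.endswith(".manifest"):
            # Manifests go to S3 so they stay instantly readable.
            self.s3.put_object(Bucket=self.bucket, Key=filename, Body=data)
        else:
            # Everything else goes to the Glacier vault.
            resp = self.glacier.upload_archive(
                accountId="-", vaultName=self.vault,
                archiveDescription=filename, body=data)
            self.archive_ids[filename] = resp["archiveId"]

    def get(self, filename):
        if filename.endswith(".manifest"):
            obj = self.s3.get_object(Bucket=self.bucket, Key=filename)
            return obj["Body"].read()
        # Glacier downloads need a retrieval job first (hours of delay);
        # see the prepare-restore sketch below.
        raise NotImplementedError("needs a Glacier archive-retrieval job")

    def list(self):
        # The fast half of the combined listing: the manifests in S3.
        # A full listing would also need a Glacier vault inventory job,
        # which (as noted later in the thread) is itself a multi-hour operation.
        resp = self.s3.list_objects_v2(Bucket=self.bucket)
        return [o["Key"] for o in resp.get("Contents", [])]

    def delete(self, filename):
        if filename.endswith(".manifest"):
            self.s3.delete_object(Bucket=self.bucket, Key=filename)
        else:
            self.glacier.delete_archive(
                accountId="-", vaultName=self.vault,
                archiveId=self.archive_ids[filename])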
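The "ticket" in the prepare-restore proposal maps fairly directly onto
Glacier's job IDs. A minimal sketch of that flow with boto3 (vault name and
archive ID are placeholders):

import time
import boto3

glacier = boto3.client("glacier")

def prepare_restore(vault, archive_id):
    """Start an archive-retrieval job; the returned job id is the 'ticket'."""
    job = glacier.initiate_job(
        accountId="-", vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id})
    return job["jobId"]

def restore(vault, ticket, poll_seconds=900):
    """Wait for the retrieval job behind the ticket, then download the data."""
    while not glacier.describe_job(
            accountId="-", vaultName=vault, jobId=ticket)["Completed"]:
        time.sleep(poll_seconds)  # retrievals took roughly 4 hours at the time
    out = glacier.get_job_output(accountId="-", vaultName=vault, jobId=ticket)
    return out["body"].read()

# A plain "restore" without a ticket just creates one behind the scenes:
# data = restore(vault, prepare_restore(vault, archive_id))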
Maybe you could add cache support for the manifest files too? Then a new backup
would only have to wait 4 hours after a (re)install of Duplicati. I think in
that case we could avoid S3 usage. Thanks.
Original comment by mega...@gmail.com
on 27 Aug 2012 at 10:37
+1 for a hybrid backend. I'd additionally store the manifests in the Glacier
vault in order to have a single reliable store that still works even if the
manifests in S3 are lost.
Original comment by tvanles...@gmail.com
on 27 Aug 2012 at 2:20
Would the hybrid have to be with S3? Wouldn't it be best if the manifests could
be stored on any other service (e.g. just locally, or in Dropbox or Google
Drive) too? Not sure if that's a lot more work to implement nicely, but it
would surely be a nice feature to have.
Original comment by ultraman...@gmail.com
on 27 Aug 2012 at 2:25
I agree with some of the other comments: why not just store this locally and
only pull the file in the event the manifest is not present, such as after a
(re)install?
Generally, though, hybrid storage sounds good to me, and if that is technically
more feasible, then most users should be able to handle some additional
configuration.
Original comment by mfsam...@gmail.com
on 27 Aug 2012 at 2:55
The vault should contain the manifests too. It is possible to restore without
the manifests, but having them in the vault makes it a lot more robust to do
this automatically.
In theory, the manifest files can be stored anywhere, including locally.
I prefer that they are stored somewhere remote so there are no weird issues if
two clients are accessing the same data.
The problem with an external location is the extra setup (URL, username,
password, plus special settings) required to specify the other location, which
is why I proposed using S3, as the credentials/region/etc. are already specified.
I had an idea for storing a value in the description field for each file, as
that can be retrieved without accessing the data. But I just discovered that
listing the vault contents (a vault inventory) is also a 4-hour operation.
Based on the comments here, I think storing the manifests locally seems like
the right thing to do, even if the local storage can become out-of-sync with
the remote one.
Original comment by kenneth@hexad.dk
on 28 Aug 2012 at 10:40
[deleted comment]
Why not allow the user to set a secondary store for the manifests?
That way the backups (along with the manifests) can be stored on Glacier, but the
user can specify keeping a copy of the manifests on local disk, a NAS, S3, etc.
Original comment by brad...@google.com
on 31 Aug 2012 at 4:50
There is going to be some kind of AWS S3-to-Glacier API that might automate
moving backups to cold storage after they have arrived (and been checked) in S3.
Maybe wait until that happens to see what the best approach is.
Original comment by j...@jeffmcneill.com
on 4 Sep 2012 at 11:48
The way I'd like to see Glacier brought into this product would be only as long
term storage. My primary backup would be to something local and then once a
week or once a month, one full backup gets copied to Glacier. My manifests are
still all local along with most of my backups, but now I have a copy stored
remotely as well.
Original comment by thomg...@gmail.com
on 5 Sep 2012 at 1:23
I still believe it would be powerful to have two classes of storage: readily
available and not as readily available (Glacier). Long-term or short-term is
less important than availability. Not-as-readily-available storage is for things
like disaster recovery (e.g., the house or business burns to the ground,
long-term off-site storage, etc.).
Original comment by j...@jeffmcneill.com
on 5 Sep 2012 at 1:28
I was curious if there was an update on the feasibility of this?
If it cannot be added due to technical limitations I would like to investigate
alternatives for my organization. Thank you for all your time on this.
Original comment by mfsam...@gmail.com
on 4 Oct 2012 at 1:46
It is indeed feasible; someone just needs to do it :).
Sadly I have been busier than usual, and when I do get some time I prefer to
work on the new UI.
If someone is up for the task, drop me a mail and I will assist where I can.
IMO, you should always investigate alternatives, and it does not look like this
will be added in the near future.
Original comment by kenneth@hexad.dk
on 4 Oct 2012 at 2:31
I hope this will work in the next update.
Original comment by kerane...@gmail.com
on 9 Oct 2012 at 10:01
That would be soooooo great!
Original comment by arjenvin...@gmail.com
on 11 Oct 2012 at 6:08
In the meantime I'm having a play with http://fastglacier.com/
Original comment by r...@thegerrings.com
on 6 Nov 2012 at 5:54
Any change in plans for this feature due to the recent announcement that AWS
natively supports migrating data from S3 to Glacier?
http://aws.typepad.com/aws/2012/11/archive-s3-to-glacier.html
Original comment by mxx...@gmail.com
on 24 Dec 2012 at 9:51
We have discussed this and we still think that native support for Glacier is
much better than copying files from S3 to Glacier. The reason is that Duplicati
has some rules about when to delete old backups, and these rules don't work
with an S3-to-Glacier solution.
Original comment by rst...@gmail.com
on 29 Dec 2012 at 9:30
Yes, native support would be great, but based on Kenneth's previous comments it
sounded like an unlikely event, so I thought that maybe this S3-Glacier bridge
would lighten things up.
If those rules are based on age, then (as can be seen even from the screenshots
in the linked article) AWS can automatically expire objects based on their age...?
Original comment by mxx...@gmail.com
on 29 Dec 2012 at 9:46
There are two rules: age and number of backups. Age can be handled, but it
needs to be handled manually in both Duplicati and Glacier. Number of backups
could be handled if you turn "number of backups" into "age" somehow. However,
both approaches are error-prone.
Kenneth would like to see Glacier support, too. But we have already started
work on the new UI and we want to get that done before we start new things.
Original comment by rst...@gmail.com
on 29 Dec 2012 at 10:13
Hi,
Ever since I heard about Amazon Glacier I have been searching for a nice
backup-to-Glacier solution for my NAS. I just discovered Duplicati and think it
would be awesome to get it to work natively with Glacier but it sounds like
that is not going to happen anytime soon.
I found a workaround in a blog post on how to migrate Duplicati backups from
S3 to Glacier while keeping the manifest files on S3:
http://blog.epsilontik.de/?page_id=68
His workaround, while functional, seems a bit complicated. I was thinking that
it would be easier just to rename the manifest files directly in the Duplicati
source code so that the Amazon prefix filter can be used. I was just going to
look into this myself, but before I waste a bunch of time I was wondering if
anyone with more insight could let me know whether this is possible and, if so,
point me in the right direction.
Thanks!
Original comment by fabasi....@gmail.com
on 7 Mar 2013 at 10:31
Epsilontik is posting about duplicity, not Duplicati.
I'm not sure, but I'd assume that the technical issues facing duplicity
are not the same as the ones facing Duplicati.
Original comment by chris.dr...@gmail.com
on 7 Mar 2013 at 10:46
You would have the same problem with Duplicati, as the filenames have the same
prefix, and the manifest files must be read in each run.
You can rename the manifest files in the source, but it is slightly complicated
to get the parsing logic to accept them.
I am working on a new block-based storage format that does not rely on
manifest files and will thus work better with Glacier.
Sadly it does not look like Glacier support is magically appearing anytime soon.
Original comment by kenneth@hexad.dk
on 8 Mar 2013 at 10:54
This is NOT needed!!! You can use rules on your S3 bucket to move data from S3
to Glacier. If people want this, set up an S3 bucket and then set up a rule to
immediately move the data to Glacier.
Original comment by ma...@h5sw.com
on 9 Jul 2013 at 9:48
It IS needed for proper support. Yes, you can move the backups into Glacier
easily enough; however (if I understand everything correctly), you then would
not be able to manage the backups using Duplicati (deleting old backups being
the main issue).
Original comment by CraigaWi...@gmail.com
on 9 Jul 2013 at 9:54
From what I understand, your listing of files would be incomplete after moving
the files to Glacier, which would certainly break Duplicati.
Also, as I understand it, you cannot simply delete the S3 entry when it is
stored in Glacier.
Original comment by kenneth@hexad.dk
on 9 Jul 2013 at 10:19
[deleted comment]
[deleted comment]
Sadly I only see Mac support and not Windows or anything else, while we are
at it...
Original comment by coolsai...@gmail.com
on 9 Jul 2013 at 8:36
Thanks for suggesting an alternative to Duplicati that works on 1/3 of the
platforms Duplicati works on. Can we keep this issue about Glacier support in
Duplicati instead of suggesting non-replacements?
Original comment by Asa.Ay...@gmail.com
on 9 Jul 2013 at 8:37
That is correct, ARQ is Mac-only software. So yes, support for Glacier in
Duplicati is very much needed.
Original comment by t...@edgerunner.org
on 9 Jul 2013 at 8:40
[deleted comment]
I've been using Cloudberry now for a few months.
It runs on Windows and supports Glacier.
Original comment by ffcmer...@gmail.com
on 9 Jul 2013 at 9:00
Cloudberry is recommended on the ARQ homepage, I think...
If you read the haystacks page carefully, it gives solutions for different
OSes.
P
Original comment by peter.da...@gmail.com
on 9 Jul 2013 at 9:13
I too gave up waiting for Duplicati to support Glacier and now use Cloudberry.
It works well but is not free or open source.
Original comment by ian.cumm...@gmail.com
on 9 Jul 2013 at 9:16
And it's not compatible with Duplicati. Let's not include non-open-source
software in this discussion.
Original comment by chris.dr...@gmail.com
on 9 Jul 2013 at 10:40
Also, I'd prefer to see Duplicati expand its capabilities to fully
support Glacier.
Original comment by chris.dr...@gmail.com
on 9 Jul 2013 at 10:41
@39 Just saying that I would much prefer Duplicati to offer Glacier support,
but if it won't, then people will move away, and I am one example. Whether that
matters to anyone is a moot point.
Original comment by ian.cumm...@gmail.com
on 10 Jul 2013 at 5:26
I am working on a new file format that will make Glacier support much better.
It has a local database so there is no need to rely on file listings at all.
Initially it will only work with the S3+Glacier method (i.e. set a rule in S3).
I will not work on the actual Glacier implementation until after the 2.0 UI is
finished. But this is an open source project, so anyone is free to develop it
and I will gladly grant commit access.
If anyone wants to give it a go, there is a guide for the Duplicati part here:
https://code.google.com/p/duplicati/wiki/CustomBackendHowTo
Original comment by kenneth@hexad.dk
on 11 Jul 2013 at 12:07
It would be great to get the same kind of integration as git-annex has for
access to S3 as well as (separately) Glacier. See, for example, the screencast
on the Git Annex Assistant: http://git-annex.branchable.com/assistant/
Original comment by j...@jeffmcneill.com
on 12 Jul 2013 at 6:47
Hi, this would be a really useful development.
One thing to think about while you're designing this is to allow support for
the AWS Import/Export service where you send Amazon a hard drive with your data
(the first backup). Otherwise, it might take months to upload.
Original comment by b.grych...@gmail.com
on 5 Sep 2013 at 9:08
I just published a short howto that explains how backups can be made to Glacier
using S3-to-Glacier and the new storage engine of Duplicati 2.0. I think with
the new storage engine we now have the basics covered for a Glacier connector.
http://www.duplicati.com/news/howtouseglaciertostorebackups
Original comment by rst...@gmail.com
on 26 Sep 2013 at 5:38
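For anyone scripting the bucket setup from the howto instead of clicking
through the console, a minimal boto3 sketch of such a lifecycle transition
(bucket name and prefix are placeholders; an S3 lifecycle prefix is a literal
string prefix, not a glob, and which Duplicati file types it should match is
discussed further down the thread):

import boto3

s3 = boto3.client("s3")

# Placeholder bucket and prefix; adjust the prefix to the Duplicati files
# that should move to Glacier.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-duplicati-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "duplicati-to-glacier",
            "Filter": {"Prefix": "duplicati-b"},  # example: the dblock files
            "Status": "Enabled",
            "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
        }]
    },
)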
Thanks a lot! I will definitely take a look.
Have you given any thought to the AWS Import/Export ability I mentioned above?
In particular, would it be possible to set up Duplicati to use a local disk,
then send the disk to Amazon, who would put its contents on S3, and then
re-configure Duplicati to use S3 (and the files already present there)?
Original comment by b.grych...@gmail.com
on 6 Oct 2013 at 6:05
That should work. You just have to point Duplicati to a specific local database
file. The setting for it is --dbpath="". The reason is that Duplicati uses a
local database to speed things up and to make Glacier support possible. It
creates this file automatically based on storage type, username, and path. If
the storage changes, the file changes too. You can avoid that by specifying the
right file.
Original comment by rst...@gmail.com
on 6 Oct 2013 at 7:15
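For example, a rough sketch of that Import/Export flow (paths, bucket names and
the local-target syntax here are placeholders; the point is only that --dbpath
stays the same for both runs). First back up to the disk that will be shipped:

Duplicati.CommandLine.exe backup "D:\ImportDisk\backup" "C:\Users\<user>\Documents"
--dbpath="C:\Duplicati\shipped-backup.sqlite" --passphrase=<long and strong>

Then, once Amazon has loaded the disk's contents into the bucket, point the
same job (and the same database file) at S3, adding the usual S3 credential
options:

Duplicati.CommandLine.exe backup s3://<bucketname>-duplicati/<computer-username>
"C:\Users\<user>\Documents" --dbpath="C:\Duplicati\shipped-backup.sqlite"
--passphrase=<long and strong>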
Hi,
I'm testing Duplicati2 on Win7 with S3 Glacier. The backup works great, thanks!
But, looking at:
http://www.duplicati.com/news/howtouseglaciertostorebackups
it says to:
Set up a prefix-filter that moves files from S3 to Glacier regularly. The prefix-filter should look for all files that match “duplicati-d*”.
but when I look in my S3 store, I don't see any files that match that glob. I
see:
duplicati-<date>.dlist.zip.aes
duplicati-<rand>.dblock.zip.aes
duplicati-<rand>.dindex.zip.aes
It hasn't been long enough yet for the Glacier transition to kick in, but I
just wanted to double-check that glob.
Thanks,
Pete
PS: Here's the procedure I used, for the record.
In the AWS Management console, S3, set up a new prefix (subdirectory) in my AWS
S3 bucket for this user on this computer.
At the root of the bucket, click Properties, Lifecycle, Add Rule, check Enabled,
Name: duplicati-glacier, uncheck Apply to Entire Bucket, Prefix: duplicati-d*
(this is the "prefix-filter" mentioned in the Duplicati Glacier webpage), Time
Period: Days from creation date, Action: Move to Glacier: 1 : days from
object's creation date.
Save.
Navigate to the AWS IAM console, set up a new AWS IAM user for this bucket
prefix (for this user on this computer).
Download, save, and protect the IAM user AWS Access credentials.
In the IAM console, click on Users, the new user, Permissions, Attach user
policy, Custom Policy, Select, Policy Name: "pol<Computer-User>", Policy
Document (edit, paste in the following):
{
  "Statement": [
    {
      "Sid": "AllowListBucketIfSpecificPrefixIsIncludedInRequest",
      "Action": ["s3:ListBucket"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<bucketid>-duplicati"],
      "Condition": {
        "StringLike": { "s3:prefix": ["<computer-user>/*"] }
      }
    },
    {
      "Sid": "AllowUserToReadWriteObjectDataInDevelopmentFolder",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::<bucketid>-duplicati/<computer-user>/*"]
    },
    {
      "Sid": "ExplicitlyDenyAnyRequestsForAllOtherFoldersExceptDevelopment",
      "Action": ["s3:ListBucket"],
      "Effect": "Deny",
      "Resource": ["arn:aws:s3:::<bucketid>-duplicati"],
      "Condition": {
        "StringNotLike": { "s3:prefix": ["<computer-user>/*"] },
        "Null": { "s3:prefix": false }
      }
    }
  ]
}
Click Apply Policy.
Back at your computer.
Download Duplicati 2 from:
https://code.google.com/p/duplicati/downloads/detail?name=2.0_CLI_experimental_2013-09-13.zip&can=2&q=
Unzip that. Open a Windows command/cmd window, and cd to the new directory
with the executables and .dlls.
Note: on the Duplicati command line, --option=value pairs must have an = sign
and must not have any spaces around the = sign.
C:\Users\<user>\Downloads\2.0_CLI_experimental_2013-11-20\>
Duplicati.CommandLine.exe backup
s3://<bucketname>-duplicati/<computer-username> "C:\Users\<user>\Documents"
--s3-server-name=s3.amazonaws.com --s3-use-rrs=true
--s3-location-constraint=us-east-1 --use-ssl --aws_access_key_id=<kind of
secret I think> --aws_secret_access_key=<top secret key> --passphrase=<long
and strong>
Output:
Backup started at 1/30/2014 1:05:35 PM
Uploading filelist from previous interrupted backup
Checking remote backup ...
Uploading file (4.58 KB) ...
Listing remote folder ...
removing file listed as Temporary: duplicati-20140130T180450Z.dlist.zip.aes
removing file listed as Uploading:
duplicati-b1d64d11a070e4decb8d4034e5efe83ff.dblock.zip.aes
removing file listed as Uploading:
duplicati-i937b552485714fa48243f3c3eadfc18f.dindex.zip.aes
removing file listed as Uploading:
duplicati-bcef0970503db43ce958d76ec3e03b666.dblock.zip.aes
removing file listed as Uploading:
duplicati-i0564ebf376594de5aa83132ef0db46c5.dindex.zip.aes
Scanning local files ...
341 files need to be examined (737.51 MB)
336 files need to be examined (737.51 MB)
308 files need to be examined (659.37 MB)
256 files need to be examined (556.15 MB)
204 files need to be examined (452.51 MB)
152 files need to be examined (348.52 MB)
108 files need to be examined (247.75 MB)
99 files need to be examined (170.24 MB)
44 files need to be examined (75.96 MB)
0 files need to be examined (0 bytes)
Uploading file (18.63 MB) ...
Uploading file (240.01 KB) ...
Uploading file (10.47 KB) ...
Checking remote backup ...
Listing remote folder ...
Verifying remote backup ...
Downloading file (10.47 KB) ...
Downloading file (240.01 KB) ...
Downloading file (18.63 MB) ...
Remote backup verification completed
Duration of backup: 00:11:34
Remote files: 4
Remote size: 18.88 MB
Files added: 337
Files deleted: 0
Files changed: 0
Data uploaded: 18.88 MB
Data downloaded: 18.88 MB
Backup completed successfully!
Seemed to work fine. It deleted some files from my previous attempt.
Now run with the Glacier-specific Duplicati options: --no-backend-verification
--no-auto-compact
Duplicati.CommandLine.exe backup
s3://<bucketname>-duplicati/<computer-username> "C:\Users\<user>\Documents"
--s3-server-name=s3.amazonaws.com --s3-use-rrs=true
--s3-location-constraint=us-east-1 --use-ssl --aws_access_key_id=<kind of
secret I think> --aws_secret_access_key=<top secret key> --passphrase=<long
and strong> --no-backend-verification --no-auto-compact
Backup started at 1/30/2014 2:11:19 PM
Scanning local files ...
342 files need to be examined (737.53 MB)
0 files need to be examined (0 bytes)
Duration of backup: 00:00:02
Files added: 0
Files deleted: 0
Files changed: 0
Data uploaded: 0 bytes
Data downloaded: 0 bytes
Backup completed successfully!
If I remember to, I'll update again after the files move to glacier.
Original comment by pjala...@gigalock.com
on 30 Jan 2014 at 7:16
If you look at the <rand> part of your filenames you can see that the dblock
files all start with "b" and the dindex files all start with "i", so the prefix
filter should work correctly.
Original comment by kenneth@hexad.dk
on 31 Jan 2014 at 12:59
Hmm, but here are 3 of my S3 objects from yesterday:
duplicati-b98b6b09770134370b226d39a83ce40dc.dblock.zip.aes
duplicati-20140130T180451Z.dlist.zip.aes
duplicati-i78d47cef3cdb4cd69a67a99814dce4b9.dindex.zip.aes
Original comment by pjala...@gigalock.com
on 31 Jan 2014 at 1:06
Original issue reported on code.google.com by
d...@simplycharlottemason.com
on 21 Aug 2012 at 1:29