This almost needs a migration tool to move snapshots from S3 to Glacier and back, one that tarballs and gzips the snapshot (or uses some other method to create fewer, or a single, file for Glacier) and then transfers it to Glacier, with the reverse to "put it back" when restoring the snapshot.
It would be nicer for this to be a full plugin or to be supported in the AWS plugin, etc.
It would also be nice if this could be highly parallel.
I have been successfully archiving ES snapshots to Glacier using an S3 Bucket Lifecycle policy for some time now. The Lifecycle policy archives only the S3 objects under the "indices/" path, which covers all the large snapshot data files. The much smaller snapshot metadata files in the "root" of the bucket always remain in S3's Standard storage class, so they can be read immediately without causing errors for actions such as displaying snapshot data, for example the Curator "show snapshots" command.
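For anyone wanting to set up something similar, here is a minimal sketch of such a Lifecycle rule using boto3; the bucket name, rule ID and 30-day transition delay are placeholders, not anything from this thread:

```python
import boto3

s3 = boto3.client("s3")

# Transition only the large snapshot data files under "indices/" to Glacier,
# leaving the small metadata files at the bucket root in the Standard class.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-es-snapshot-bucket",            # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "glacier-indices-only",   # placeholder rule ID
                "Filter": {"Prefix": "indices/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER"}  # example delay
                ],
            }
        ]
    },
)
```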
On the odd occasion when I've needed to restore an index from a snapshot, I've used the AWS CLI tools (aws s3 and aws s3api) to restore the data file(s) from Glacier first, waited a few hours, and then used the ES API to restore the index(es) into the ES cluster. This process is a little long-winded but works.
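Roughly the same workflow can be scripted; the sketch below uses boto3 and the ES REST API, with the bucket, object key, repository, snapshot and index names all being hypothetical placeholders:

```python
import boto3
import requests

s3 = boto3.client("s3")
bucket = "my-es-snapshot-bucket"               # placeholder

# 1) Ask S3 to bring a Glaciered snapshot data file back temporarily.
#    Repeat for every archived object the snapshot needs.
key = "indices/abc123/0/__data_file"           # hypothetical object key
s3.restore_object(
    Bucket=bucket,
    Key=key,
    RestoreRequest={
        "Days": 7,                                     # keep the restored copy a week
        "GlacierJobParameters": {"Tier": "Standard"},  # typically a few hours
    },
)

# 2) Poll until the restore finishes; the Restore header flips to
#    ongoing-request="false" once the object is readable again.
print(s3.head_object(Bucket=bucket, Key=key).get("Restore"))

# 3) Once everything is readable, restore the index from the snapshot as usual.
requests.post(
    "http://localhost:9200/_snapshot/my_s3_repo/my_snapshot/_restore",
    json={"indices": "my-index"},
)
```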
One issue I have just come across is trying to delete snapshots that are no longer required. The Curator tool uses the DELETE snapshot call in the ES API, and this throws an error due to the storage class of the S3 objects (presumably the Glaciered "indices/*" files).
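If it helps anyone debugging the same thing, this is a rough way to confirm which snapshot objects have actually been transitioned out of the Standard class (bucket name is a placeholder); presumably those are what the DELETE call trips over:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-es-snapshot-bucket"   # placeholder

# List everything under indices/ and report objects no longer in STANDARD,
# e.g. data files already transitioned to GLACIER by the Lifecycle policy.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="indices/"):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass", "STANDARD") != "STANDARD":
            print(obj["StorageClass"], obj["Key"])
```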
It would be great to have the handling of Glaciered snapshots by Elasticsearch done transparently.
See also discussion in #13656
Also worth noting is the new Infrequent Access storage class that Amazon now offers for S3. It isn't as restrictive as Glacier and doesn't save quite as much, but it is still a big cost reduction compared to leaving everything in Standard. That might make the Glacier storage class more of a special case for long-term archival rather than pure cost saving, if much of your data would be fine set to Infrequent Access.
https://aws.amazon.com/s3/storage-classes/
Actually, the ticket mentioned by dadoonet has had a pull request merged which might have solved this: https://github.com/elastic/elasticsearch-cloud-aws/pull/243
@geekpete The S3 plugin supports the `storage_class` setting, which can be set to `STANDARD_IA` for the Infrequent Access storage. We explicitly forbid the `GLACIER` storage class, as writing a snapshot currently also potentially involves reading a lot of metadata files (slow retrieval, high costs). As such, I'm closing this issue. Please feel free to comment or +1 here if you want to add your point of view.
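As an illustration of that setting, registering an S3 repository whose objects are written to Infrequent Access might look like the sketch below (endpoint, repository and bucket names are placeholders):

```python
import requests

# Register an S3 snapshot repository that writes objects with the
# STANDARD_IA storage class (names and endpoint are placeholders).
requests.put(
    "http://localhost:9200/_snapshot/my_s3_repo",
    json={
        "type": "s3",
        "settings": {
            "bucket": "my-es-snapshot-bucket",
            "storage_class": "standard_ia",
        },
    },
)
```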
how do you manage to delete the old snapshots?
> how do you manage to delete the old snapshots?
Please ask this question on https://ela.st/forum where we can give better support. This space is only for confirmed issues or feature requests which have been discussed on https://ela.st/forum.
Thanks!
> @geekpete The S3 plugin supports the `storage_class` setting, which can be set to `STANDARD_IA` for the Infrequent Access storage. We explicitly forbid the `GLACIER` storage class, as writing a snapshot currently also potentially involves reading a lot of metadata files (slow retrieval, high costs). As such, I'm closing this issue. Please feel free to comment or +1 here if you want to add your point of view.
How about using the S3 Glacier Instant Retrieval storage class now? It has millisecond retrieval times with a much lower cost per GB than Standard and Standard-IA.
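At the S3 level, adopting it would just mean pointing the kind of Lifecycle rule discussed earlier at GLACIER_IR instead of GLACIER; whether Elasticsearch itself would be happy with a repository in that class is a separate question. A minimal sketch (bucket and rule names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Same shape as the earlier Glacier rule, but GLACIER_IR objects remain
# readable with millisecond latency instead of needing a multi-hour restore.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-es-snapshot-bucket",            # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "glacier-ir-indices-only",    # placeholder
            "Filter": {"Prefix": "indices/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER_IR"}],
        }]
    },
)
```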
Apparently when data is moved from an S3 snapshot repository to Glacier we can no longer see it. This has come from https://discuss.elastic.co/t/snapshot-and-restore-s3-and-glacier/26337
Though this probably has an impact: "To keep costs low, Amazon Glacier is optimized for infrequently accessed data where a retrieval time of several hours is suitable." https://aws.amazon.com/glacier/
Some more info from http://cloudacademy.com/blog/amazon-s3-vs-amazon-glacier-a-simple-backup-strategy-in-the-cloud/: "S3 objects that have been moved to Glacier storage using S3 Lifecycle policies can only be accessed (or shall I say restored) using the S3 API endpoints. As such they are still managed as objects within S3 buckets, instead of as Archives within Vaults, which is the Glacier terminology."
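In other words, a plain read of a Glaciered object fails until it has been restored through the S3 API, which is presumably why ES can no longer "see" it. A small sketch (bucket and key are hypothetical) of what that looks like with boto3:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-es-snapshot-bucket"       # placeholder
key = "indices/abc123/0/__data_file"   # hypothetical key

# GetObject on an object in the GLACIER class is rejected until a
# restore_object request has completed for it.
try:
    s3.get_object(Bucket=bucket, Key=key)
except ClientError as err:
    if err.response["Error"]["Code"] == "InvalidObjectState":
        print("Object is archived in Glacier; issue restore_object first.")
    else:
        raise
```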
Based on that it looks like the biggest issue is going to be the retrieval time, as ES would expect a reasonably quick response from the "FS".