Hi,
You can't use UtahFS with such services; it isn't built to work with them. To put it simply, when UtahFS is used, the objects in the S3 service must always be available. With a Glacier-like backend, objects are not available immediately; they have to be "restored", and that operation takes time.
On 29 July 2020 at 19:59:14 GMT+02:00, Benjamin Lupton notifications@github.com wrote:
> I'm thinking of using UtahFS for a backup use case.
> The plan would be to store about 2 TB of data on either:
> AWS’s Glacier Deep Archive, which is about $0.99/TB/month for storage
> Google's Archive, which is about $1.23/TB/month for storage
> However, these cheap storage tiers have much more expensive access costs compared to the more expensive but more typical storage tiers the providers offer.
> As such, how much would UtahFS need to download to know what is what for, say, 2 TB of data, and does it cache this so it doesn't have to be downloaded each boot?
Florent is right, and I should also say that 2 TB of data is a relatively small amount. You'd be just as well off using a standard object storage provider: 2 TB on Backblaze = $10/mo.
To answer the question you asked: assuming you're not using ORAM, there's one "inode" object per file or folder, which contains metadata about that file or folder. To access a file, you need to fetch the inode of each directory on the path to the file, plus the inode of the file itself. So the overhead is proportional to the depth of your directory structure.
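To make that concrete, here is a toy sketch of the access pattern described above. It is not UtahFS's actual code, and the paths and helper function are made up; it only illustrates why the per-file read overhead scales with directory depth rather than with the total size of the archive.

```go
package main

import (
	"fmt"
	"strings"
)

// countInodeFetches models the pattern described above: one "inode" object
// per directory on the path, plus one for the file itself. This is an
// illustration only, not UtahFS's real implementation.
func countInodeFetches(path string) int {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	return 1 + len(parts) // root inode + one inode per path component
}

func main() {
	for _, p := range []string{"/photos/2020/07/beach.jpg", "/backups/laptop.tar"} {
		fmt.Printf("%-30s -> %d small object reads\n", p, countInodeFetches(p))
	}
}
```

So reading a deeply nested file costs a handful of small object reads for metadata before the file's data blocks are fetched, regardless of how many terabytes the archive holds overall.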
Ok, that clears things up. Thanks for the information.
I'm pretty sure, though, that AWS’s Glacier Deep Archive is available via the bucket API rather than the usual Glacier API; however, it charges far more for such access.
Unfortunately, $10/month is too much for my use case. I'll just go with something way more basic then (writing directly to them without anything fancy).
I'm thinking of using UtahFS for a backup use case (storage of 2+TB of data, with intended retrieval of 1TB of data about once a year), using one of the following providers:
AWS’s Glacier Deep Archive, which is about $1/TB/month for storage (180-day minimum) and $20.50/TB for access ($2.50/TB for bulk access, with a 48-hour wait for retrieval) — calculator
Google's Archive, which is about $1.23/TB/month for storage and $51.20/TB for access — calculator
These tiers have much cheaper storage costs, but much more expensive access costs, than the providers' more typical tiers (see the rough cost sketch below).
As such, how much would UtahFS need to download to know what is what per TB of data, and does it cache this "table of contents" knowledge so it doesn't have to be downloaded each boot?
One other consideration here is that these archival storage classes generally require a minimum lease on the data storage (so you upload a file, and you pay for a minimum of 6 months of storage of that file) — as such, UtahFS shouldn't change files unless absolutely necessary.
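For context, here is a back-of-the-envelope yearly-cost sketch using only the per-TB figures quoted in this thread. It ignores request fees, minimum-storage-duration charges, and egress, so treat it as a rough comparison rather than a quote; check each provider's calculator for current, region-specific pricing.

```go
package main

import "fmt"

func main() {
	const (
		storedTB    = 2.0 // data kept in the archive, per the question above
		retrievedTB = 1.0 // data pulled back roughly once a year
	)

	// Per-TB prices as quoted in this thread (assumed, not authoritative).
	type tier struct {
		name       string
		storePerTB float64 // USD per TB per month
		readPerTB  float64 // USD per TB retrieved
	}
	tiers := []tier{
		{"AWS Glacier Deep Archive (bulk retrieval)", 1.00, 2.50},
		{"Google Cloud Archive", 1.23, 51.20},
		{"Backblaze B2 ($10/mo for 2 TB; egress not counted)", 5.00, 0.00},
	}

	for _, t := range tiers {
		yearly := storedTB*t.storePerTB*12 + retrievedTB*t.readPerTB
		fmt.Printf("%-52s ~$%.2f/year\n", t.name, yearly)
	}
}
```

Under these assumptions the archive tiers come out to roughly $26.50/year (Deep Archive, bulk retrieval) and $80.72/year (Google Archive), versus about $120/year for the Backblaze figure mentioned earlier, which is why the access-cost and restore-latency trade-offs matter so much here.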