cloudflare / utahfs

UtahFS is an encrypted storage system that provides a user-friendly FUSE drive backed by cloud storage.
BSD 3-Clause "New" or "Revised" License

How much needs to be downloaded to know what is what? And is it cached? #27

Closed: balupton closed this issue 4 years ago

balupton commented 4 years ago

I'm thinking of using UtahFS for a backup use case (storage of 2+ TB of data, with intended retrieval of 1 TB of data about once a year), using one of the archival storage providers, which have much cheaper storage costs but much more expensive access costs compared to the more typical tiers.

As such, how much would UtahFS need to download to know what is what per TB of data, and does it cache this "table of contents" knowledge so it doesn't have to be downloaded each boot?

One other consideration here is that these archival storage classes generally require a minimum lease on the data storage (so you upload a file, and you pay for a minimum of 6 months of storage of that file) — as such, UtahFS shouldn't change files unless absolutely necessary.
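To make that constraint concrete, here's a toy calculation (rates are hypothetical, roughly in line with deep-archive pricing; the 6-month minimum is the figure above):

```go
// Toy billing math for a minimum-retention storage class: an object that
// is overwritten or deleted early is still billed for the full minimum
// retention period. Rates here are hypothetical.
package main

import "fmt"

func main() {
	const (
		ratePerGBMonth = 0.001 // hypothetical archival rate, $/GB-month
		minMonths      = 6.0   // minimum retention, per the terms above
		objectGB       = 1.0   // size of the object
	)

	storedMonths := 1.0 // object rewritten after one month...
	billedMonths := storedMonths
	if billedMonths < minMonths {
		billedMonths = minMonths // ...but billed for the full minimum
	}
	fmt.Printf("billed %.0f months: $%.4f\n", billedMonths, billedMonths*ratePerGBMonth*objectGB)
	// Every rewrite of an object restarts the minimum-retention charge,
	// which is why a system that rewrites objects is a bad fit here.
}
```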

FlorentCoppint commented 4 years ago

Hi,

You can't use UtahFS with such services; it is not built to work on them. To put it simply: when using UtahFS, the objects in the S3-compatible service must always be available. With a Glacier-like backend, objects are not available immediately; they have to be "restored" first, and that operation takes time.
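To illustrate, here's a sketch of what reading a Glacier-class object actually involves (assuming the AWS SDK for Go v1; the bucket and key are placeholders). UtahFS's read path expects a plain GetObject to succeed immediately and has no restore-and-poll step:

```go
// Sketch of why Glacier-class objects break UtahFS's access pattern:
// a plain GetObject fails until the object has been explicitly
// restored, and the restore job can take minutes to hours.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	bucket, key := aws.String("my-bucket"), aws.String("some/object") // placeholders

	// For an archived object, S3 returns an InvalidObjectState error
	// here instead of the data.
	if _, err := svc.GetObject(&s3.GetObjectInput{Bucket: bucket, Key: key}); err != nil {
		fmt.Println("direct read failed:", err)
	}

	// The object must first be restored (a temporary copy made readable).
	_, err := svc.RestoreObject(&s3.RestoreObjectInput{
		Bucket: bucket,
		Key:    key,
		RestoreRequest: &s3.RestoreRequest{
			Days: aws.Int64(1), // keep the restored copy for one day
			GlacierJobParameters: &s3.GlacierJobParameters{
				Tier: aws.String("Bulk"), // cheapest, slowest retrieval tier
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	// The caller then has to poll (e.g. HeadObject's Restore field) until
	// the temporary copy is ready before GetObject will succeed.
}
```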


Bren2010 commented 4 years ago

Florent is right, and I should also say that 2 TB of data is a relatively small amount. You'd be just as well off using a standard object storage provider: 2 TB on Backblaze B2 is about $10/mo (at $0.005/GB-month).

To answer the question you asked: assuming you're not using ORAM, there's one "inode" object per file or folder, which contains metadata about that file or folder. To access a file, you need to fetch the inode for each directory on the path to the file, plus the inode for the file itself. So the overhead is proportional to the depth of your directory structure, not to the total amount of data stored.
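To make the access pattern concrete, here's a minimal sketch (hypothetical types and names, not UtahFS's actual code) of the path walk: resolving a path fetches one inode object per component, so a file at depth N costs N+1 inode downloads regardless of archive size:

```go
// Minimal model of per-component inode fetches during path resolution.
// The inode type and object keys are hypothetical stand-ins.
package main

import (
	"fmt"
	"strings"
)

// inode stands in for the per-file/per-folder metadata object.
type inode struct {
	children map[string]string // name -> object key (directories only)
	size     int64
}

// fetchInode simulates downloading one inode object from the backend.
func fetchInode(store map[string]inode, key string, fetches *int) inode {
	*fetches++
	return store[key]
}

// resolve walks from the root inode to the target, fetching one inode
// per path component; the fetch count grows with path depth.
func resolve(store map[string]inode, path string) (inode, int) {
	fetches := 0
	cur := fetchInode(store, "root", &fetches)
	for _, name := range strings.Split(strings.Trim(path, "/"), "/") {
		cur = fetchInode(store, cur.children[name], &fetches)
	}
	return cur, fetches
}

func main() {
	// Toy object store: /docs/report.txt under the root.
	store := map[string]inode{
		"root":  {children: map[string]string{"docs": "ino-1"}},
		"ino-1": {children: map[string]string{"report.txt": "ino-2"}},
		"ino-2": {size: 4096},
	}
	f, n := resolve(store, "/docs/report.txt")
	fmt.Printf("size=%d bytes, inode fetches=%d\n", f.size, n) // 3 fetches
}
```

(The keys here are made up; UtahFS's real layout differs, but the one-fetch-per-path-component shape is the point.)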

balupton commented 4 years ago

Ok, that clears things up. Thanks for the information.

I'm pretty sure, though, that AWS's Glacier Deep Archive is accessible via the usual S3 buckets API rather than the separate Glacier API; however, it charges far more for that kind of access.

Unfortunately, $10/month is too much for my budget. I'll just go with something way more basic then (writing directly to the providers without anything fancy).