colossus: add s3 compatible object storage backend

mnaamani commented 11 months ago

Background

With the explosive growth in demand on the storage infrastructure, and as suggested on multiple occasions by Storage lead, scaling the storage capacity of the storage node can be achieved by storing data objects on an object store built on top of clustering technology which allows dynamic growth of the capacity with limited disruption. eg. AWS S3, Ceph cluster + Object Storage, S3 compatible object stores from other cloud providers

Proposal - add support to store objects in object store.

Choose between api/interface to use object store
- Option A. Colossus will write into store using s3 HTTP API
- Option B. Consider mounting s3 bucket with fuse fs as a local volume with https://github.com/s3fs-fuse/s3fs-fuse. Does that work for non AWS s3 buckets?
When clients (distributor nodes and other storage nodes) request an object colossus will do an HTTP redirect to an http url to fetch the stored object from the object store directly. (If we use the fuse fs how can we map to such a url?) - Alternatively colossus would have to fetch object from the object storage (perhaps cache it in local fs as well) and then serve it directly. This is less complicated and requires very little to no changes but at cost of more bandwidth consumption by the operator.
HTTP HEAD requests should be redirected the same way. If this is not possible then colossus needs to keep local info about objects stored in object store to respond to HEAD requests?
Update argus and colossus to handle HTTP redirects when fetching object from remote storage-nodes.

Notes

We cannot store temp files in s3 when syncing or accepting an upload see: https://github.com/Joystream/joystream/issues/4977 (maybe only when mounting with fuse fs?)
Pick an s3 client package and test against multiple cloud providers.
Do we want to support more than one object store at a time?
If node can have multiple storage paths to help during migrating between volumes, we can it move objects to new storage path, new objects are added to new store and still be served from old store during migration.

zeeshanakram3 commented 11 months ago

Pick an s3 client package and test against multiple cloud providers Do we want to support more than one object store at a time?

@mnaamani there is a an NPM https://github.com/pkgcloud/pkgcloud#storage package that provides a unified interface to all/most of the obejct storage colud services, Maybe we can look into this and see if it meets the requirements

mnaamani commented 11 months ago

Adding links that might be useful for testing/development:

https://ytykhonchuk.medium.com/mock-amazon-s3-bucket-for-local-development-889440f9618e https://github.com/localstack/localstack https://dev.to/arifszn/minio-mock-s3-in-local-development-4ke6

ignazio-bovo commented 7 months ago

I have rewritten your points @mnaamani, to make sure I understand what you are saying

Rationale

The usage of Colossus storage is reaching levels that are challenging to manage with standard retail Bare Metal Storage Options, primarily due to the excessive storage capacity demands. The proposal suggests leveraging a cloud storage provider for hosting the joystream-storage volume, enabling operators or the Lead to set a maximum storage capacity requirement on a Colossus server.

Object Request Flow

Below is a diagram illustrating the flow for a GET /storage-api/v1/assets/X request:

graph LR;
    Argus[Argus] --> Colossus[(Colossus)] --StorageAPI--> CloudStorage[(CloudStorage)];

Decision Points

Caching Policy

No Caching Policy: Colossus forwards every GET request directly to the Cloud Storage API to fetch the asset without local caching. Similarly, a POST request to store an asset would also be forwarded.
With Caching Policy: Upon a GET request, Colossus checks for a locally cached version of the requested asset. If unavailable, it forwards the request to the Cloud Storage API. Fetched objects are cached locally for future requests. POST requests result in local caching and simultaneous cloud storage. The caching policy involves removing "old" assets from local storage, offering a balanced and practical approach, though it requires a strategic caching strategy.

Choice of Storage API

Storage Bucket Concept: A storage bucket is a primary container for data, files, and objects in cloud storage services.

Utilize a storage HTTP library that interfaces with the cloud storage using an API (potentially RESTful), ensuring compatibility with multiple cloud providers beyond just AWS S3.
Use s3fs-fuse for AWS S3 integration, which mounts an S3 bucket as a local volume, albeit limiting provider flexibility.

Bucket Access for Colossus Nodes

Single Bucket: A straightforward approach allowing operators to connect to one bucket at a time.
Multiple Buckets: Offers greater flexibility, allowing connections to several buckets simultaneously. This approach supports an efficient migration strategy (without affecting node uptime):
1. Establish a new bucket (NEW) for immediate asset storage.
2. Begin transferring assets from the old bucket (OLD) to NEW.
3. Continue serving assets from OLD if necessary until the migration to NEW is complete.

Open Questions

If we decide to go with a general Storage HTTP API, which one to use? We should prefer one that support multiple cloud providers and not simply S3 (which appears to be the most widespread solution). Possible suggestion: https://github.com/pkgcloud/pkgcloud#storage
Should HEAD request display info regarding just the local Colossus storage status or the whole Colossus storage + Cloud storage?
What is the difference between "Glacial Storage" and "Block Storage" since we have been using these two world and they appears to be related? Can you provide a short definition?
Also the /data/temp is specifically designed for storing assets whose download has begun but not ended, to avoid having multiple download of the same assets at the same time, right?

kdembler commented 7 months ago

Quick thoughts:

Like mentioned on Discord I think it's best to start small and only introduce cloud archive mode first. By that I mean a node that doesn't accept uploads directly, but only synchs objects from other storage providers and then stores them in S3. I think that is the most immediate need as it would allow us to safely reduce replication rate, possibly greatly reducing storage costs. Then we could iterate on the full version that can further reduce cost. Unless there's significant overhead of doing those separately.
For the library, I think we should be fine with official AWS S3 SDK. There are other providers that offer S3-compatible storage and the lib is surely well maintained and documented.
We need to handle HEAD for all assets. Maybe for remote objects they could be resolved just by querying squid, without accessing the actual file.
Something to keep in mind is minimizing amount of operations executed against S3 because each is paid.
S3 is block storage and glacier is a tier of S3 storage. It's designed as archival storage, for objects you need to access very infrequently. It has price rates adjusted for cheap long-term storage. https://aws.amazon.com/s3/storage-classes/glacier/ For that reason node using glacier may not want to be synched from at all, sort of in write-only mode.
Regarding /data/temp - it's a store for pending uploads. Once the file is fully uploaded, it's moved to the pending folder until it's accepted on-chain. I don't think it's a mechanism for stopping multiple uploads of same file.

ignazio-bovo commented 7 months ago

By that I mean a node that doesn't accept uploads directly, but only synchs objects from other storage providers and then stores them in S3.

ok so this means that during the synching process, instead of locally downloading the assets the Colossus node stores them into S3, right? And this process should cost to the operator less as possible in terms of AWS billing

kdembler commented 7 months ago

I think the objects still need to be downloaded local first so their hash can be verified

ignazio-bovo commented 7 months ago

AWS SDK is available also in typescript https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/#Usage_with_TypeScript So this means that we are first rolling out an initial version where we just synch to glacier storage, right? @kdembler

Questions

Would this also mean that are we supporting just s3 for the moment?
Can this feature be optional, let's say if I am an operator and I decide not to provide a s3 bucket then the synch just happens on the Orion local storage, right?

mnaamani commented 7 months ago

On Caching policy I say there shouldn't need to be any caching done in colossus.

For GET requests colossus should always return an HTTP redirect status code with url to s3/object store for requested object.
For POST requests, they must be handled by colossus (client should not be given write access to the s3 storage). Colossus will verify the file then upload it to s3 storage. Only colossus has write access to s3.
Similarly to POST requests, when colossus is fetching/syncing it will retrieve objects from other nodes, verify locally then upload to its own s3 store.

That said for a current operating transitioning to s3, there may be a period where it might serve objects from its current local store if they have not been moved to s3 yet.

Joystream / joystream