Open mnaamani opened 11 months ago
Pick an S3 client package and test against multiple cloud providers. Do we want to support more than one object store at a time?
@mnaamani there is an NPM package, https://github.com/pkgcloud/pkgcloud#storage, that provides a unified interface to all/most of the object storage cloud services. Maybe we can look into this and see if it meets the requirements.
Adding links that might be useful for testing/development:
https://ytykhonchuk.medium.com/mock-amazon-s3-bucket-for-local-development-889440f9618e https://github.com/localstack/localstack https://dev.to/arifszn/minio-mock-s3-in-local-development-4ke6
I have rewritten your points, @mnaamani, to make sure I understand what you are saying.
The usage of Colossus storage is reaching levels that are challenging to manage with standard retail bare-metal storage options, primarily due to the excessive storage capacity demands. The proposal suggests leveraging a cloud storage provider for hosting the joystream-storage volume, enabling operators or the Lead to set a maximum storage capacity requirement on a Colossus server.
Below is a diagram illustrating the flow for a GET /storage-api/v1/assets/X request:
graph LR;
Argus[Argus] --> Colossus[(Colossus)] --StorageAPI--> CloudStorage[(CloudStorage)];
Without local caching: Colossus forwards the GET request directly to the Cloud Storage API to fetch the asset. Similarly, a POST request to store an asset would also be forwarded.
With local caching: on a GET request, Colossus checks for a locally cached version of the requested asset. If unavailable, it forwards the request to the Cloud Storage API. Fetched objects are cached locally for future requests. POST requests result in local caching and simultaneous cloud storage. The caching policy involves removing "old" assets from local storage, offering a balanced and practical approach, though it requires a carefully chosen eviction strategy.
Storage Bucket Concept: A storage bucket is a primary container for data, files, and objects in cloud storage services.
[…] (NEW) for immediate asset storage.
[…] (OLD) to NEW.
[…] OLD if necessary until the migration to NEW is complete.
Should the HEAD request display info regarding just the local Colossus storage status or the whole Colossus storage + Cloud storage?
/data/temp is specifically designed for storing assets whose download has begun but not ended, to avoid having multiple downloads of the same asset at the same time, right?
Quick thoughts:
[…] HEAD for all assets. Maybe for remote objects they could be resolved just by querying squid, without accessing the actual file.
/data/temp - it's a store for pending uploads. Once the file is fully uploaded, it's moved to the pending folder until it's accepted on-chain. I don't think it's a mechanism for stopping multiple uploads of the same file.
By that I mean a node that doesn't accept uploads directly, but only synchs objects from other storage providers and then stores them in S3.
OK, so this means that during the synching process, instead of downloading the assets locally, the Colossus node stores them in S3, right? And this process should cost the operator as little as possible in terms of AWS billing.
I think the objects still need to be downloaded locally first so their hash can be verified.
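To make the "download locally, verify, then store" order concrete, here is a minimal sketch in TypeScript. It is only an illustration under assumptions: SHA-256 stands in for whatever hash Colossus actually checks (the thread does not say), and `verifyThenUpload` and the `Upload` callback are hypothetical names, not part of Colossus or the AWS SDK.

```typescript
import { createHash } from "node:crypto";

// Hypothetical upload callback; in practice it might wrap an S3 client's put call.
type Upload = (key: string, data: Uint8Array) => void;

// ASSUMPTION: SHA-256 is used purely for illustration; the real hash scheme
// used for data objects is not stated in this thread.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Check the locally downloaded bytes against the expected hash and
// call the upload callback only when they match.
function verifyThenUpload(
  key: string,
  data: Uint8Array,
  expectedHex: string,
  upload: Upload
): boolean {
  if (sha256Hex(data) !== expectedHex.toLowerCase()) {
    return false; // hash mismatch: nothing is sent to the object store
  }
  upload(key, data);
  return true;
}
```

The point of the ordering is that a corrupted or wrong object never reaches the bucket, so no storage or request cost is incurred for it.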
The AWS SDK is also available in TypeScript: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/#Usage_with_TypeScript
So this means that we are first rolling out an initial version where we just synch to Glacier storage, right? @kdembler
On caching policy: I'd say there shouldn't need to be any caching done in Colossus.
That said, for a current operator transitioning to S3, there may be a period where it might serve objects from its current local store if they have not been moved to S3 yet.
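The transition period could be handled by a small lookup that decides where to serve an object from. The following is a sketch only; `Stores`, `hasLocal`, `hasRemote`, and `resolveSource` are hypothetical names, and the existence checks are assumed to be provided elsewhere (for example a file stat and an S3 HEAD call).

```typescript
// Where an object should be served from; "missing" means neither store has it.
type Source = "s3" | "local" | "missing";

// Hypothetical existence checks supplied by the caller.
interface Stores {
  hasRemote(id: string): boolean; // object already moved to the object store
  hasLocal(id: string): boolean;  // object still in the legacy local store
}

// Prefer the object store; fall back to the local store for objects
// that have not been migrated yet.
function resolveSource(id: string, stores: Stores): Source {
  if (stores.hasRemote(id)) return "s3";
  if (stores.hasLocal(id)) return "local";
  return "missing";
}
```

Checking the local store first would save a remote call during migration; the order shown here simply follows the wording above, where local serving is the fallback.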
Background
With the explosive growth in demand on the storage infrastructure, and as suggested on multiple occasions by the Storage lead, the storage capacity of the storage node can be scaled by storing data objects in an object store built on clustering technology, which allows the capacity to grow dynamically with limited disruption, e.g. AWS S3, a Ceph cluster with object storage, or S3-compatible object stores from other cloud providers.
Proposal - add support for storing objects in an object store.
Notes